  1. Jul 2025
    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This work provides a potential new tool for manipulating Treg function for therapeutic use. It focuses on the role of PGAM in Treg differentiation and function. Interrogating publicly available transcriptomic and proteomic data from human regulatory T cells and CD4 T cells, the authors state that Tregs express higher levels of PGAM (at both the message and protein levels) than CD4 T cells. They then inhibit PGAM with a known inhibitor, EGCG, and show that this inhibition affects Treg differentiation. The same result was observed when they used antisense oligonucleotides (ASOs) to knock down PGAM1.

      PGAM1 catalyzes the conversion of 3PG to 2PG in the glycolysis cascade. However, the authors focused their attention on the additional role of 3PG: acting as starting material for the de novo synthesis of serine.

      They hypothesized that PGAM1 regulates Treg differentiation by controlling the amount of 3PG available for de novo serine synthesis, which has a negative impact on Treg differentiation. Indeed, they tested whether the effect of reduced PGAM1 levels on Treg differentiation was reversed by inhibiting the enzyme that catalyzes the synthesis of serine from 3PG.

      The authors continued by testing whether both synthesized and exogenous serine affect Treg differentiation, followed by in vivo experiments to examine the effects of dietary serine restriction on Treg function.

      In order to understand the mechanism by which serine impacts Treg function, the authors assessed whether this depends on the contribution of serine to one-carbon metabolism and to DNA methylation.

      The authors therefore propose that extracellular serine, together with serine whose synthesis is regulated by PGAM1, induces methylation of Treg-associated genes, downregulating their expression and overall impacting Treg differentiation and suppressive function.

      Strengths:

      The strength of this paper is the number of approaches taken by the authors to verify their hypothesis. Indeed, by using both pharmacological and genetic tools in in vitro and in vivo systems, they identified a potential new metabolic regulation of Treg differentiation and function.

      We are grateful to the reviewer for their thoughtful and constructive consideration of our work. We appreciate their comment that the number of approaches taken to test our hypothesis represents a strength that increases confidence in the conclusions.

      Weaknesses:

      Using publicly available transcriptomic and proteomic data of human T cells, the authors claim that both ex vivo and in vitro polarized Tregs express higher levels of PGAM1 protein compared to CD4 T cells (naïve or cultured under Th0 polarizing conditions). The experiments shown in this paper have all been carried out in murine Tregs. Publicly available resources for murine data (ImmGen -RNAseq and ImmPRes - Proteomics) however show that Tregs do not express higher PGAM1 (mRNA and protein) compared to CD4 T cells. It would be good to verify this in the system/condition used in the paper.

      This is a fair comment. Although our pharmacologic and genetic studies demonstrated the importance of PGAM in Treg differentiation and suppressive function in murine cells, thereby corroborating the hypothesis formed from human CD4 cell expression data, we agree that investigating PGAM expression in murine Tregs is important in the context of our work. In reviewing the ImmPRes proteomics database, we found that the reviewer is correct: PGAM1 expression was not higher in iTregs than in other subsets, including Th17 cells. However, relative to other glycolytic enzymes, expression of PGAM1 increases out of proportion in iTregs. In particular, the ratio of PGAM1 to GAPDH expression is much greater in iTregs than in Th17 cells. These data are now shown in the revised Figure S5. The disproportionate increase in PGAM1 expression is consistent with the regulatory role of PGAM in the Treg-Th17 axis via modulation of the concentration of 3PG, a metabolite that lies between GAPDH and PGAM in the glycolytic pathway. The divergent expression changes between GAPDH and PGAM furthermore support the conclusion that GAPDH and PGAM play opposite roles in Treg differentiation.
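
      The ratio comparison described in this response can be illustrated with a minimal sketch. The abundance values and the helper name `pgam1_to_gapdh_ratio` below are hypothetical and chosen only to show how PGAM1 can be unchanged in absolute terms while its ratio to GAPDH differs between subsets; they are not values from ImmPRes or from the manuscript.

```python
from typing import Dict


def pgam1_to_gapdh_ratio(expression: Dict[str, float]) -> float:
    """Return PGAM1 abundance divided by GAPDH abundance for one cell subset."""
    gapdh = expression["GAPDH"]
    if gapdh <= 0:
        raise ValueError("GAPDH abundance must be positive")
    return expression["PGAM1"] / gapdh


# Hypothetical abundances in arbitrary units; NOT data from ImmPRes or the study.
subsets = {
    "iTreg": {"PGAM1": 300.0, "GAPDH": 1000.0},
    "Th17": {"PGAM1": 300.0, "GAPDH": 2000.0},
}

# PGAM1 is identical in both subsets here, yet the ratio to GAPDH differs.
ratios = {name: pgam1_to_gapdh_ratio(values) for name, values in subsets.items()}
fold_difference = ratios["iTreg"] / ratios["Th17"]

print(ratios)
print(f"iTreg/Th17 ratio fold difference: {fold_difference:.2f}")
```

      In this toy case the absolute PGAM1 level is the same in both subsets, so only a normalized comparison reveals the difference, which is the kind of pattern the response describes.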

      It would also be good to assess the levels of both PGAM1 mRNA and protein in PGAM1-knockdown Tregs compared with scrambled controls using different methods, e.g. qPCR and western blot. However, due to the high levels of cell death and differentiation variability, that would require cells to be sorted.

      We appreciate this comment. As noted by the reviewer, assessing PGAM1 expression via qPCR and Western blot would require cell sorting, which we do not currently have the resources to pursue. However, we measured the effect of ASOs on PGAM1 protein expression using anti-PGAM1 antibody via flow cytometry, which allowed gating on viable cells. As shown in Figure S3A, PGAM-targeted ASOs led to an approximately 40% decrease in PGAM1 expression, as measured by mean fluorescence intensity (MFI). Furthermore, we now show in revised Figure S2 that ASO uptake was near-complete in our cultured CD4 cells.
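
      As a worked illustration of how a percentage decrease in MFI is obtained, consider the following minimal sketch. The MFI values are hypothetical and were picked only so that the arithmetic yields a 40% decrease; no background or isotype-control subtraction is assumed.

```python
def percent_decrease(control_mfi: float, treated_mfi: float) -> float:
    """Return the percentage decrease of treated MFI relative to control MFI."""
    if control_mfi <= 0:
        raise ValueError("control MFI must be positive")
    return (control_mfi - treated_mfi) / control_mfi * 100.0


# Hypothetical MFI values gated on viable cells; NOT measurements from the study.
control_mfi = 5000.0    # scrambled-ASO control
knockdown_mfi = 3000.0  # PGAM1-targeted ASO

decrease = percent_decrease(control_mfi, knockdown_mfi)
print(f"Decrease in PGAM1 MFI: {decrease:.0f}%")
```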

      It is not specified anywhere in the paper whether cells were sorted for bulk experiments. Based on the variability of cell differentiation, it would be good if this was mentioned in the paper as it could help to interpret the data with a different perspective.

      Cells were not sorted for bulk experiments. In the revised manuscript, this point is made clear in the text, figure legends, and Methods. It is worth noting that all bulk experiments were conducted on samples with greater than 70% cell viability (greater than 90% for stable isotope tracing studies).

      Reviewer #2 (Public review):

      Summary:

      The authors have tried to determine the regulatory role of phosphoglycerate mutase (PGAM), an enzyme that converts 3-phosphoglycerate to 2-phosphoglycerate in glycolysis, in the differentiation and suppressive function of regulatory CD4 T cells through de novo serine synthesis. This regulation is proposed to act through contributions to one-carbon metabolism and, ultimately, epigenetic regulation of Treg differentiation.

      Strengths:

      The authors have rigorously used inhibitors and antisense RNA to verify the contribution of these pathways to Treg differentiation in vitro. This has also been verified in an in vivo murine model of autoimmune colitis. The work has further clinical implications for autoimmune disorders and cancer.

      We very much appreciate these comments about the rigor of the work and its implications.

      Weaknesses:

      The authors have used inhibitors to study pathways involved in Treg differentiation. However, they have not studied the context of overexpression of PGAM, which was the actual reason to pursue this study.

      We appreciate this comment and agree that overexpression of PGAM would be an excellent way to complement and further corroborate our findings. Unfortunately, despite attempting several methods, we were unable to consistently induce overexpression of PGAM1 in our primary T cell cultures.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I would suggest increasing the font size for flow cytometry gates. Percentages are the focus of the analysis, and it is very hard to read any.

      We have increased the font size on all flow cytometry gates, as suggested.

      Moreover, most of the flow data show Treg polarization based on CD25 and FOXP3 expression. However, Figure 3A, Figure 4D and Figure S3 show Treg polarization based on FSC and Foxp3. Is there any reason for this?

      Antibody staining against CD25 was poor in the experiments noted, which is why Foxp3 alone was used to identify Treg cells in these experiments.

      Especially for Figure 3A, other cells could also express Foxp3 making interpretation difficult.

      This is a fair comment. With respect to Figures 4D and S3 (now revised Figure S4), these experiments were conducted in isolated CD4 cells, in which the population of CD25-Foxp3+ cells is minimal following Treg polarization (as evident in our other figures). Regarding Figure 3A, previous work has found minimal expression of Foxp3 in circulating non-T cells (Devaud et al., 2014, PMID 25063364), such that we have confidence the identified Foxp3 expressing cells are, in fact, Treg cells. Notably, Figure 3A was already gated on CD4+ T cells, and in the periphery of wild-type mice, these would be reasonably referred to as Tregs, although this does not apply to diseased states or specific cases such as the tumor microenvironment.

      The level of murine Treg differentiation varies a lot among experiments. The percentage of CD4+CD25+FOXP3+ cells ranges from 14% to 77% (controls). It would be good to understand and verify the cause of this variability.

      For most of our Treg polarization experiments, the percentage of differentiated cells in the control group falls within the 35–55% range. We found that treatment with ASOs (even scrambled control ASOs) tended to decrease Treg polarization overall, leading to lower percentages of Foxp3-expressing cells in these experiments. Differentiation was similarly low in a few experiments that did not involve ASOs, which we believe was caused by batch variability in the recombinant TGF-β used for polarization. Despite this variability, the study included enough independent experiments and biological replicates to observe consistent trends and to have confidence in the results, as corroborated by statistical testing and the wide variety of experimental approaches used to verify our conclusions. Notably, controls were run in every experiment, allowing accurate comparisons within each individual experiment.

      Similar comments apply to the level of cell death observed in the cultures of polarizing Tregs.

      Although there was some variability in cell viability between experiments, flow cytometry experiments were always gated on live cells, and we believe concerns about reproducibility are substantially mitigated by the number of independent experiments, biological replicates, and distinct experimental approaches used for verification of the experimental findings. For all bulk experiments, cell viability was greater than 70% and equal across samples. For the flux studies, viability was greater than 90% and equal across samples.

      Figure 2 B and D: EGCG has been used at two different concentrations. Is it lower in Figure 2D because of one condition being a combination of inhibitors or is it a typo?

      The doses stated in the original legend are correct, and yes, the EGCG dose is lower in Figure 2D because drug doses were optimized for the combination-treatment experiments. This point is now clarified in the figure legend.

      Figure 2G: The description in the results does not match figure legend - Text - serine/glycine-free media or control (serine/glycine-containing) media; figure legend - serine/glycine-free media or media containing 4 mM serine.

      We thank the reviewer for pointing out this discrepancy, which was an error in the text. The two conditions used were 1) serine/glycine-free media, and 2) serine/glycine-free media supplemented with 4 mM serine. The text and figure legend have both been updated to clarify this point.

      Figure 3 F and G: the graphs do not show the individual points.

      Individual points were not shown in these graphs because they are derived from scRNA-seq data, with scFEA values calculated for individual cells. As such, there are far too many data points to display all individual values.

      CD4+ T-cell isolation and culture: cells were cultured in 50%RPMI and 50% AIM-V.

      I thought that AIM-V medium was intended to be for human cultures. Could some of the conditions explain the low level of differentiation observed in some experiments? If there is such variability it might be because the conditions used are not optimal and therefore not reproducible.

      We appreciate this critique. Although AIM-V media is often used for ex vivo human T cell cultures, it can similarly be used for mouse T cell culture with the addition of β-mercaptoethanol, as suggested by ThermoFisher and as used in prior publications, such as PMID 36947105. As outlined in the responses above, the differentiation we observed was consistent in most experiments, with some variability based on experimental conditions (such as lower differentiation in the setting of ASO treatment). Furthermore, we believe the number of independent experiments, biological replicates, and independent experimental approaches used in the study supports the reproducibility of our findings.

      Figures S1 A, S2 B, and S4: the flow data are shown using both heights (FSC) and area (zombie NIR dye). It would be better to use areas for both parameters.

      In the revised manuscript, areas are now used on both the x- and y-axes for these figures.

      Figure S1 B and S2 C: The bar graphs are both showing proliferation index, however, the graphs are labelled differently in the two figures and in the legend (proliferation index -Fig S1 B; division index -Fig S2 C and replication index in the legend of Fig S2 C). The explanation of how the index has been calculated should probably go in the legend of the first figure that shows it.

      We thank the reviewer for this comment. In the revised manuscript, we have ensured consistency in the terminology (“proliferation index” is now used consistently), and the explanation of the proliferation index calculation is now included in the legend to Figure S1, where the proliferation index first appears.
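
      Because "proliferation index" and "division index" are easily confused, the following minimal sketch shows one common convention used in dye-dilution analysis. It is an assumption for illustration and is not necessarily the calculation described in the legend to Figure S1. Under this convention, the number of precursor cells for generation i is the cell count in that generation divided by 2^i; the division index averages divisions over all precursors, whereas the proliferation index averages only over precursors that divided at least once. The cell counts are hypothetical.

```python
from typing import List, Tuple


def precursor_counts(cells_per_generation: List[float]) -> List[float]:
    """Back-calculate precursors: cells in generation i arose from count / 2**i cells."""
    return [count / (2 ** gen) for gen, count in enumerate(cells_per_generation)]


def division_and_proliferation_index(
    cells_per_generation: List[float],
) -> Tuple[float, float]:
    """Return (division index, proliferation index) under the assumed convention.

    cells_per_generation[0] is the undivided peak, [1] is one division, and so on.
    """
    precursors = precursor_counts(cells_per_generation)
    total_divisions = sum(gen * p for gen, p in enumerate(precursors))
    all_precursors = sum(precursors)
    dividing_precursors = sum(precursors[1:])
    if all_precursors <= 0 or dividing_precursors <= 0:
        raise ValueError("need at least one cell that has divided")
    division_index = total_divisions / all_precursors
    proliferation_index = total_divisions / dividing_precursors
    return division_index, proliferation_index


# Hypothetical event counts for generations 0, 1, and 2; NOT data from the study.
counts = [100.0, 200.0, 400.0]
division_index, proliferation_index = division_and_proliferation_index(counts)
print(f"division index = {division_index:.2f}")
print(f"proliferation index = {proliferation_index:.2f}")
```

      The two indices differ whenever an undivided population is present, which is why using a single term consistently across figures matters.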

      Were the PGAM1-knockdown Tregs used for RNAseq sorted or not? Based on the plots shown in Figure S2B, there is ~50% cell death, which needs to be taken into consideration in the analysis if dead cells were not depleted.

      Similar question for all bulk experiments. It is not specified in the methods or figure legends.

      The cells used for RNAseq and other bulk experiments were not sorted. This point is now made clear in the text, figure legends, and Methods. However, cultures were only used for bulk analyses if the viability in those particular experiments was greater than 70%. Given the sensitivity of stable isotope tracing analyses, cultures were only analyzed for those studies if viability was greater than 90%. In these experiments, viability was similar across samples.

      It was mentioned in Figure 1 that the PGAM KD led to transcriptional changes that impacted MYC targets and mTORC1 signalling. It would be good to validate these findings maybe with more targeted experiments.

      We appreciate this suggestion and agree that validation and further investigation of these critical targets would be worthwhile. However, because of limitations to resources and the fact that these findings are not critical to the main conclusions of the study, we consider these experiments as future directions beyond the scope of the current work.

      Reviewer #2 (Recommendations for the authors):

      Here are a few suggestions and recommendations to improve the research study.

      (1) The authors have used the word 'vehicle' in most of the figures, however, this word is not explained well in the figure legend. The authors may want to clarify to readers whether vehicle is a plasmid or a solvent for control purposes. For example, in Figure 1D, if vehicle is a plasmid, then another sample for vehicle +/-EGCG should be considered for the rigor in results.

      Thank you for identifying this point of confusion. For all drug treatment experiments, vehicle controls consisted of solvent alone without drug. For ASO experiments, the control condition consisted of scrambled ASO. This point is now made clear in the Methods (“Drug and ASO Treatments” section) as well as in the main text. Furthermore, the figure legends and axes have been edited such that “vehicle” is only used to refer to drug experiments (in which solvent vehicle alone was used as control), and “control” is used to refer to ASO experiments (in which scrambled ASO served as control).

      (2) Figure 1H represents the RNAseq data for knockdown of PGAM1. It might be interesting to see similar data for the overexpression of PGAM1.

      We appreciate this comment and agree that overexpression of PGAM1 would be an excellent way to complement and further corroborate our findings using PGAM1 knockdown and pharmacologic inhibition. Unfortunately, despite attempting several methods, we were unable to consistently induce overexpression of PGAM1 in our primary T cell cultures.

      (3) The font in most of the data from flow cytometry experiments (for example 1I) is not legible. Please increase the font size to make it legible.

      Font sizes have been increased.

      (4) In Figure S2, PGAM expression was measured by flow cytometry. A similar experiment using Western blot, a direct measurement of protein expression, would strengthen the evidence.

      We appreciate this comment. As noted in the public reviews, Western blot would require sorting of viable cells, and unfortunately we do not currently have the resources to conduct additional experiments with FACS. However, we respectfully note that assessing protein expression via flow cytometry quantifies protein levels based on antibody binding, similar to Western blot (or in-cell Western blot), while also allowing gating on viable cells. We also note that nearly 100% of cultured CD4 cells took up ASO, as shown in revised Figure S2.

      (5) Figure 1J: it is mentioned in the text that 10 datasets were studied. A normalized parameter such as overexpression or suppression could be studied along with its variance. It would be good to understand the variability in response among the different datasets.

      We thank the reviewer for the opportunity to clarify this data. This data was taken from a single published dataset (Dykema et al., 2023, PMID 37713507) in which 10 distinct subsets of tumor-infiltrating Tregs (TIL-Tregs) were identified, rather than from 10 distinct datasets. After identifying the Activated (1)/OX40hiGITRhi cluster of TIL-Tregs as a highly suppressive subset that correlates with resistance to immune checkpoint blockade, Dykema et al. compared gene expression in this subset to the bulked collection of the other 9 subsets, and the data shown in Figure 1J is derived from this analysis. As such, the data in Figure 1J is, indeed, a normalized parameter of overexpression, showing overexpression of PGAM1 in this highly suppressive subset versus other subsets, out of proportion to proximal rate-limiting glycolytic enzymes. The main text and figure/figure legend have been edited to clarify this point.

      (6) It would be good to rephrase the statement that the roles of PGAM and GAPDH are opposite. This paragraph is confusing because phrases such as "supporting Treg differentiation" and "augments Treg differentiation" have been used, although the data in S3 and 1D are opposite. Any possible explanation for the opposing roles of PGAM and GAPDH, despite their involvement in the same glycolytic pathway, could be added to build the interest of readers. How does the expression of GAPDH compare with that of PGAM in Figure 1J?

      We thank the reviewer for this comment, as we appreciate that the language used in our initial manuscript was confusing. We have edited the main text, in both the Results and Discussion sections, to clarify this point and provide an explanation as suggested. Indeed, our experimental data indicate that GAPDH and PGAM play opposing roles in Treg differentiation: whereas inhibiting GAPDH activity leads to greater Treg differentiation (shown in revised Figure S4 and our previously published work), inhibiting PGAM leads to diminished Treg differentiation. We view this point (that enzymes within the same glycolytic pathway can have divergent roles in T cells) as a primary implication of these findings, with the explanation that individual enzymes within the same pathway can differentially regulate the concentrations of key immunoactive metabolites. In our study, we identified 3PG as a key immunoactive metabolite whose concentration would be differentially impacted by GAPDH activity versus PGAM activity, since it lies downstream of GAPDH but upstream of PGAM.

      To provide further evidence for the opposing roles of GAPDH and PGAM, we analyzed existing datasets. In the revised Figure S5, we show that the PGAM1/GAPDH expression ratio increases in both human and mouse Tregs compared to other CD4 subsets.

      (7) Figure 2C: what are M+1, M+2, etc.? Do they represent the number of hours? If so, why are the results for 6 hrs not shown, since the study ran for 6 hrs? And what is happening with M+2?

      We appreciate the opportunity to clarify this point and apologize for prior confusion. The terminology “M+n” refers to the mass shift produced by incorporation of carbon-13 atoms. When a metabolite incorporates a single carbon-13 atom, it has a mass shift of one (M+1), whereas incorporation of three carbon-13 atoms produces a mass shift of three (M+3). Because we used uniformly carbon-13-labeled glucose, 3PG derived from the labeled glucose will have all three carbons labeled (M+3), as will serine that is newly synthesized from 3PG. Because serine can enter the downstream one-carbon cycle and be recycled, we also see the appearance of recycled serine with a single carbon-13 atom (M+1). The critical point in Figure 2C is that labeled serine is higher in Th17 cells than in Treg cells, demonstrating that de novo serine synthesis from glycolysis is greater in Th17 cells. The main text has been edited to clarify this important point.
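
      The M+n notation explained above can be made concrete with a minimal sketch. The peak intensities are hypothetical, the helper names are illustrative, and no correction for natural carbon-13 abundance is applied; the example only shows how isotopologue labels map to numbers of labeled carbons and how fractional labeling is computed.

```python
from typing import Dict


def isotopologue_label(n_13c: int) -> str:
    """Return the M+n label for a metabolite carrying n carbon-13 atoms."""
    if n_13c < 0:
        raise ValueError("number of labeled carbons cannot be negative")
    return f"M+{n_13c}"


def fractional_abundance(intensities: Dict[str, float]) -> Dict[str, float]:
    """Convert raw isotopologue intensities into fractions of the total pool."""
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return {label: value / total for label, value in intensities.items()}


# Serine has three carbons, so its isotopologues run from M+0 (unlabeled) to M+3.
SERINE_CARBONS = 3
labels = [isotopologue_label(n) for n in range(SERINE_CARBONS + 1)]

# Hypothetical peak intensities (arbitrary units); NOT data from the study.
# M+3 represents serine newly made from fully labeled 3PG; M+1 represents
# serine that picked up a single labeled carbon through one-carbon recycling.
example_intensities = dict(zip(labels, [700.0, 100.0, 0.0, 200.0]))
fractions = fractional_abundance(example_intensities)

for label in labels:
    print(f"{label}: {fractions[label]:.2f}")
```

      The label therefore counts labeled carbons rather than hours of incubation.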

      (8) Including quantification of the inhibition and rescue effects of EGCG and NCT would be helpful to readers.

      The inhibition and rescuing effects of these drugs are quantified in Figures 2D and 2E as they relate to Treg differentiation. The reviewer may be referring to quantification of relative effects on 3PG levels and serine synthesis. If so, we unfortunately do not have the resources to complete these studies, which would require large-scale quantitative mass spectrometry studies or enzyme activity assays.

      (9) Figures 2D and 2E: The authors could also generate dose-response curves for EGCG and NCT with respect to this Treg differentiation phenotype. This could help clarify the balance between the serine and glycolysis pathways. Similarly, examining the dose dependence of 3PG for Figure 2E and comparing it with the kinetic constants of the enzymes involved and with cellular concentrations would help clarify the metabolic dynamics, because this phenotype could reflect an interplay of both 3PG and serine concentrations.

      We appreciate this suggestion and agree that establishing detailed dose-dependence curves and relating these findings to enzyme kinetics would yield additional insights into the biochemical regulation provided by PGAM and PHGDH. Unfortunately, we do not have the resources to pursue these additional studies, which therefore lie beyond the scope of our current work.

      (10) Figure 4: Explanation for no effect of methionine supplementation?

      Thank you for raising this point. We speculate that methionine supplementation had minimal effect because physiologic levels of serine were sufficient to provide basal substrates for the one-carbon cycle. On the other hand, eliminating methionine produced enough of a decrease in one-carbon metabolism to potentiate the effects of excess serine. This point is now briefly addressed in the text.

      (11) For direct connection between PGAM and methylation, methylation experiments could be worked out with NCT1 and SHIN1 (as in Figure 4H).

      We very much appreciate this suggestion, which we agree would provide a strong complementary approach. Unfortunately, we do not currently have the resources to pursue these studies. However, we regard the increased methylation observed following PGAM knockdown (Figure 4G) as strong evidence that PGAM activity directly modulates methylation.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors have assembled a cohort of 10 SiNET, 1 SiAdeno, and 1 lung MiNEN samples to explore the biology of neuroendocrine neoplasms. They employ single-cell RNA sequencing to profile five samples (SiAdeno, SiNETs 1-3, MiNEN) and single-nuclei RNA sequencing to profile seven frozen samples (SiNETs 4-10).

      They identify two subtypes of SiNETs, characterized by either epithelial or neuronal NE cells, through a series of DE analyses. They also report findings of higher proliferation in non-malignant cell types across both subtypes. Additionally, they identify a potential progenitor cell population in a single lung MiNEN sample.

      Strengths:

      Overall, this study adds interesting insights into this set of rare cancers that could be very informative for the cancer research community. The team probes an understudied cancer type and provides thoughtful investigations and observations that may have translational relevance.

      Weaknesses:

      The study could be improved by clarifying some of the technical approaches and aspects as currently presented, toward enhancing the support of the conclusions:

      (1) Methods: As currently presented, it is possible that the separation of samples by program may be impacted by tissue source (fresh vs. frozen) and/or the associated sequencing modality (single cell vs. single nuclei). For instance, two (SiNET1 and SiNET2) of the three fresh tissues are categorized into the same subtype, while the third (SiNET9) has very few neuroendocrine cells. Additionally, samples from patient 1 (SiNET1 and SiNET6) are separated into different subtypes based on fresh and frozen tissue. The current text alludes to investigations (i.e.: "Technical effects (e.g., fresh vs. frozen samples) could also impact the capture of distinct cell types, although we did not observe a clear pattern of such bias."), but the study would be strengthened with more detail.

      We thank the reviewer for the thoughtful and constructive review. Due to the difficulty in obtaining enough SiNET samples, we used two platforms to generate data - single cell analysis of fresh samples, and single nuclei analysis of frozen samples. We opted to combine both sample types in our analysis while being fully aware of the potential for batch effects. We therefore agree that this is a limitation of our work, and that differences between samples should be interpreted with caution.

      Nevertheless, we argue that the two SiNET subtypes that we have identified are very unlikely to be due to such a batch effect. First, the epithelial SiNET subtype was detected not only in two fresh samples but also in one frozen sample (albeit with relatively few cells, as the reviewer correctly noted). Second, and more importantly, the epithelial SiNET subtype was also identified in an analysis of an external and much larger cohort of bulk RNA-seq SiNET samples that does not share the issue of two platforms (as seen in Fig. 2f). Moreover, the proportion of samples assigned to the two subtypes is similar between our data and the external data. We therefore argue that the identification of two SiNET subtypes cannot be explained by the use of two data platforms. However, we agree that the results should be further investigated and validated by future studies.

      The reviewer also commented that two samples from the same patient that were profiled by different platforms (SiNET1 and SiNET6) were separated into different subtypes. We would like to clarify that this is not the case: SiNET6 was not included in the subtype analysis because too few neuroendocrine cells were detected, and it was not assigned to any subtype, as noted in the text and as can be seen from its exclusion from Figure 2, where the subtypes are defined. We apologize that our manuscript may have given the wrong impression about the classification of SiNET6 (it was labeled in Fig. 4a in a misleading manner). In the revised manuscript, we corrected the labeling in Fig. 4a and clarified that SiNET6 is not assigned to any subtype. We also further acknowledge the limitation of the two platforms and present the arguments in favor of the existence of two SiNET subtypes.

      (Additional specific recommendations for the authors are provided below)

      (2) Results:

      Heterogeneity in the SiNET tumor microenvironment: It is unclear if the current analysis of intratumor heterogeneity distinguishes the subtypes. It may be informative if patterns of tumor microenvironment (TME) heterogeneity were identified between samples of the same subtype. The team could also evaluate this in an extension cohort of published SiNET tumors (i.e. revisiting additional analyses using the SiNET bulk RNAseq from Alvarez et al 2018, a subset of single-cell data from Hoffman et al 2023, or additional bulk RNAseq validation cohorts for this cancer type if they exist [if they do not, then this could be mentioned as a need in Discussion])

      We agree that analysis of an independent cohort would help define the association between the TME and the SiNET subtype. However, the sample size required for such an analysis is significantly larger than the data currently available. In the revised manuscript, we note this as a direction for future studies.

      (3) Proliferation of NE and immune cells in SiNETs: The observed proliferation of NE and immune cells in SiNETs may also be influenced by technical factors (including those noted above). For instance, prior studies have shown that scRNA-seq tends to capture a higher proportion of immune cells compared to snRNA-seq, which should be considered in the interpretation of these results. Could the team clarify this element?

      We agree that different platforms could affect the observed proportions of immune cells, and more generally the proportions of specific cell types. However, the low proliferation of neuroendocrine cells and the higher proliferation of immune cells (especially B cells, but also T cells and macrophages) are consistently observed in both platforms, as shown in Fig. 4a, and therefore appear to be reliable despite the limitations of our work. We clarify this consistency in the revised manuscript.

      (4) Putative progenitors in mixed tumors: As written, the identification of putative progenitors in a single lung MiNEN sample feels somewhat disconnected from the rest of the study. These findings are interesting - are similar progenitor cell populations identified in SiNET samples? Recognizing that ideally additional validation is needed to confidently label and characterize these cells beyond gene expression data in this rare tumor, this limitation could be addressed in a revised Discussion.

      We do not find evidence for similar progenitors in the SiNET samples. However, those samples do not contain two co-existing lineages of cancer cells within the same tumor, so such progenitors are harder to define there. We agree about the need for additional validation of this specific finding and have noted that in the revised Discussion.

      Reviewer #2 (Public review):

      Summary:

      The research identifies two main SiNET subtypes (epithelial-like and neuronal-like) and reveals heterogeneity in non-neuroendocrine cells within the tumor microenvironment. The study validates findings using external datasets and explores unexpected proliferation patterns. While it contributes to understanding SiNET oncogenic processes, the limited sample size and depth of analysis present challenges to the robustness of the conclusions.

      Strengths:

      The studies effectively identified two subtypes of SiNET based on epithelial and neuronal markers. Key findings include the low proliferation rates of neuroendocrine (NE) cells and the role of the tumor microenvironment (TME), such as the impact of Macrophage Migration Inhibitory Factor (MIF).

      Weaknesses:

      However, the analysis faces challenges such as a small sample size, lack of clear biological interpretation in some analyses, and concerns about batch effects and statistical significance.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors set out to profile small intestine neuroendocrine tumors (siNETs) using single-cell/nucleus RNA sequencing, an established method to characterize the diversity of cell types and states in a tumor. Leveraging this dataset, they identified distinct malignant subtypes (epithelial-like versus neuronal-like) and characterized the proliferative index of malignant neuroendocrine cells versus non-malignant microenvironment cells. They found that malignant neuroendocrine cells were far less proliferative than some of their non-malignant counterparts (e.g., B cells, plasma cells, epithelial cells) and there was a strong subtype association such that epithelial-like siNETs were linked to high B/plasma cell proliferation, potentially mediated by MIF signaling, whereas neuronal-like siNETs were correlated with low B/plasma cell proliferation. The authors also examined a single case of a mixed lung tumor (neuroendocrine and squamous) and found evidence of intermediate/mixed and stem-like progenitor states that suggest the two differentiated tumor types may arise from the same progenitor.

      Strengths:

      The strengths of the paper include the unique dataset, which is the largest to date for siNETs, and the potentially clinically relevant hypotheses generated by their analysis of the data.

      Weaknesses:

      The weaknesses of the paper include the relatively small number of independent patients (n = 8 for siNETs), lack of direct comparison to other published single-cell NET datasets, mixing of two distinct methods (single-cell and single-nucleus RNA-seq), lack of direct cell-cell interaction analyses and spatially-resolved data, and lack of in vitro or in vivo functional validation of their findings.

      The analytical methods applied in this study appear to be appropriate, but the methods used are fairly standard to the field of single-cell omics without significant methodological innovation. As the authors bring forth in the Discussion, the results of the study do raise several compelling questions related to the possibility of distinct biology underlying the epithelial-like and neuronal-like subtypes, the origin of mixed tumors, drivers of proliferation, and microenvironmental heterogeneity. However, this study was not able to further explore these questions through spatially-resolved data or functional experiments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Methods:

      a) Could the team clarify the discrepancy in subtype assignment between two samples from the same patient? i.e. are these samples from the same tumor? If so, what does the team think is the explanation for the difference in subtype assignment?

      As noted above in response to the public review of reviewer #1, SiNET6 was in fact not assigned to any subtype (due to insufficient NE cells) and hence there was no discrepancy. We apologize for the misleading labeling of SiNET6 in the previous version and have corrected this in the revised version of Figure 4.

      b) What is the rationale for scoring tumor-derived programs on samples with no tumor cells? For instance, SiNET3 does not contain NE cells, and SiNET9 has a very low fraction of NE cells. Please clarify how the scoring was performed on these samples, as the program assignments may be driven by other cell types in samples with little to no NE cells.

      Scoring for tumor-derived programs was done only for the NE cells. Accordingly, SiNET3 was not scored or assigned to any of the programs. SiNET9 was included in this analysis: although it had a relatively small fraction of NE cells, the absolute number of profiled cells was particularly high in this sample and therefore the number of NE cells was 130, higher than our cutoff of 100 cells.

      c) Given the heterogeneity of cell types within each sample, would there be a way to provide a refined sense of confidence for certain cell type annotations? This would be helpful given the heterogeneity in marker gene expression and the absence of gold-standard markers for fibroblasts and endothelial cells in this cancer type. Additionally, there seems to be an unusually large proportion of NK and T cells - was there selection for this (given that these tumors are largely not immune infiltrated)?

      Author Response: Except for the Neuroendocrine cells, there are six TME cell types that we consistently find in multiple SiNET samples: macrophages, T cells, B/plasma cells, fibroblasts, endothelial and epithelial cells. Each of these cell types is identified as a discrete cluster in the analysis of the respective tumors (as shown in Fig. 1a,b and Fig. S1), and these are exactly the six most common non-malignant cell types that we and others found in single-cell analyses across various other tumor types (e.g. see Gavish et al. 2023, ref. #15). The signatures used to annotate these cell types are shown in Table S2, and they primarily consist of classical markers that are traditionally used to define those cell types. We therefore believe that the annotation of these typical tumor-associated cell types is robust and does not include major uncertainties. In addition to these five common cell types, there are three cell types that we find only in 1-2 of the samples – epithelial cells, plasma cells and NK cells. Again, we believe that their annotation is robust, and these cell types are primarily not used for further analysis.

      There was no selection for any specific cell types in this study. Nevertheless, single cell (or single nuclei) analysis may lead to biases towards specific cell types, that we cannot evaluate directly from the data. NK cells were detected only in one tumor. T cells were detected in eight of the ten samples; but in four of those samples the frequency of T cells was lower than 5% and only in one sample the frequency was above 20%. Therefore, while we cannot exclude a technical bias towards high frequency of T/NK cells, we do not consider these frequencies as high enough to suggest this specific type of bias. In the revised manuscript, we clarify that the commonly observed cell types in SiNETs are the same as those commonly observed in other tumors and we acknowledge the possibility of a technical bias in cell type capture.  

      d) Evaluating the expression of one gene at a time may not effectively demonstrate subtype-specific patterns, particularly when comparing NE cells from one tumor to non-NE cells from another, which may not be an appropriate approach for identifying differentially expressed genes. DE analysis coupled with concordance analysis, for example, could strengthen the results.

      We apologize, but we do not fully understand this comment. We note that the initial normalization by non-NE cells was done in order to decrease batch effects when combining the data of the two platforms. We also note that the two subtypes were identified by two distinct approaches, as shown in Fig. 2c and in Fig. 2f.

      (2) Results:

      See the above public review.

      (3) Minor Comments:

      a) Results: Single cell and single nuclei RNA-seq profiling of SiNETs

      The results say ten primary tumor samples from eight patients. Later in the paragraph it says, "After initial quality controls, we retained 29,198 cells from the ten patients." Please clarify to either ten samples or eight patients.

      Indeed these are ten samples rather than ten patients. We corrected that in the revised version and thank the reviewer for noticing our error.

      b) Methods:

      - Please specify which computational tools were used to perform quality control, signature scoring, etc.

      The approaches for quality control, scoring etc. are described in the methods. We implemented these approaches with R code and did not use other computational tools.

      - Minor point but be consistent with naming convention (ie, siAdeno vs SiAdeno) throughout the paper. For example, under "Sample Normalization, Filtering and annotations" change "siAdeno" to "SiAdeno."

      Thank you for noting this, we corrected that.

      - Add processing and analysis of MiNEN sample to the methods section. It is not mentioned in the methods at all.

      As noted in the revised manuscript, the MiNEN sample was analyzed in the same way as the SiNET fresh samples.

      c) Supplementary Figures:

      Figure S1: Change (A-H) to (A-I) to account for all panels in the figure.

      Figure S4: Add (C) after "the siAdeno sample" in the legend.

      Thank you for noting this, we corrected that.

      (4) Font size is quite small in the main figures.

      We enlarged the font in selected figure panels.

      Reviewer #2 (Recommendations for the authors):

      (1) The small number of samples used in some analyses affects the robustness of the findings. Increasing the sample size or including more validation data could improve the statistical reliability and make the results more convincing. The authors should consider expanding the cohort size or integrating additional external datasets to increase statistical power.

      We agree with the reviewer that adding more samples would improve the reliability of the results. However, the external data that we found was not comparable enough to enable integration with our data, and we are unable to profile additional SiNET samples in our lab. We hope that future studies would support our results and extend them further.

      (2) The biological significance of differentially expressed genes needs more depth, limiting the insights into SiNET biology. The authors should perform a comprehensive pathway enrichment analysis and integrate findings with existing literature. Tools like Gene Set Enrichment Analysis (GSEA) or Overrepresentation Analysis (ORA) could provide a more holistic view of altered biological processes.

      We thank the reviewer for this suggestion. We did examine the functional enrichment of differentially expressed genes and did not find additional enrichments that we felt were important to highlight beyond what we described. We report the genes in supplementary tables, enabling other researchers to examine these lists further. 

      (3) The unexpected finding of higher proliferation in non-malignant cells requires further investigation and plausible biological explanation. The authors should perform additional analyses to explore potential mechanisms, such as investigating cell cycle regulators or performing in vitro validation experiments. The authors should consider single-cell trajectory analysis to explore these highly proliferative non-malignant cells' potential differentiation or activation states.

      We agree that our results are descriptive and that we do not fully explain the mechanism for the high level of non-malignant cell proliferation. We did attempt to perform follow up computational analysis. These analyses raised the hypothesis that high levels of MIF are causing the proliferation of immune cells. Additional analyses that we performed were not sufficient to conclusively identify a mechanism, and we felt that they were not informative enough to be included in the manuscript. Further in vitro (or in vivo) studies are beyond the scope of the current work.

      (3) More details are required on methods used for p-value adjustment, and criteria for statistical significance should be clearly defined. Additionally, integrating scRNA-seq and snRNA-seq data needs a more thorough explanation, including batch effect mitigation and more explicit cell clustering representation. The authors should clearly describe p-value adjustments (e.g., FDR) and batch correction methods (e.g., Harmony, FastMNN integration) and include additional figures showing corrected UMAP plots or heatmaps post-batch correction to enhance the confidence in results.

      We now clarify in the Methods our use of FDR for p-value adjustments. As for batch correction, we have avoided the use of integration methods as we believe that they tend to distort the data and decrease tumor-specific signals. Instead, we primarily analyzed one tumor at a time and never directly compared cell profiles across distinct tumors but only compared the differences between subpopulations; specifically, we normalized the expression of NE cells by subtracting the expression of reference non-NE cells from the same tumor as a method to decrease batch effects. We now clarify this point in the Methods section.
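
      The within-tumor normalization described above can be sketched as follows. This is a minimal illustration and not the authors' R code: it assumes log-scale expression values and uses the mean of the non-NE cells as the reference. The function name, the second gene (ACTB) and all numbers are hypothetical.

```python
from statistics import mean

def relative_expression(ne_cells, reference_cells):
    """Subtract the mean reference (non-NE) expression from each NE cell, gene by gene.

    Both arguments are lists of dicts mapping gene -> log-scale expression,
    and both must come from the same tumor (an assumption of this sketch).
    """
    genes = ne_cells[0].keys()
    ref_mean = {g: mean(cell[g] for cell in reference_cells) for g in genes}
    return [{g: cell[g] - ref_mean[g] for g in genes} for cell in ne_cells]

# Toy tumor (hypothetical values).
ne = [{"CHGA": 6.0, "ACTB": 4.0}, {"CHGA": 7.0, "ACTB": 4.0}]
non_ne = [{"CHGA": 1.0, "ACTB": 4.0}, {"CHGA": 1.0, "ACTB": 4.0}]

# Same tumor with a platform-wide additive offset of +1 on every value.
shifted_ne = [{g: v + 1.0 for g, v in c.items()} for c in ne]
shifted_non_ne = [{g: v + 1.0 for g, v in c.items()} for c in non_ne]

# The additive offset cancels, so both versions give the same relative profile.
print(relative_expression(ne, non_ne) == relative_expression(shifted_ne, shifted_non_ne))
```

      In this toy case a sample-wide additive shift cancels in the subtraction. Batch effects that differ between cell types would not be removed this way.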

      (4) The lack of analysis of interactions between different cell types limits understanding of tumor microenvironment dynamics. The authors should employ cell-cell interaction analysis tools (e.g., CellPhoneDB, NicheNet) to explore potential communication networks within the tumor microenvironment. This could provide valuable insights into how different cell types influence tumor progression and maintenance.

      We thank the reviewer for this suggestion. We have tried to use such methods but found the results difficult to interpret since these approaches generated very long lists of potential cell-cell interactions that are largely not unique to the SiNET context and their relevance remains unclear without follow up experiments, which are beyond the scope of this work. We therefore focused only on ligand/receptors that came up robustly through specific analyses such as the differences between SiNET subtypes. In particular, MIF is highly expressed in the epithelial subtype, and remarkably, MIF upregulation is shared across multiple cell types. Thus, the cell-cell interactions that are suggested by the SiNET data as somewhat unique to this context are those involving MIF and its receptor (CD74 on immune cell types), while other interactions detected by the proposed methods primarily reflect the generic ligand/receptors expressed by corresponding TME cell types.   

      Reviewer #3 (Recommendations for the authors):

      (1) For a relatively small dataset, the mixing of single-cell versus single-nucleus RNA-seq should be discussed more. It would be nice to have 1-2 tumors that are analyzed by both methods to compare and increase our understanding of how these different approaches may affect the results. This could be accomplished by splitting a fresh tumor into two parts, processing it fresh for single-cell RNA-seq, and freezing the other part for single-nucleus RNA-seq.

      We agree with the reviewer that the different techniques may bias our results and we refer to this limitation in the Results and Discussion sections. However, it is important to note that we do not directly integrate the primary data across these modalities, but rather analyze each tumor separately and only combine the results across tumors. For example, we first compare the NE cells from each tumor to control non-NE cells from the same tumor and then only compare the sets of NE-specific genes across tumors. Moreover, the subtypes that we detect cannot be explained by these modalities, as the first subtype contains samples from both methods and these subtypes are further demonstrated in external bulk data. Similarly, the results regarding low proliferation of NE cells and high proliferation of B/plasma cells are observed across both modalities. We therefore argue that while the combination of methods is a limitation of this work it does not account for the main results.  

      (2) The authors state that they defined the siNET transcriptomic signature by comparing their siNET single-cell/nucleus data to other NETs profiled by bulk RNA-seq. Some of the genes in the signature, such as CHGA, are widely used as markers for NETs (and not specific for siNET). The authors should address this in more detail.

      To define the SiNET transcriptomic signature we first analyzed each tumor separately and compared the expression of Neuroendocrine (NE) cells to that of non-NE cells to detect NE-specific genes. Next, we compared the lists of NE-specific genes across the 8 SiNET patients and found a subset of 26 genes which were shared across most of the analyzed SiNET samples (Fig. 2a). Thus, the signature was defined only from analysis of SiNETs and not based on comparison to other types of NETs and hence it is expected that the signature could contain both SiNET-specific genes and more generic NET genes such as CHGA.
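
      The two-step logic above (NE-specific genes per tumor, then genes shared across most tumors) can be sketched as follows. The threshold of "more than half of the tumors", the tumor labels and the GENE1-GENE3 placeholders are assumptions for illustration only, not the criteria used in the manuscript.

```python
from collections import Counter

def shared_signature(ne_specific_by_tumor, min_fraction=0.5):
    """Return genes that are NE-specific in more than `min_fraction` of the tumors."""
    counts = Counter(
        gene for genes in ne_specific_by_tumor.values() for gene in set(genes)
    )
    n_tumors = len(ne_specific_by_tumor)
    return sorted(g for g, n in counts.items() if n / n_tumors > min_fraction)

# Hypothetical per-tumor lists of NE-specific genes (NE vs non-NE within each tumor).
per_tumor = {
    "tumor_A": ["CHGA", "GENE1", "GENE2"],
    "tumor_B": ["CHGA", "GENE1"],
    "tumor_C": ["CHGA", "GENE3"],
}
print(shared_signature(per_tumor))
```

      Because the signature is built only from within-SiNET comparisons, a generic NET marker such as CHGA passes the filter just as a SiNET-specific gene would.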

      Only after defining this signature, we went on to compare it between SiNETs and other types of NETs (pancreatic and rectal) based on external bulk RNA-seq data. In this comparison, we observed that the signature was clearly higher in SiNETs than in the other NETs (Fig. 2b). This result supports the accuracy of the signature and further suggests that it contains a fraction of SiNET-specific genes and not only generic NET genes such as CHGA. Thus, we would expect this signature to perform well also for distinguishing between SiNETs and other types of NETs, but it does contain a subset of genes that would be high in the other NETs. Finally, we note that even though CHGA is a generic NET marker, the bulk RNA-seq data would suggest that, at least at the mRNA level, this gene is still more highly expressed in SiNETs than in other NETs. To avoid confusion regarding the definition and specificity of the SiNET transcriptomic signature we have extended the description of this section in the revised manuscript.

      (3) The authors only compare their data to bulk transcriptomic data on NETs. While in some instances this makes sense given the bulk dataset has >80 tumors, they should at least cite and do some comparison to other published single-cell RNA-seq datasets of NETs (e.g., PMID: 37756410, 34671197). The former study listed has 3 siNETs, 4 pNETs, and 1 gNET. Do the epithelial-like and neuronal-like signatures show up in this dataset too?

      We examined these studies but concluded that their data was inadequate to identify the two SiNET subtypes. The latter study was of pNETs, while the former study had 3 SiNET samples but only from 2 patients, and furthermore it was enriching for immune cells with only very low amounts of NE cells. Therefore, we now cite this work in the discussion but cannot use it to extend the results from our work.

      (4) How did the authors statistically handle patients with more than one tumor sample (true for n = 2)? These tumor samples would not be truly independent.

      In both cases where we had two distinct samples of the same patient, only one sample had sufficient NE cells to be included in NE-related analysis and therefore the other samples (SiNET3 and SiNET6) were excluded from all analysis of NE differential expression and subtypes. These samples were only included in the initial analysis (Fig. 1) and in TME-related analysis (Fig. 3-4) in which there was no statistical analysis of differences between patients and hence no problem with the inclusion of 2 samples for the same patient. We clarified this issue in the revised version.

      (5) The association between siNET subtype and B/plasma cell proliferation is very interesting, as is the hypothesis regarding MIF signaling. It would be illuminating for the authors to perform cell-cell interaction analyses with methods such as CellChat in this context rather than just relying on DE. Spatial mapping would be helpful too and while this may be outside the scope of this study, it should at least be expounded upon in the Discussion section.

      Indeed, spatial transcriptomic analysis would add interesting insight to our data and to SiNET biology. Unfortunately, this is not within the scope of the current project but we note this interesting possibility in the Discussion. Regarding additional methods for cell-cell interactions, we have performed such analysis but found it not informative as it highlighted a large number of interactions that are not unique to SiNETs and are difficult to interpret, and therefore we do not include this in the revised version.

      (6) The authors note that in the mixed lung tumor, the NE component was more proliferative than that observed with siNETs. How does the proliferation compare to pNETs, gNETs, in other published studies? How about assessing the clonality of the SCC and LNET malignant cells with various genomic or combined genomic/transcriptomic methods?

      The percentage of proliferating NE cells in the mixed lung tumor was higher than 60%. This is extremely high, approximately four-fold higher than the average that we found in a pan-cancer analysis and higher than the average of any of the >20 cancer types that we analyzed (Gavish et al. 2023, ref. #15). This remarkably high proliferation serves as a control for the low proliferation that we found in SiNET NE cells.

      (7) In the Discussion on page 13, the authors write "Second, proliferation of NE cells may be inhibited by prior treatments with somatostatin analogues." How many patients were treated in this manner? This information should be made more explicit in the manuscript.

      Details on pretreatment with somatostatin analogues are provided in Table S1. All patients were pretreated with somatostatin analogues, with the possible exception of one patient (P8, SiNET10) for which we could not confidently obtain this information.

      (8) On page 5, "bone-fide" is misspelled.

      (9) On page 8, "exact identify" is misspelled.

      We thank the reviewer and have corrected the typos.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors provide a study among healthy individuals, general medical patients and patients receiving haematopoietic cell transplants (HCT) to study the gut microbiome through shotgun metagenomic sequencing of stool samples. The first two groups were sampled once, while the patients receiving HCT were sampled longitudinally. A range of metadata (including current and previous (up to 1 year before sampling) antibiotic use) was recorded for all sampled individuals. The authors then performed shotgun metagenomic sequencing (using the Illumina platform) and performed bioinformatic analyses on these data to determine the composition and diversity of the gut microbiota and the antibiotic resistance genes therein. The authors conclude, on the basis of these analyses, that some antibiotics had a large impact on gut microbiota diversity, and could select opportunistic pathogens and/or antibiotic resistance genes in the gut microbiota.

      Strengths:

      The major strength of this study is the considerable achievement of performing this observational study in a large cohort of individuals. Studies into the impact of antibiotic therapy on the gut microbiota are difficult to organise, perform and interpret, and this work follows state-of-the-art methodologies to achieve its goals. The authors have achieved their objectives and the conclusion they draw on the impact of different antibiotics and their impact on the gut microbiota and its antibiotic resistance genes (the 'resistome', in short), are supported by the data presented in this work.

      Weaknesses:

      The weaknesses are the lack of information on the different resistance genes that have been identified and which could have been supplied as Supplementary Data.

      We have now supplied a list of individual resistance genes as supplementary data.

      In addition, no attempt is made to assess whether the identified resistance genes are associated with mobile genetic elements and/or (opportunistic) pathogens in the gut. While this is challenging with short-read data, alternative approaches like long-read metagenomics, Hi-C and/or culture-based profiling of bacterial communities could have been employed to further strengthen this work.

      We agree this is a limitation, and we now refer to this in the discussion. Unfortunately we did not have funding to perform additional profiling of the samples that would have provided more information about the genetic context of the AMR genes identified.

      Unfortunately, the authors have not attempted to perform corrections for multiple testing because many antibiotic exposures were correlated.

      The reviewer is correct that we did not perform formal correction for multiple testing. This was because correlation between antimicrobial exposures meant we could not determine what correction would be appropriate and not overly conservative. We now describe this more clearly in the statistical analysis section.

      Impact:

      The work may impact policies on the use of antibiotics, as those drugs that have major impacts on the diversity of the gut microbiota and select for antibiotic resistance genes in the gut are better avoided. However, the primary rationale for antibiotic therapy will remain the clinical effectiveness of antimicrobial drugs, and the impact on the gut microbiota and resistome will be secondary to these considerations.

      We agree that the primary consideration guiding antimicrobial therapy will usually be clinical effectiveness. However antimicrobial stewardship to minimise microbiome disruption and AMR selection is an increasingly important consideration, particularly as choices can often be made between different antibiotics that are likely to be equally clinically effective.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript by Peto et al., the authors describe the impact of different antimicrobials on gut microbiota in a prospective observational study of 225 participants (healthy volunteers, inpatients and outpatients). Both cross-sectional data (all participants) and longitudinal data (a subset of 79 haematopoietic cell transplant patients) were used. Using metagenomic sequencing, they estimated the impact of antibiotic exposure on gut microbiota composition and resistance genes. In their models, the authors aim to correct for potential confounders (e.g. demographics, non-antimicrobial exposures and physiological abnormalities), and for differences in the recency and total duration of antibiotic exposure. I consider these comprehensive models an important strength of this observational study. Yet, the underlying assumptions of such models may have impacted the study findings (detailed below). Other strengths include the presence of both cross-sectional and longitudinal exposure data and the presence of both healthy volunteers and patients. Together, these observational findings expand on previous studies (both observational and RCTs) describing the impact of antimicrobials on gut microbiota.

      Weaknesses:

      (1) The main weaknesses result from the observational design. This hampers causal interpretation and makes correction for potential confounding necessary. The authors have used comprehensive models to correct for potential confounders and for differences between participants in duration of antibiotic exposure and time between exposure and sample collection. I wonder if some of the choices made by the authors affected these findings. For example, the authors did not include travel in the final model, but travel (most importantly, to South Asia) may result in the acquisition of AMR genes [Worby et al., Lancet Microbe 2023; PMID 37716364]. Moreover, non-antimicrobial drugs (such as proton pump inhibitors) were not included but these have a well-known impact on gut microbiota and might be linked with exposure to antimicrobial drugs. Residual confounding may underlie some of the unexplained discrepancies between the cross-sectional and longitudinal data (e.g. for vancomycin).

      We agree that the observational design means there is the potential for confounding, which, as the reviewer notes, we attempt to account for as far as possible in the multivariable models presented. We cannot exclude the possibility of residual confounding, and we highlight this as a limitation in the discussion. We have expanded on this limitation, and mention it as a possible explanation for inconsistencies between longitudinal and cross-sectional models. Conducting randomised trials to assess the impacts of multiple antimicrobials in sick, hospitalised patients would be exceptionally difficult, and so it is hard to avoid reliance on observational data in these settings.

      We did record participants’ foreign travel and diet, but these exposures were not included in our models as they were not independently associated with an impact on the microbiome and their inclusion did not materially affect other estimates. However, because most participants were recruited from a healthcare setting, few had recent foreign travel and so this study was not well powered to assess the effects of travel on AMR carriage. We have added this as a limitation.

      In addition, the authors found a disruption half-life of 6 days to be the best fit based on Shannon diversity. If I'm understanding correctly, this results in a near-zero modelled exposure of a 14-day-course after 70 days (purple line; Supplementary Figure 2). However, it has been described that microbiota composition and resistome (not Shannon diversity!) remain altered for longer periods of time after (certain) antibiotic exposures (e.g. Anthony et al., Cell Reports 2022; PMID 35417701). The authors did not assess whether extending the disruption half-life would alter their conclusions.

      The reviewer is correct that the best fit disruption half-life of 6 days means the model assumes near-zero exposure by 70 days. We appreciate that antimicrobials can cause longer-term disruption than is represented in our model, and we refer to this in the discussion (we had cited two papers supporting this, and we are grateful for the additional reference above, which we have added). We agree that it is useful to clarify that the longer term effects may be seen in individual components of the microbiome or AMR genes, but not in overall measures of diversity, so have added this to the discussion.
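
      The arithmetic behind "near-zero exposure by 70 days" can be checked with a short sketch. It assumes simple exponential decay of the modelled disruption with the 6-day half-life quoted above, counted from a single day of exposure. The function name is hypothetical, and the way the actual model accumulates exposure over a multi-day course is not reproduced here.

```python
def residual_exposure(days_since_exposure, half_life_days=6.0):
    """Fraction of the modelled disruption remaining after `days_since_exposure` days."""
    return 0.5 ** (days_since_exposure / half_life_days)

for days in (0, 6, 30, 70):
    print(days, residual_exposure(days))
```

      Under these assumptions the residual fraction at 70 days is roughly 0.03% of its initial value, which is why the fitted model cannot represent longer-lasting changes in individual taxa or AMR genes.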

      (2) Another consequence of the observational design of this study is the relatively small number of participants available for some comparisons (e.g. oral clindamycin was only used by 6 participants). Care should be taken when drawing any conclusions from such small numbers.

      We agree. Although our participants received a large number of different antimicrobial exposures, these were dependent on routine clinical practice at our centre and we lack data on many potentially important exposures. We had mentioned this in relation to antimicrobials not used at our centre, and have now clarified in the discussion that this also limits reliability of estimates for antimicrobials that were rarely used in study participants.

      (3) The authors assessed log-transformed relative abundances of specific bacteria after subsampling to 3.5 million reads. While I agree that some kind of data transformation is probably preferable, these methods do not address the compositional nature of microbiome data and using a pseudocount (10<sup>-6</sup>) is necessary for absent (i.e. undetected) taxa [Gloor et al., Front Microbiol 2017; PMID 29187837]. Given the centrality of these relative abundances to their conclusions, a sensitivity analysis using compositionally-aware methods (such as a centred log-ratio (clr) transformation) would have added robustness to their findings.

      We agree that using a pseudocount is necessary for undetected taxa, which we have done by assuming undetected taxa had an abundance of 10<sup>-6</sup> (based on the lower limit of detection at the depth we sequenced). We refer to this as truncation in the methods section, but for clarity we have now also described this as a pseudocount. Because our analysis focusses on major taxa that are almost ubiquitous in the human gut microbiome, a pseudocount was only used for 3 samples that had no detectable Enterobacteriaceae.
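
      As a minimal sketch of the truncation described above (not the authors' code: the function name and the read counts are hypothetical, and treating any value below the floor as equal to the floor is an assumption), the log-transformed relative abundance could be computed as:

```python
import math

PSEUDOCOUNT = 1e-6  # floor quoted in the response (lower limit of detection)

def log10_relative_abundance(taxon_reads, total_reads, floor=PSEUDOCOUNT):
    """Return log10 relative abundance, truncated at `floor`.

    Assumption: any relative abundance below `floor`, including zero
    (an undetected taxon), is replaced by `floor` before the log.
    """
    relative = taxon_reads / total_reads
    return math.log10(max(relative, floor))

# Hypothetical sample subsampled to 3.5 million reads
detected = log10_relative_abundance(35_000, 3_500_000)   # close to -2
undetected = log10_relative_abundance(0, 3_500_000)      # close to -6
```

      With this floor, a sample with no detectable reads for a taxon contributes a finite value instead of an undefined logarithm.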

      We are aware that compositionally-aware methods are often used with microbiome data, and for some analyses these are necessary to avoid introducing spurious correlations. However, the flaws in non-compositional analyses outlined in Gloor et al. do not affect the analyses in this paper:

      (1) The problems related to differing sequence depths or inadequate normalisation do not apply to our dataset, as we took a random subset of 3.5 million reads from all samples (Gloor et al correctly point out that this method has the drawback of losing some information, but it avoids problems related to variable sequencing depth)

      (2) The remainder of Gloor et al. critiques multivariate analyses that assess correlations between multiple microbiome measurements made on the same sample, starting with a dissimilarity matrix. With compositional data these can lead to spurious correlations, as measurements on an individual sample are not independent of other measurements made on the same sample. In contrast, our analyses do not use a dissimilarity matrix, but evaluate the association of multiple non-microbiome covariates (e.g. antibiotic exposures, age) with single microbiome measures. We use a separate model for each of 11 specified microbiome components, and display these results side by side. This does not lead to the same problem of spurious correlation as analyses of dissimilarity matrices. However, it does mean that estimates of effects on each taxon have to be interpreted in the context of estimates on the other taxa. Specifically, in our models, the associations of antimicrobial exposure with different taxa/AMR genes are not necessarily independent of each other (e.g. if an antimicrobial eradicated only one taxon then it would be associated with an increase in others). This is not a spurious correlation, and makes intuitive sense when using relative abundance as the outcome. However, we agree this should be made more explicit.
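
      The point about non-independent taxon estimates can be illustrated with a small worked example. The taxon names and counts below are hypothetical and chosen only to show the arithmetic; they are not study data.

```python
def relative_abundance(counts):
    """Convert absolute counts per taxon to relative abundances."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}

# Hypothetical absolute counts before exposure
before = {"taxon_A": 500, "taxon_B": 300, "taxon_C": 200}
# Hypothetical antimicrobial that eradicates taxon_A only
after = dict(before, taxon_A=0)

rel_before = relative_abundance(before)
rel_after = relative_abundance(after)
# taxon_B is unchanged in absolute terms (300), but its relative
# abundance rises from 0.3 to 0.6 because the total has shrunk.
```

      The apparent increase in the surviving taxa is a real property of the relative abundance outcome, which is why the per-taxon estimates need to be read together.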

      For these reasons, at this stage we would prefer not to increase the complexity of the manuscript by adding a sensitivity analysis.

      (4) An overall description of gut microbiota composition and resistome of the included participants is missing. This makes it difficult to compare the current study population to other studies. In addition, for correct interpretation of the findings, it would have been helpful if the reasons for hospital visits of the general medical patients were provided.

      We have added a summary of microbiome and resistome composition (in the results section and new supplementary table 2), and we also now include microbiome and resistome profiles of all samples in the supplementary data. We also provide some more detail about the types of general medical patients included. We are not able to provide a breakdown of the initial reason for admission as this was not collected.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Provide a supplementary table with information on the abundance of individual genes in the samples.

      This supplementary data is now included.

      (2) Engage with an expert in statistics to discuss how statistical analyses can be improved.

      An experienced biostatistician has been involved in this study since its conception, and was involved in planning the analysis and in the responses to these comments.

      (3) Typos and other minor corrections:

      Methods: it is my understanding that litre should be abbreviated with a lowercase l.

      Different journals have different house styles: we are happy to follow Editorial guidance.

      p. 9: abuindance should be corrected to abundance.

      Corrected

      p. 9: relative species should be relevant species?  

      Yes, corrected. Thank you.

      p. 9 - 10: can the apparent lack of effect of beta-lactams on beta-lactamase gene abundance be explained by the focus on a small number of beta-lactamase resistance genes that are found in Enterobacteriaceae and which are not particularly prevalent, while other classes of resistance genes (e.g. Bacteroidal beta-lactamases) were excluded?

      It is possible that including other beta-lactamases would have led to different results, but as a small number of beta-lactamases in Enterobacteriaceae are of major clinical importance we decided to focus on these (already justified in the Methods). A full list of AMR genes identified is now provided in the supplementary data.

      p. 10: beta-lactamse should be beta-lactamase

      Corrected

      Figure 3A: could the data shown for tetracycline resistance genes be skewed by tetQ, which is probably one of the most abundant resistance genes in the human gut and acts through ribosome protection?

      TetQ was included, but only accounted for 23% of reads assigned to tetracycline resistance genes so is unlikely to have skewed the overall result. We limited the analysis to a few major categories of AMR genes and, other than VanA, have avoided presenting results for single genes to limit the degree of multiple testing. We now include the resistome profile for each sample in the supplementary data so that readers can explore the data if desired.

      Reviewer #2 (Recommendations For The Authors):

      (1) Given the importance of obligate anaerobic gut microbiota for human health, it might be interesting to divide antibiotics into categories based on their anti-anaerobic activity and assess whether these antibiotics differ in their effects on gut microbiota.

      The large majority of antibiotics used in clinical practice have activity against aerobic bacteria and anaerobic bacteria, so it is not possible to easily categorise them this way. There are two main exceptions (metronidazole and aminoglycosides) but there was insufficient use of these drugs to clearly detect or rule out a difference between them, even when categorising antimicrobials by class, so we prefer not to frame the results in these terms. Also see our comments on this categorisation below.

      (2) For estimating the abundance of anaerobic bacteria, three major groups were assessed: Bacteroidetes, Actinobacteria and Clostridia. To me, this seems a bit aspecific. For example, the phylum Bacteroidetes contains some aerobic bacteria (e.g. Flavobacteriia). Would it be possible to provide a more accurate estimation of anaerobic bacteria?

      We think that an emphasis on a binary aerobic/anaerobic classification is less biologically meaningful than the more granular genetic classification we use, and its use largely reflects the previous reliance on culture-based methods for bacterial identification. Although some important opportunistic human pathogens are aerobic, it is not clear that the benefit or harm of most gut commensals relates to their oxygen tolerance, and all luminal bacteria exist in an anaerobic environment. As such we prefer not to perform an additional analysis using this category. We are also not sure that this could be done reliably, as many of the taxa are characterised poorly, or not at all.

      We appreciate that Bacteroidetes, Actinobacteria and Clostridia are diverse taxa that include many different species, so may seem non-specific, but these were chosen because:

      i) they are non-overlapping with Enterobacteriaceae and Enterococcus, the major opportunistic pathogens of clinical relevance, so could be used in parallel, and

      ii) they make up the large majority of the gut microbiome in most people and most species are of low pathogenicity, so it is plausible that their disruption might drive colonisation with more pathogenic organisms (or those carrying important AMR genes).

      We have more clearly stated this rationale.

      (3) A statement on the availability of data and code for analysis is missing. I would highly recommend public sharing of raw sequence data and R code for analysis. If possible, it would be very valuable if processed microbiome data and patient metadata could be shared.

      We agree, and these have been submitted as supplementary data. We have added the following statement “The data and code used to produce this manuscript are available in the supplementary material, including processed microbiome data, and pseudonymised patient metadata. The sequence data for this study have been deposited in the European Nucleotide Archive (ENA) at EMBL-EBI under accession number PRJEB86785.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Cao et al. provides a compelling investigation into the role of mutational input in the rapid evolution of pesticide resistance, focusing on the two-spotted spider mite's response to the recent introduction of the acaricide cyetpyrafen. This well-documented introduction of the pesticide - and thus a clearly defined history of selection - offers a powerful framework for studying the temporal dynamics of rapid adaptation. The authors combine resistance phenotyping across multiple populations, extensive resequencing to track the frequency of resistance alleles, and genomic analyses of selection in both contemporary and historical samples. These approaches are further complemented by laboratory-based experimental evolution, which serves as a baseline for understanding the genetic architecture of resistance across mite populations in China. Their analyses identify two key resistance-associated genes, sdhB and sdhD, within which they detect 15 mutations in wild-collected samples. Protein modeling reveals that these mutations cluster around the pesticide's binding site, suggesting a direct functional role in resistance. The authors further examine signatures of selective sweeps and their distribution across populations to infer the mechanisms - such as de novo mutation or gene flow - driving the spread of resistance, a crucial consideration for predicting evolutionary responses to extreme selection pressure. Overall, this is a well-rounded, thoughtfully designed, and well-written manuscript. It shows significant novelty, as it is relatively rare to integrate broad-scale evolutionary inference from natural populations with experimentally informed bioassays. However, some aspects of the methods and discussion could be clarified and strengthened.

      Strengths:

      One of the most compelling aspects of this study is its integration of genomic time-series data in natural populations with controlled experimental evolution. By coupling genome sequencing of resistant field populations with laboratory selection experiments, the authors tease apart the individual effects of resistance alleles along with regions of the genome where selection is expected to occur, and compare that to the observed frequency in the wild populations over space and time. Their temporal data clearly demonstrates the pace at which evolution can occur in response to extreme selection. This type of approach is a powerful roadmap for the rest of the field of rapid adaptation.

      The study effectively links specific genetic changes to resistance phenotypes. The identification of sdhB and sdhD mutations as major drivers of cyetpyrafen resistance is well-supported by allele frequency shifts in both field and experimental populations. The scope of their sampling clearly facilitated the remarkable number of observed mutations within these target genes, and the authors provide a careful discussion of the likelihood of these mutations from de novo or standing variation. Furthermore, the discovered cross-resistance that these mutations confer to other mitochondrial complex II inhibitors highlights the potential for broader resistance management and evolution.

      Weaknesses:

      (1) Experimental Evolution:

      - Additional information about the lab experimental evolution would be useful in the main text. Specifically, the dose of cyetpyrafen used should be clarified, especially with respect to the LD50 values. How does it compare to recommended field doses? This is expected to influence the architecture of resistance evolution. What was the sample size? This will help readers contextualize how the experimental design could influence the role of standing variation.

      The experimental design involved sampling approximately 6,000 individuals from the wild population ZJSX1, which were subsequently divided into two parallel cohorts under controlled laboratory conditions. The selection group (LabR) was subjected to continuous selection pressure using cyetpyrafen, while the control group (LabS) was maintained under identical laboratory conditions without exposure to cyetpyrafen. A dynamic selection regime was implemented wherein the acaricide dosage was systematically adjusted every two generations to maintain a consistent selection intensity, achieving a mortality rate of 60% ± 10% in the LabR population. This adaptive dosage strategy ensured sustained evolutionary pressure while preventing population collapse. The LC<sub>50</sub> values were tested at F1, F32, F54, F60, F62, and F66 generations using standardized bioassay protocols to quantify resistance development trajectories and optimize dosage for subsequent selection cycles. We provided the additional information in subsection 4.1 of the materials and methods section.

      - The finding that lab-evolved strains show cross-resistance is interesting, but potentially complicates the story. It would help to know more about the other mitochondrial complex II inhibitors used across China and their impact on adaptive dynamics at these loci, particularly regarding pre-existing resistance alleles. For example, a comparison of usage data from 2013, 2017, and 2019 could help explain whether cyetpyrafen was the main driver of resistance or if previous pesticides played a role. What happened in 2020 that caused such rapid evolution 3 years after launch?

      Although the introduction of the other two SDHI acaricides complicates the story, we would like to provide a complete background on the usage of acaricides with this mode of action in China. Although cyflumetofen was released in 2013 before cyetpyrafen, and cyenopyrafen was released in 2019 after cyetpyrafen, their market share is minor (about 3.2%) compared to cyetpyrafen (about 96.8%, personal communication). Since cross-resistance is reported among SDHIs, we could not exclude the contribution of cyflumetofen to the initial accumulation of resistance alleles, but the effect should be minor, both because of their minimal market share and because of the independent evolution of resistance in the field as found in our study. Although the contribution of cyflumetofen and cyenopyrafen cannot be entirely excluded, the rapid evolution of resistance seems likely to be mainly explained by the intensive application of cyetpyrafen. To clarify this issue, we added relevant information in the first paragraph of the discussion section.

      (2) Evolutionary history of resistance alleles:

      - It would be beneficial to examine the population structure of the sampled populations, especially regarding the role of migration. Though resistance evolution appears to have had minimal impact on genome-wide diversity (as shown in Supplementary Figure 2), could admixture be influencing the results? An explicit multivariate regression framework could help to understand factors influencing diversity across populations, as right now much is left to the readers' visual acuity.

      The genetic structure of the populations was examined by Treemix analysis. We detected only one migration event from JXNC to SHPD (no resistance data available for these two populations), suggesting a limited role for migration in resistance evolution. The multiple regression analysis revealed that overall genetic diversity and Tajima’s D across the genome were not significantly associated with resistance levels, genetic structure or geographic coordinates (P > 0.05), which all support a limited role of migration in resistance development.

      - It is unclear why lab populations were included in the migration/treemix analysis. We might suggest redoing the analysis without including the laboratory populations to reveal biologically plausible patterns of resistance evolution.

      Thank you for the constructive suggestion. The Treemix analysis was redone by removing laboratory populations and is now reported.

      - Can the authors explore isolation by distance (IBD) in the frequency of resistance alleles?

      Thank you for the constructive suggestion. No significant isolation-by-distance pattern was detected for resistance allele frequencies across all surveyed years (2020: P=0.73; 2021: P=0.52; 2023: P=0.16; Mantel test). We added these results to the text.
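
      For readers unfamiliar with the Mantel test mentioned above, the following is a minimal permutation-based sketch. It is not the authors' analysis code: the function names, the one-sided test, the Pearson statistic, and the number of permutations are all assumptions made for illustration. The two inputs would be square symmetric matrices, for example pairwise geographic distance and pairwise difference in resistance allele frequency.

```python
import math
import random

def _upper(matrix, order):
    """Upper-triangle entries of `matrix` after reordering rows/columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def _pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mantel(dist_a, dist_b, n_perm=999, seed=0):
    """One-sided Mantel test: is dist_a positively correlated with dist_b?

    Returns (observed correlation, permutation p-value). Populations in
    dist_b are shuffled jointly across rows and columns.
    """
    rng = random.Random(seed)
    ids = list(range(len(dist_a)))
    a = _upper(dist_a, ids)
    r_obs = _pearson(a, _upper(dist_b, ids))
    hits = 0
    for _ in range(n_perm):
        perm = ids[:]
        rng.shuffle(perm)
        if _pearson(a, _upper(dist_b, perm)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)
```

      A large p-value, as reported for each survey year, means the observed correlation is not unusual relative to randomly relabelled populations.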

      - Given the claim regarding the novelty of the number of pesticide resistance mutations, it is important to acknowledge the evolution of resistance to all pesticides (antibiotics, herbicides, etc.). ALS-inhibiting herbicides have driven remarkable repeatability across species based on numerous SNPs within the target gene.

      We appreciate this comment, which highlights the need to place our findings within the broader evolutionary context of pesticide resistance. We have investigated references relevant to the evolution of resistance to diverse pesticides. As far as we can tell, the 15 target mutations in eight amino acid residues are among the highest number of pesticide resistance mutations detected, especially within the context of animal studies. We have added relevant text to the second paragraph of the discussion.

      - Figure 5 A-B. Why not run a multivariate regression with status at each resistance mutation encoded as a separate predictor? It is interesting that focusing on the predominant mutation gives the strongest r2, but it is somewhat unintuitive and masks some interesting variation among populations.

      We conducted a multiple regression analysis to explore the influence of multiple mutations on resistance levels of field populations. However, of the 15 putative resistance mutations, only five were detected in more than three populations for which bioassay data are available, i.e. I260T, I260V, D116G, R119C and R119L. The frequencies of three of these mutations, I260T (P = 0.00128), I260V (P = 0.00423) and D116G (P = 0.00058), are significantly correlated with the resistance level of field populations. This has been added.

      (3) Haplotype Reconstruction (Line 271-):

      - We are a bit sceptical of the methods taken to reconstruct these haplotypes. It seems as though the authors did so with Sanger sequencing (this should be mentioned in the text), focusing only on homozygous SNPs. How many such SNPs were used to reconstruct haplotypes, along what length of sequence? For how many individuals were haplotypes reconstructed? Nonetheless, I appreciated that the authors looked into the extent to which the reconstructed haplotypes could be driven by recombination. Can the authors elaborate on the calculations in line 296? Is that the census population size estimate or effective?

      Because haplotypes could not be determined when two or more loci were heterozygous, we detected haplotypes from sequencing data with at most one heterozygous locus. In total, 844 and 696 individuals were used to detect haplotypes of sdhB and sdhD, respectively. We detected 11 haplotypes (with 8 SNPs) and 24 haplotypes (with 11 SNPs) along 216 bp of the sdhB and 155 bp of the sdhD genes, respectively. Please see the fifth paragraph of subsection 2.4. We used ρ = 4 × Ne × d, where d is the genetic distance (Li and Stephens, 2003), to calculate the number of effective individuals for one recombination event.
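
      The rule of keeping only individuals with at most one heterozygous locus can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: the function name and the genotype encoding (one two-letter string per SNP, unphased) are hypothetical.

```python
def resolve_haplotypes(genotype):
    """Return the two haplotypes of a diploid genotype, or None if ambiguous.

    `genotype` is a list of unphased two-letter calls, one per SNP,
    e.g. ["AA", "CT", "GG"]. With zero or one heterozygous SNP the phase
    is unambiguous; with two or more it cannot be determined from
    unphased calls alone, so the individual is skipped.
    """
    heterozygous = [call for call in genotype if call[0] != call[1]]
    if len(heterozygous) > 1:
        return None
    hap1 = "".join(call[0] for call in genotype)
    hap2 = "".join(call[1] for call in genotype)
    return hap1, hap2

# One heterozygous SNP: both haplotypes are recoverable
resolved = resolve_haplotypes(["AA", "CT", "GG"])
# Two heterozygous SNPs: A-C/G-T and A-T/G-C are both possible
ambiguous = resolve_haplotypes(["AG", "CT", "GG"])
```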

      (4) Single Mutations and Their Effect (line 312-):

      - It's not entirely clear how the breeding scheme resulted in near-isogenic lines. Could the authors provide a clearer explanation of the process and its biological implications?

      To investigate the effect of single mutations or their combination on resistance levels, we isolated the females and males with the same homozygous/ hemizygous genotypes for creating homozygous lines. Females from these lines were not near-isogenic, but homozygous for the critical mutations. We revised the description in the methods section to clearly define these lines.

      - If they are indeed isogenic, it's interesting that individual resistance mutations have effects on resistance that vary considerably among lines. Could the authors run a multivariate analysis including all potential resistance SNPs to account for interactions between them? Given the variable effects of the D116G substitution (ranging from 4-25%), could polygenic or epistatic factors be influencing the evolution of resistance?

      We couldn’t conduct multivariate analysis because most lines have only one resistant SNP. The four lines homozygous for 116G were from the same population. The variable mortality may reflect other unknown mechanisms but these are beyond the scope of this study.

      - Why are there some populations that segregate for resistance mutations but have no survival to pesticides (i.e., the green points in Figure 5)? Some discussion of this heterogeneity seems required in the absence of validation of the effects of these particular mutations. Could it be dominance playing a role, or do the authors have some other explanation?

      We didn’t investigate the degree of dominance of each mutation. The mutation I260V shows incompletely dominant inheritance (Sun, et al. 2022). To investigate survival rate of different populations, the two-spotted spider mite T. urticae was exposed to 1000 mg/L of cyetpyrafen, higher than the recommended field dose of 100 mg/L. Such a high concentration may lead to death of an individual heterozygous for certain mutations, such as I260V.

      - The authors mention that all resistance mutations co-localized to the Q-site. Is this where the pesticide binds? This seems like an important point to follow their argument for these being resistance-related.

      Yes. We revised Fig. 3c to show the Q-site.

      (5) Statistical Considerations for Allele Frequency Changes (Figure 3):

      - It might be helpful to use a logistic regression model to assess the rate of allele frequency changes and determine the strength of selection acting on these alleles (e.g., Kreiner et al. 2022; Patel et al. 2024). This approach could refine the interpretation of selection dynamics over time.

      Thank you for this suggestion. A logistic regression model was used to track allele frequency trajectories. The selection coefficient of each allele and their joint effects were estimated.
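
      One simple way to relate a logistic trajectory to a selection coefficient is sketched below. The response does not describe the model in detail, so this is an assumption-laden illustration rather than the authors' method: it fits an ordinary least-squares line to logit-transformed frequencies (not a binomial regression on counts), and it relies on the standard approximation that, under weak genic selection, logit(p) increases by roughly s per generation. Function names are hypothetical.

```python
import math

def logit(p):
    """Log-odds of an allele frequency strictly between 0 and 1."""
    return math.log(p / (1 - p))

def estimate_selection(generations, freqs):
    """Least-squares fit of logit(frequency) against generation.

    Returns (slope, intercept). Under the stated assumptions the slope
    approximates the per-generation selection coefficient s, and the
    intercept is logit of the starting frequency.
    """
    y = [logit(p) for p in freqs]
    n = len(y)
    mean_t = sum(generations) / n
    mean_y = sum(y) / n
    sxx = sum((t - mean_t) ** 2 for t in generations)
    sxy = sum((t - mean_t) * (v - mean_y) for t, v in zip(generations, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return slope, intercept
```

      Frequencies of exactly 0 or 1 would need to be handled separately (for example with a count-based binomial model), since their logit is undefined.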

      Reviewer #2 (Public review):

      Summary:

      This paper investigates the evolution of pesticide resistance in the two-spotted spider mite following the introduction of an SDHI acaricide, cyetpyrafen, in China. The authors make use of cyetpyrafen-naive populations collected before that pesticide was first used, as well as more recent populations (both sensitive and resistant) to conduct comparative population genomics. They report 15 different mutations in the insecticide target site from resistant populations, many reported here for the first time, and look at the mutation and selection processes underlying the evolution of resistance, through GWAS, haplotype mapping, and testing for loss of diversity indicating selective sweeps. None of the target site mutations found in resistant populations was found in pre-exposure populations, suggesting that the mutations may have arisen de novo rather than being present as standing variation, unless initially present at very low frequencies; a de novo origin is also supported by evidence of selective sweeps in some resistant populations. Furthermore, there is no significant evidence of migration of resistant genotypes between the sampled field populations, indicating multiple origins of common mutations. Overall, this indicates a very high mutation rate and a wide range of mutational pathways to resistance for this target site in this pest species. The series of population genomic analyses carried out here, in addition to the evolutionary processes that appear to underlie resistance development in this case, could have implications for the study of resistance evolution more widely.

      Strengths:

      This paper combines phenotypic characterisation with extensive comparative population genomics, made possible by the availability of multiple population samples (each with hundreds of individuals) collected before as well as after the introduction of the pesticide cyetpyrafen, as well as lab-evolved lines. This results in findings of mutation and selection processes that can be related back to the pesticide resistance trait of concern. Large numbers of mites were tested phenotypically to show the levels of resistance present, and the authors also made near-isogenic lines to confirm the phenotypic effects of key mutations. The population genomic analyses consider a range of alternative hypotheses, including mutations arising by de novo mutation or selection from standing genetic variation, and mutations in different populations arising independently or arriving by migration. The claim that mutations most likely arose by multiple repeated de novo mutations is therefore supported by multiple lines of evidence: the direct evidence of none of the mutations being found in over 2000 individuals from naive populations, and the indirect evidence from population genomics showing evidence of selective sweeps but not of significant migration between the sampled populations.

      Weaknesses:

      As acknowledged within the discussion, whilst evidence supports a de novo origin of the resistance-associated mutations, this cannot be proven definitively as mutations may have been present at a very low frequency and therefore not found within the tested pesticide-naive population samples.

      We agree that we could not definitively exclude the presence of a very low incidence of favoured mutations before the introduction of this novel acaricide.

      Near-isofemale lines were made to confirm the resistance levels associated with five of the 15 mutations, but otherwise, the genotype-phenotype associations are correlative, as confirmation by functional genetics was beyond the scope of this study.

      We hope that future functional studies will validate the effects of these mutations on resistance in both the two-spotted spider mite T. urticae and other spider mite species. This could be done by creating near-isogenic female lines or using CRISPR-Cas9 technology, as gene knockouts have recently been established for T. urticae.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Could the authors elaborate on the environmental context (e.g., climate, geography) of the sampled populations to give more nuance to the analysis of genetic differentiation and resistance evolution?

      We have explored the influence of geographic isolation on the frequency of resistance alleles by Mantel tests (isolation by distance). We didn’t investigate the influence of climate, because most of the samples were from greenhouses, where the climate to which the pest is exposed is unclear.

      (2) Line 161: is this supposed to be one R and one S?

      Yes, we added this information (LabR and LabS).

      (3) Line 207: variation is not saturated at the first two sites because the different combinations are not seen. This is a bit misleading.

      What we wanted to indicate was that the two codon positions are saturated, rather than their combinations. We revised this sentence by adding “of each codon position”.

      (4) Line 376: continuous selection did not "result in a new mutation arising". Rather, the mutation arose and was subsequently selected on.

      We revised the expression of this de novo mutation and selection process.

      (5) Line 402: can the authors explore what Ne would be necessary to drive the number of mutational origins they observe, as in (Karasov et al. 2010)?

      It is challenging to estimate Ne, especially when mutation rate data from the two-spotted spider mite T. urticae are unavailable. We observed 2.7 resistant mutations per population in samples collected in 2024, seven years after the release of cyetpyrafen. The estimated population-scaled mutation rate (Θ) is 0.0193, given 20 generations per year for T. urticae. An effective population size (Ne) of 2.29*10<sup>6</sup> would be necessary to reach the number of de novo mutations observed in this study, given Θ = 3Neμ (haplodiploid sex determination of T. urticae) and a mutation rate of μ = 2.8*10<sup>-9</sup> per base pair per generation as estimated for Drosophila melanogaster (Keightley et al., 2014). The high reproductive capacity of T. urticae (> 100 eggs per female) and short generation time make it easier to reach such a population size in the field, as we now note.
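
      The arithmetic behind these figures can be retraced as follows. The response does not state how Θ was derived; dividing the 2.7 mutations per population by the number of elapsed generations (7 years × 20 generations per year) is an assumption made here because it reproduces the quoted 0.0193. The function name is hypothetical.

```python
def effective_population_size(theta, mu, ploidy_factor=3):
    """Solve theta = ploidy_factor * Ne * mu for Ne.

    ploidy_factor = 3 follows the haplodiploid relation quoted in the
    response (theta = 3 * Ne * mu).
    """
    return theta / (ploidy_factor * mu)

mutations_per_population = 2.7   # observed, from the response
years_since_release = 7          # from the response
generations_per_year = 20        # from the response
mu = 2.8e-9                      # D. melanogaster rate cited in the response

# Assumed derivation of theta: mutations per population per generation
generations = years_since_release * generations_per_year   # 140
theta = mutations_per_population / generations              # about 0.0193
ne = effective_population_size(theta, mu)                   # about 2.3 million
```

      Under these assumptions Ne comes out at roughly 2.3 × 10<sup>6</sup>, in line with the value quoted in the response.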

      (6) Line 482: how did the authors precisely kill 60% of samples with their selection? What was the applied rate? In general, listing the rates of insecticide used in dose response would be useful to decipher if LD50s are projected outside of the doses used (seems like they are). In this case, authors should limit their estimates to those > the highest rate used in the dose response.

      It is difficult to control mortality precisely. We applied cyetpyrafen every two generations but did not determine the LC<sub>50</sub> every two generations. When mortality was lower than 60%, another round of spraying was applied by increasing the dosage of the pesticide. The LC<sub>50</sub> values were tested at F<sub>1</sub>, F<sub>32</sub>, F<sub>54</sub>, F<sub>60</sub>, F<sub>62</sub>, and F<sub>66</sub> generations to establish the trajectories around resistance.

      (7) The light pink genomic region in Figure 2 was distracting. Why is it included if there is no discussion of genomic regions outside the sdh genes? Generally, there was a lot going on in this figure, and some guiding categories (i.e., lab selected vs wild population) on the figure itself could help orient the reader.

      We included chromosome 2 colored in light pink/ red to show the selection signal across a wider genomic region. In the figure legend, we added a description of the lab selected, field resistant and field susceptible populations. Very little common selection signal was detected among resistant populations on chromosome 2, indicating this region was less likely to be involved in resistance evolution of T. urticae to cyetpyrafen. We also described the result briefly in the figure legend.

      Reviewer #2 (Recommendations for the authors):

      (1) The most significant aspect of this study is the use of multiple pest population samples taken before as well as after the introduction of a class of pesticides, allowing a thorough comparative population genomics study in a species where a range of resistance mutations have appeared within a few years. I would prefer to see a title conveying this significance, rather than the current study, which focuses on the total number of mutations and claimed notoriety of the (at that point unnamed) study species. Similarly, I would prefer an abstract that relies less on superlative claims and includes more details: the scientific name of the study species; the number of years in which resistance evolved; the number of historical specimens; how the resistance levels for single mutations were shown.

      (1) The title was changed by adding “the two-spotted spider mite Tetranychus urticae” and removing the “unprecedented number” to emphasize that “recurrent mutations drive rapid evolution”, i.e., “Recurrent Mutations Drive the Rapid Evolution of Pesticide Resistance in the Two-spotted Spider Mite Tetranychus urticae.”

      (2) The scientific name of the study species was added.

      (3) The number of years in which resistance evolved was added.

      (4) The number of historical specimens was added (2666).

      (5) Because we used homozygous lines but not iso-genic lines or gene-edited lines, our bioassay data could not provide direct evidence on the level of resistance conferred by each mutation. We revised our description of the results and removed this content from the abstract.

      Line 29: if you want to claim the number is unprecedented, please specify the context: unprecedented for a pesticide target in an arthropod pest? (more resistance mutations may have been found in bacteria/fungi...).

      We revised the sentence by adding “in an arthropod pest”.

      Line 30: rather than a claim of notoriety, it may be better to specify what damage this pest causes.

      Revised by describing it as an arthropod pest.

      Line 34: please clarify, was this all in different haplotypes, or were some mutations found in combination?

      Done: We identified 15 target mutations, including six mutations on five amino acid residues of subunit sdhB, and nine mutations on three amino acid residues of subunit sdhD, with as many as five substitutions on one residue.

      (2) The introduction begins by framing the context as resistance evolution in invertebrate pests. However, the evolutionary processes examined in the study are applicable to resistance in other systems, and potentially to other cases of rapid contemporary evolution. The authors could show wider significance for their work beyond the subfield of invertebrate pests by including more of this wider context in their introduction and discussion: even if this means they can no longer claim novelty based on the number of mutations alone, the study is a strong example of the use of population genomics combined with functional and phenotypic characterisation to investigate the evolutionary processes underlying the emergence of resistance, so could have wider importance than within its current framing.

      The background was revised as mentioned above to take this into account.

      For example, in lines 48-50, please clarify what is meant by pesticides here (insects/arthropods? weeds and pathogens too?) In lines 69-73, the opposite is sometimes seen in fungal pathogens, with large numbers of mutations generated in lab-evolved strains.

      We extended pesticides to those targeting arthropods, weeds and pathogens. We still emphasize the situation mainly with respect to arthropod pests.

      (3) Lines 91-93: how many modes of action? How recently were SDHI acaricides introduced?

      Added: at least 11 groups of acaricides based on their modes of action. SDHI was launched in 2007.

      (4) Line 98-102: Use in China is a useful background for the study populations, but the global context should be included too.

      Yes, four SDHI acaricides developed around the globe were introduced.

      (5) Line 113: They show diverse mutations, but all within the mechanism of target-site point mutations.

      We agree with your suggestion. This sentence has been removed because it repeats information stated above it.

      (6) Line 115-116: Yes, agreed; I think this is the main strength of the current study and should be emphasised sooner.

      Thanks.

      (7) Line 158: Selective sweep signals were clear in half of the resistant populations but not in the others. The suggestion that the others had undergone soft sweeps, with multiple mutations increasing in frequency simultaneously but no one reaching fixation, seems reasonable; but the authors could compare the populations that did show a sweep with those that did not (for example, was there greater diversity or evenness of genotypes in those that did not?).

      Five resistant populations with selection signals identified by PBE analysis (Figure 2b) showed corresponding decreases in π and Tajima’s D near the two SDH genes but not across the genome (Figure S1).
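      For readers unfamiliar with the two statistics named in this response, the sketch below shows how π (mean pairwise differences) and Tajima's D are conventionally computed from a small sample of haplotypes. It is an illustration only, not the authors' pipeline: the 0/1 string encoding, the function names, and the toy haplotypes are assumptions made for the example, and the study's own estimates may have been obtained with different software and data types.

```python
from itertools import combinations
from math import sqrt


def nucleotide_diversity(haplotypes):
    """Mean number of pairwise differences (pi) over all haplotype pairs."""
    pairs = list(combinations(haplotypes, 2))
    diffs = [sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs]
    return sum(diffs) / len(pairs)


def tajimas_d(haplotypes):
    """Tajima's D (Tajima 1989) for equal-length haplotype strings.

    Returns None when D is undefined (no segregating sites, or fewer
    than four sequences, where the variance term is zero).
    """
    n = len(haplotypes)
    # Number of segregating sites: columns with more than one allele.
    s = sum(len(set(column)) > 1 for column in zip(*haplotypes))
    if s == 0 or n < 4:
        return None

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    pi = nucleotide_diversity(haplotypes)
    theta_w = s / a1  # Watterson's estimator
    return (pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy data: an excess of rare variants (as after a sweep)
# versus variants at intermediate frequency.
rare_variants = ["000", "100", "010", "001"]
intermediate = ["00", "00", "11", "11"]

print(nucleotide_diversity(rare_variants), tajimas_d(rare_variants))
print(nucleotide_diversity(intermediate), tajimas_d(intermediate))
```

      In this toy case the sample dominated by singletons gives a negative D and the sample with intermediate-frequency variants gives a positive D, which is the direction of change a local drop in Tajima's D near a selected locus reflects.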

      (8) Line 313: please clarify "in combination with other mutations" within a mixed population or combined in one individual/haplotype? Also, the phrase "characterised the function" may be a little misleading, as this is a correlative analysis, not functional confirmation.

      None of the combinations of different resistant mutations was observed in a single haplotype. Here, we examine resistance levels associated with a single mutation or two mutations on sdhB and sdhD in one individual, i.e. sdhB_I260V and sdhD_R119C. We revised the sentences to avoid any implication of functional confirmation.

      (9) Line 358: again, please clarify the context: among arthropod pests?

      Done.

      (10) Line 360-363: please give some background on when and where these related compounds were introduced.

      Added.

      (11) Line 410: yes fitness costs may be a factor, but you could also give an example of a cost expressed in the absence of any pesticides, as well as the given example of negative cross-resistance.

      We added the example of the H258Y mutation which causes both fitness costs and negative cross-resistance.

      (12) Lines 419-438: this is one aspect where the situation for insecticides is in contrast with some other resistance areas.

      Yes, we restricted these statements to arthropod pests.

      (13) Line 466: some more detail could be given here: for example, SNP-specific monitoring would be less effective, but amplicon sequencing would be more suitable.

      Yes, revised.

      (14) Lines 472-475: Please list the numbers of field/lab, pre/post exposure, and sensitive/resistant populations within the main text.

      Done. The number of sensitive/resistant populations was reported in the result section.

      (15) Line 483: randomly selected individuals?

      Yes, added randomly selected individuals.

      (16) Line 556: Sanger sequencing to characterise populations? Or a number of individuals from each population?

      Revised.

      (17) References: there are some duplicate entries, please check this.

      Checked.

      (18) Figure 1e: consider a log(10) scale to better show large fold changes and avoid multiple axis breaks.

      Thanks for your suggestion. However, we did not log-scale the LC<sub>50</sub> values, because we wanted to show the specific significance of 1,000 mg/L. The breaks in the Y axis between 30 mg/L and 1,000 mg/L reveal that the LC<sub>50</sub> values of the resistant populations were all greater than 1,000 mg/L, while those of the susceptible populations were all below 30 mg/L. This justified the use of 1,000 mg/L as a discriminating dose to investigate resistance status and level in subsequent work.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Azlan et al. identified a novel maternal factor called Sakura that is required for proper oogenesis in Drosophila. They showed that Sakura is specifically expressed in the female germline cells. Consistent with its expression pattern, Sakura functioned autonomously in germline cells to ensure proper oogenesis. In sakura KO flies, germline cells were lost during early oogenesis and often became tumorous before degenerating by apoptosis. In these tumorous germ cells, piRNA production was defective and many transposons were derepressed. Interestingly, Smad signaling, a critical signaling pathway for the GSC maintenance, was abolished in sakura KO germline stem cells, resulting in ectopic expression of Bam in whole germline cells in the tumorous germline. A recent study reported that Bam acts together with the deubiquitinase Otu to stabilize Cyc A. In the absence of sakura, Cyc A was upregulated in tumorous germline cells in the germarium. Furthermore, the authors showed that Sakura co-immunoprecipitated Otu in ovarian extracts. A series of in vitro assays suggested that the Otu (1-339 aa) and Sakura (1-49 aa) are sufficient for their direct interaction. Finally, the authors demonstrated that the loss of otu phenocopies the loss of sakura, supporting their idea that Sakura plays a role in germ cell maintenance and differentiation through interaction with Otu during oogenesis.

      Strengths:

      To my knowledge, this is the first characterization of the role of the CG14545 gene. Each experiment seems to be well-designed and adequately controlled.

      Weaknesses:

      However, the conclusions from each experiment are somewhat separate, and the functional relationships between Sakura's functions are not well established. In other words, although the loss of Sakura in the germline causes pleiotropic effects, the cause-and-effect relationships between the individual defects remain unclear.

      Comments on latest version:

      The authors have attempted to address my initial concerns with additional experiments and refutations. Unfortunately, my concerns, especially my specific comments 1-3, remain unaddressed. The present manuscript is descriptive and fails to describe the molecular mechanism by which Sakura exerts its function in the germline. Nevertheless, this reviewer acknowledges that the observed defects in sakura mutant ovaries and the possible physiological significance of the Sakura-Otu interaction are worth sharing with the research community, as they may lay the groundwork for future research in functional analysis.

      We thank the reviewer for valuable comments. We would like to investigate the molecular mechanism by which Sakura exerts its function in the germline in near future studies. 

      Reviewer #2 (Public review):

      In this study, the authors identified CG14545 (named it sakura), as a key gene essential for Drosophila oogenesis. Genetic analyses revealed that Sakura is vital for both oogenesis progression and ultimate female fertility, playing a central role in the renewal and differentiation of germ stem cells (GSC).

      The absence of Sakura disrupts the Dpp/BMP signaling pathway, resulting in abnormal bam gene expression, which impairs GSC differentiation and leads to GSC loss. Additionally, Sakura is critical for maintaining normal levels of piRNAs. Also, the authors convincingly demonstrate that Sakura physically interacts with Otu, identifying the specific domains necessary for this interaction, suggesting a cooperative role in germline regulation. Importantly, the loss of otu produces similar defects to those observed in sakura mutants, highlighting their functional collaboration.

      The authors provide compelling evidence that Sakura is a critical regulator of germ cell fate, maintenance, and differentiation in Drosophila. This regulatory role is mediated through modulation of pMad and Bam expression. However, the phenotypes observed in the germarium appear to stem from reduced pMad levels, which subsequently trigger premature and ectopic expression of Bam. This aberrant Bam expression could lead to increased CycA levels and altered transcriptional regulation, impacting piRNA expression. In this revised manuscript, the authors further investigated whether Sakura affects the function of Orb, a binding partner they identified, in deubiquitinase activity when Orb interacts with Bam.

      We appreciate the authors' efforts to address all our comments. While these revisions have greatly improved the clarity of certain sections, some of the concerns remain unclear, while details mentioned in the responses about these studies should be incorporated in the manuscript. Specifically, the manuscript still lacks the demonstration that Sakura co-localizes with Orb/Bam despite having the means for staining and visualization. This would bring insight into the selective binding of Orb with Bam vs. Sakura perhaps at different stages of oogenesis. Such analyses would allow for more specific conclusions, further alluding to the underlying mechanism, rather than the general observations currently presented.

      This elaborate study will be embraced by both germline-focused scientists and the developmental biology community.

      We thank the reviewer for valuable comments. We believe that the reviewer meant Otu, not Orb, as the binding partner of Sakura that we identified. In future studies, we would like to investigate the colocalization of Sakura with other proteins, including Otu, and the molecular mechanism by which Sakura exerts its function in the germline.

      Reviewer #3 (Public review):

      In this very thorough study, the authors characterize the function of a novel Drosophila gene, which they name Sakura. They start with the observation that sakura expression is predicted to be highly enriched in the ovary and they generate an anti-sakura antibody, a line with a GFP-tagged sakura transgene, and a sakura null allele to investigate sakura localization and function directly. They confirm the prediction that it is primarily expressed in the ovary and, specifically, that it is expressed in germ cells, and find that about 2/3 of the mutants lack germ cells completely and the remaining have tumorous ovaries. Further investigation reveals that Sakura is required for piRNA-mediated repression of transposons in germ cells. They also find evidence that sakura is important for germ cell specification during development and germline stem cell maintenance during adulthood. However, despite the role of sakura in maintaining germline stem cells, they find that sakura mutant germ cells also fail to differentiate properly such that mutant germline stem cell clones have an increased number of "GSC-like" cells. They attribute this phenotype to a failure in the repression of Bam by dpp signaling. Lastly, they demonstrate that sakura physically interacts with otu and that sakura and otu mutants have similar germ cell phenotypes. Overall, this study helps to advance the field by providing a characterization of a novel gene that is required for oogenesis. The data are generally high-quality and the new lines and reagents they generated will be useful for the field.

      Comments on latest version:

      With these revisions, the authors have addressed my main concerns.

      We thank the reviewer for valuable comments.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The manuscript is much improved based on the changes made upon recommendations from the reviewers.

      Though most of our comments have been addressed, we have a few more we wish to recommend. For previous points we made, we replied with further clarification for the authors.

      Figure 1

      (1) B should be the supplemental figure.

      We moved the former Fig 1B to Supplemental Figure 1.

      • Previous Fig1B (sakura mRNA expression level) is now Fig S2, not S1. Please make this data as Fig S1.

      We moved Fig S1 to main Fig7A and renumbered Fig S2-S16 to Fig S1-S15.

      (2) C - How were the different egg chamber stages selected in the WB? Naming them 'oocytes' is deceiving. Recommend labeling them as 'egg chambers', since an oocyte is claimed to be just the one-cell of that cyst.

      We changed the labeling to egg chambers.

      • The labels on lanes for Stages 12-13 and Stage 14, still only say "chambers", not "egg chambers". Also there is no Stage 1-3 egg chamber. More accurately, the label should be "Germarium - Stage 11 egg chambers".

      We updated the labels on the lanes as suggested by the reviewer.

      (3) Is the antibody not detecting Sakura in IF? There is no mention of this anywhere in the manuscript.

      While our Sakura antibody detects Sakura in IF, it seems to detect some other proteins as well. Since we have a Sakura-EGFP fly strain (which fully rescues sakura<sup>null</sup> phenotypes) to examine Sakura expression and localization without such non-specific signal issues, we relied on Sakura-EGFP rather than anti-Sakura antibodies for IF.

      • Please put this info into the Methods section.

      We added this info into the Methods section.

      (4) Expand on the reliance of the sakura-EGFP fly line. Does this overexpression cause any phenotypes?

      sakura-EGFP does not cause any phenotypes in the background of sakura[+/+] and sakura[+/-].

      • Please add this detail into the manuscript.

      We added this info into the Methods section.

      Figure 5

      (1) D - It might make more sense if this graph showed % instead of the numbers.

      We did not understand the reviewer's point. We think using numbers, not %, makes more sense.

      • Having a different 'n' number for each experiment does not allow one to compare anything except numbers of the egg chambers. This must be normalized.

      We still do not agree with the reviewer. In Fig 5D, we show the number of stage 14 oocytes per fly (= per pair of ovaries). ‘n’ is the number of flies (= number of pairs of ovaries) examined. We have now clarified this in the figure legend. A different ‘n’ for each genotype does not prevent us from comparing the numbers of stage 14 oocytes per fly. Therefore, we would like to keep the figure as it is.
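      The per-fly reading described in this response can be illustrated with a minimal sketch. The counts below are hypothetical and do not come from the manuscript; the function name is likewise an assumption for illustration. The point is only that a mean count per fly is already normalized to the unit of observation, so groups with different numbers of flies remain comparable.

```python
from statistics import mean


def oocytes_per_fly(counts_per_fly):
    """Mean number of stage 14 oocytes per fly (one count per pair of ovaries)."""
    return mean(counts_per_fly)


# Hypothetical counts, one entry per fly; the two groups differ in n.
control = [20, 22, 18, 21, 19]  # n = 5 flies
knockdown = [2, 0, 1]           # n = 3 flies

print(len(control), oocytes_per_fly(control))
print(len(knockdown), oocytes_per_fly(knockdown))
```

      Whether a percentage would communicate the result better, as the reviewer suggests, is a presentation choice that this sketch does not settle.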

      (2) Line 213 - explain why RNAi 2 was chosen when RNAi 1 looks stronger.

      The fly stock of RNAi line 2 is much healthier than that of RNAi line 1 (without being driven by Gal4) for unknown reasons. We were concerned that RNAi line 1 might carry an unwanted genetic background. We chose to use RNAi line 2 to avoid this issue.

      • Please add this information to the manuscript.

      We added this info into the Methods section.

      Figure 7/8 - can go to Supplemental.

      We moved Fig 8 to supplemental. However, we think Fig 7 data is important and therefore we would like to present them as a main figure.

      • Current Fig S1 should go to Fig 7, to better understand the relationship between pMad and Bam expression.

      We moved Fig S1 to main Fig7A and renumbered Fig S2-S16 to Fig S1-S15.

      Figure 9C - Why the switch to S2 cells? Not able to use the Otu antibody in the IP of ovaries?

      We can use the Otu antibody in the IP of ovaries. However, in the anti-Sakura Western blot after anti-Otu IP, the antibody light chain bands of the Otu antibody overlap with the Sakura band. Therefore, we switched to S2 cells and used an epitope tag to avoid this issue.

      • Please add this info to the Methods section.

      We added this info into the Methods section.

      Figure 10- Some images would be nice here to show that the truncations no longer colocalize.

      We did not understand the reviewer's point. In our study, even for the full-length proteins, we have not shown any colocalization of Sakura and Otu in S2 cells or in ovaries, except that both are enriched in developing oocytes in egg chambers.

      • Based on your binding studies, we would expect them to colocalize in the egg chamber, and since there are antibodies and a GFP-line available, it would be important to demonstrate that via visualization.

      As we wrote in the response and now in the manuscript, our antibodies are not well suited for immunostaining. We will try to optimize the experimental conditions in future studies.

    1. Author response:

      The following is the authors’ response to the original reviews

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      There are four main areas that need further clarification:

      (1) Further and more complete assessment of senescence and the fibroblasts must be done to support the claims. 

      We sincerely appreciate the Reviewing Editor's valuable suggestion regarding the addition of cellular senescence detection markers. In the revised manuscript, we have incorporated additional detection markers for cellular senescence, such as H3K9me3 and SA-β-gal staining, in healthy and periodontitis gingival samples to further validate our findings (Figure 1A, B in revised manuscripts).

      (2) Confusion between ageing and senescence throughout the manuscript.

      We fully understand the concerns raised by the Reviewing Editor and reviewers regarding the confusion between the concepts of ageing and senescence in the manuscript. Cellular senescence is a manifestation of ageing at the cellular level. In the revised manuscript, we have given priority to the term ‘senescence’ to describe the cell condition instead of ‘aging’.

      (3) The lipid metabolism mechanistic claims are very speculative and largely unsupported by experimental data. 

      We greatly appreciate the Reviewing Editor and reviewers for pointing out the incorrect statements regarding the role of lipid metabolism in regulating cellular senescence. Since the mechanism by which cellular metabolism regulates cellular senescence is not the core focus of this manuscript, we have moved the results of the metabolic analysis from the sc-RNA sequencing data to the figure supplement (Figure 4-figure supplement 1) and revised the related statements in the revised manuscript (Page 7-8, Line 186-194).

      (4) Concerns about the use of Metformin as a senotherapy vs other pleiotropic effects in periodontitis and the suggestion of using an alternative Senolytic drug (Bcl2 inhibitors, etc.). 

      We fully understand the concerns of the Reviewing Editor and reviewers regarding metformin as an anti-aging therapy. In the revised manuscript, we have included additional experiments using another senolytic drug, ABT-263, a Bcl2 inhibitor, in the ligature-induced periodontitis mouse model. The corresponding results can be found in Figure 6 and on Page 9-10, Line 248-264 of the revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      While most of the experiments are elegantly designed and the procedures well conducted there are several critical weaknesses that temper my enthusiasm for this solid and timely work. Considering my main points, I would recommend the following:

      (1) Potentiate the senescent assessment in vitro and, most importantly, in vivo. E.g. SABgal with fresh tissue, other senescent biomarkers like SAHFs (HP1g or H3K9me3), etc.

      We sincerely appreciate the reviewers' suggestion to potentiate the assessment of cellular senescence. In the revised manuscript, we performed SA-β-gal staining on fresh frozen samples, revealing a significantly higher number of SA-β-gal positive cells in the gingival tissue of periodontitis, particularly in the lamina propria, while few SA-β-gal positive cells were observed in healthy gingival tissue (Figure 1A). Additionally, we assessed the protein level changes of H3K9me3, a marker of senescence-associated heterochromatin foci (SAHF), in gingival tissues from healthy individuals and periodontitis patients. The results showed a notable increase in the number of H3K9me3 positive cells in periodontitis tissues, approximately double that found in healthy gingiva (Figure 1B). This trend aligns with our previous findings of elevated p16 and p21 levels. Collectively, these results further confirm that periodontitis gingival tissues contain a greater number of senescent cells compared to healthy gingiva.

      (2) Claims on disturbances in lipid metabolism as a driver of CD81+ fibroblast senescence require appropriate functional/mechanistic validations and experiments of metabolism rewiring.

      We sincerely appreciate the reviewers' suggestion for more experimental evidence regarding the role of lipid metabolism in driving CD81+ fibroblast senescence. The influence of lipid metabolism on cellular senescence, and the mechanisms involved, is a complex and important scientific issue, but it is not the central focus of this manuscript. Therefore, to avoid causing confusion for the reviewers and readers, we have moved the metabolism analysis to Figure 4-figure supplement 1 and revised the presentation of the relevant results in the revised manuscript to ensure a more rigorous interpretation of our findings (Page 7-8, Line 186-194).

      (3) Do LPS-stimulated HGFS implementing the senescent programme secrete C3? Detection of complement C3 at the protein level (e.g. by ELISA) would reinforce the proposed mechanism.

      This is indeed a very interesting question. In response to the reviewers' suggestion, we measured the levels of C3 protein, one of the markers of the senescence-associated secretory phenotype (SASP), secreted by human gingival fibroblasts induced by Pg-LPS. The results indicated that, compared to untreated fibroblasts, those induced by Pg-LPS exhibited significantly higher levels of C3 secretion, approximately 1.5 times that of the control group (Figure 5G). Additionally, we found that primary gingival fibroblasts derived from periodontitis tissues secreted more complement C3 than those derived from healthy tissues (Figure 5F). These findings suggest that the increased secretion of complement C3 by gingival fibroblasts in periodontitis tissues may be related to Pg-LPS-induced cellular senescence.

      (4) The mechanism of Metformin to impair senescence and/or the SASP is not fully validated and Metformin can produce other pleiotropic effects. A key experiment (including therapeutic implications) is using a senolytic drug (e.g. Navitoclax) to causally connect the eradication of senescent CD81+ fibroblasts with the recruitment of neutrophils. If the hypothesis of the authors is correct this approach should result in reduced levels of gingival CD81 and C3 positivity, prevention of neutrophil infiltration (reduced MPO positivity), and amelioration of bone damage in ligation-induced periodontitis murine models.

      We fully understand the reviewers' concerns regarding the role of metformin in alleviating cellular senescence and the possibility of it acting through non-senescent pathways. To clarify the role of cellular senescence in the recruitment of neutrophils by CD81+ fibroblasts through C3 in periodontitis, we treated a ligature-induced periodontitis mouse model with ABT-263, also known as Navitoclax. The results showed that after ABT-263 treatment, the number of p16-positive or H3K9me3-positive senescent cells in the periodontitis mice significantly decreased. Additionally, we observed reductions in the quantities of CD81+ fibroblasts, C3 protein levels, neutrophil infiltration, and osteoclasts to varying degrees in the LIP model after ABT-263 treatment (Figure 6). These results further support our hypothesis that the eradication of senescent CD81+ fibroblasts could reduce neutrophil infiltration and alveolar bone resorption.

      (5) Have the authors considered using any of the available C3/C3aR inhibitors to validate the involvement of neutrophils and the inflammatory response in periodontitis? A C3/C3aR inhibitor would be an elegant treatment group in parallel with the senolytic approach.

      Thank you very much for the reviewers' suggestion to investigate neutrophil infiltration and inflammatory responses after treating periodontitis with C3/C3aR inhibitors. In a clinical study by Hasturk et al. in 2021 (Reference 1), it was found that using the C3 inhibitor AMY-101 effectively alleviated gingival inflammation levels in periodontitis patients. This was reflected in significant decreases in clinical indicators such as the modified gingival index and bleeding on probing, as well as a marked reduction in inflammatory tissue destruction markers, including MMP-8 and MMP-9. In addition, Tomoki Maekawa et al. (Reference 2) demonstrated that a peptide inhibitor of complement C3 effectively reduced inflammation levels and the extent of bone resorption in periodontitis. Moreover, research by Guglietta et al. (Reference 3) clarified that the C3 complement promotes neutrophil recruitment and the formation of neutrophil extracellular traps (NETs) via C3aR. And neutrophil extracellular traps are considered key pathological factors in causing sustained chronic inflammation in periodontitis (References 4 and 5). In summary, existing studies have clearly indicated that C3/C3aR inhibitors likely reduce neutrophil recruitment and inflammation in periodontitis. 

      Reference

      (1) Hasturk, H., Hajishengallis, G., Forsyth Institute Center for Clinical and Translational Research staff, Lambris, J. D., Mastellos, D. C., & Yancopoulou, D. (2021). Phase IIa clinical trial of complement C3 inhibitor AMY-101 in adults with periodontal inflammation. The Journal of clinical investigation, 131(23), e152973.

      (2) Maekawa, T., Briones, R. A., Resuello, R. R., Tuplano, J. V., Hajishengallis, E., Kajikawa, T., Koutsogiannaki, S., Garcia, C. A., Ricklin, D., Lambris, J. D., & Hajishengallis, G. (2016). Inhibition of pre-existing natural periodontitis in non-human primates by a locally administered peptide inhibitor of complement C3. Journal of clinical periodontology, 43(3), 238–249.

      (3) Guglietta, S., Chiavelli, A., Zagato, E., Krieg, C., Gandini, S., Ravenda, P. S., Bazolli, B., Lu, B., Penna, G., & Rescigno, M. (2016). Coagulation induced by C3aR-dependent NETosis drives protumorigenic neutrophils during small intestinal tumorigenesis. Nature communications, 7, 11037.

      (4) Kim, T. S., Silva, L. M., Theofilou, V. I., Greenwell-Wild, T., Li, L., Williams, D. W., Ikeuchi, T., Brenchley, L., NIDCD/NIDCR Genomics and Computational Biology Core, Bugge, T. H., Diaz, P. I., Kaplan, M. J., Carmona-Rivera, C., & Moutsopoulos, N. M. (2023). Neutrophil extracellular traps and extracellular histones potentiate IL-17 inflammation in periodontitis. The Journal of experimental medicine, 220(9), e20221751.

      (5) Silva, L. M., Doyle, A. D., Greenwell-Wild, T., Dutzan, N., Tran, C. L., Abusleme, L., Juang, L. J., Leung, J., Chun, E. M., Lum, A. G., Agler, C. S., Zuazo, C. E., Sibree, M., Jani, P., Kram, V., Martin, D., Moss, K., Lionakis, M. S., Castellino, F. J., Kastrup, C. J., … Moutsopoulos, N. M. (2021). Fibrin is a critical regulator of neutrophil effector function at the oral mucosal barrier. Science (New York, N.Y.), 374(6575), eabl5450.

      Other comments

      (1) Figure 1. The authors report upregulation of the aging pathway in bulk RNAseq analyses. What about the upregulation of senescence-related pathways and differential expression of SASP-related genes in this experiment?

      Thanks for this interesting question. Through further analysis of the bulk RNA sequencing results of gingival tissues from the LIP mouse model, we found significant alterations in multiple senescence-associated secretory phenotype (SASP) genes and several cellular senescence-related pathways. SASP genes, such as Icam1, Mmp3, Nos3, Igfbp7, Igfbp4, Mmp14, Timp1, Ngf, Il6, Areg, and Vegfa, were markedly upregulated in the periodontitis samples of ligature-induced mice (Figure 1-figure supplement 2A). Moreover, we observed a significant reduction in oxidative phosphorylation levels and the tricarboxylic acid (TCA) cycle in the periodontitis group, suggesting that the occurrence of cellular senescence may be related to mitochondrial dysfunction (Figure 1-figure supplement 2B and C).

      Additionally, we noted the activation of the PI3K-AKT and MAPK pathways in LIP model (Figure 1-figure supplement 2D and E), both of which can induce cellular senescence by activating the tumor suppressor pathway TP53/CDKN1A, leading to cell cycle arrest (References 1, 2). Furthermore, the NF-κB signaling pathway was also significantly enriched in LIP model (Figure 1-figure supplement 2F), which is closely associated with the secretion of SASP factors (Reference 3).

      In summary, our bulk RNA sequencing results suggest enrichment of cellular senescence-related pathways in the periodontitis group, including mitochondrial metabolic dysregulation, senescence-related pathways, and alterations in the SASP. Related results were added into Page 56 of the revised manuscripts.

      Reference

      (1) Tang Q, Markby GR, MacNair AJ, Tang K, Tkacz M, Parys M, Phadwal K, MacRae VE, Corcoran BM. TGF-β-induced PI3K/AKT/mTOR pathway controls myofibroblast differentiation and secretory phenotype of valvular interstitial cells through the modulation of cellular senescence in a naturally occurring in vitro canine model of myxomatous mitral valve disease. Cell Prolif. 2023 Jun;56(6):e13435. doi: 10.1111/cpr.13435.

      (2) Sayegh S, Fantecelle CH, Laphanuwat P, Subramanian P, Rustin MHA, Gomes DCO, Akbar AN, Chambers ES. Vitamin D3 inhibits p38 MAPK and senescence-associated inflammatory mediator secretion by senescent fibroblasts that impacts immune responses during ageing. Aging Cell. 2024 Apr;23(4):e14093.

      (3) Raynard C, Ma X, Huna A, Tessier N, Massemin A, Zhu K, Flaman JM, Moulin F, Goehrig D, Medard JJ, Vindrieux D, Treilleux I, Hernandez-Vargas H, Ducreux S, Martin N, Bernard D. NF-κB-dependent secretome of senescent cells can trigger neuroendocrine transdifferentiation of breast cancer cells. Aging Cell. 2022 Jul;21(7):e13632.

      (2) I wonder whether the authors could clarify how the semi quantifications for p21, p16, Masson's trichrome, C3, or MPO were done in Figures 1, 2, and 6.

      Thank you very much for the reviewer's suggestion. We have added the semi-quantitative methods for p21, p16, Masson's trichrome, C3, and MPO in the Methods section. Specifically, for semi-quantification of protein expressions, the mean optical density (MOD) of positive stains for p21, p16, and C3 was measured using the ImageJ2 software (version 2.14.0, National Institutes of Health, Bethesda, MD). The number of MPO-positive cells and collagen volume fractions (stained blue) for individual sections were also measured using the ImageJ2 software. (Page 19, Line 537-541 in the revised manuscripts).  

      (3) Figure 2. It is unclear whether N=6 refers to 6 mice, maxilla, or fields per group.

      Thank you very much for the reviewer's question. To avoid any misunderstandings for the reviewer and readers, we have added a definition of the sample size in the description of the micro-CT analysis method. Specifically, in the micro-CT quantitative analysis, the sample size n for each group consists of 6 mice, with the average value of the BV/TV of the bilateral maxillary alveolar bone taken as one sample for statistical analysis (Page 17-18, Line 488-490 in the revised manuscripts).  
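
      The sample definition above can be sketched in a few lines. This is only an illustration of the stated rule (one value per mouse, taken as the mean of the bilateral BV/TV readings); the function name, mouse IDs, and numeric readings are hypothetical and are not taken from the manuscript.

```python
from statistics import mean


def per_mouse_bvtv(measurements):
    """Collapse bilateral BV/TV readings into one sample per mouse.

    `measurements` maps a mouse ID to a (left, right) pair of BV/TV values;
    the result maps each mouse ID to the mean of its two sides.
    """
    return {mouse: mean(sides) for mouse, sides in measurements.items()}


# Hypothetical readings for one group of 6 mice (illustrative values only).
group = {
    "mouse_1": (0.50, 0.54),
    "mouse_2": (0.47, 0.49),
    "mouse_3": (0.52, 0.50),
    "mouse_4": (0.45, 0.51),
    "mouse_5": (0.49, 0.53),
    "mouse_6": (0.48, 0.46),
}

samples = per_mouse_bvtv(group)

# Twelve maxillary readings become n = 6 statistical samples.
print(len(samples))
```

      Averaging the two sides first means each mouse contributes a single data point, so n reflects animals rather than individual maxillae.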

      (4) Figure 4K. Please provide separated staining for p16, VIM, and CD81, and not only the Merge. It is difficult to identify the triple-positive cells. Also, the arrows are difficult to observe.

      Thank you very much for the reviewer's suggestion. In the revised manuscript, we have included separated staining for p16, VIM, and CD81, and the triple-positive cells are indicated with white arrows (Figure 5-figure supplement 1). 

      (5) Overall, improve the magnifications in the IF experiments and show where the magnified areas come from.

      Thank you very much for the reviewer's suggestion. We have enlarged the fluorescence result images.

      (6) Refer to the original datasets of the scRNAseq results in figure legends.

      Thank you very much for the reviewer's suggestion. We have indicated the source of the raw single-cell sequencing data in the figure legend.

      (7) Check English grammar and writing.

      Thank you for the reviewer's suggestion. We checked the grammar and writing in the revised manuscript with the assistance of a native English speaker and AI tools such as ChatGPT.

      Reviewer #2 (Recommendations For The Authors):

      (1) When the authors refer to accelerated aging and/or senescence, they are doing so in comparison to what?

      Thank you for the reviewer's question, which allows me to further clarify the concepts of accelerated aging and/or senescence. In sections 2.1 and Figure 1 of this manuscript, we referred to accelerated aging and/or senescence. This indicates that the gingival tissues of periodontitis patients exhibit a higher number of senescent cells and elevated levels of senescence-related markers compared to healthy gingival tissues. In the title of this manuscript, we describe CD81+ fibroblasts as a unique subpopulation with accelerated cellular senescence. This means that CD81+ fibroblasts display higher expression levels of senescence-related genes, cell cycle inhibitor p16, and SASP factors compared to other fibroblast subpopulations. To avoid any misunderstanding, we have deleted the text ‘accelerated senescence’ in the revised manuscripts. 

      (2) In general, the main text does not describe the results using exact and reproducible terminology. Phrases like "X was most active", "a significant increase was observed", "the highest proportion was", and "the level of aging increased" should be supported by adding quantification details and by detailing what these comparisons are made to, to improve the reproducibility of the results.

      Thank you for the reviewer's suggestion. To improve the reproducibility of the results, we have added quantification details in the results section and clarified what comparisons are being made through the whole manuscript.

      (3) In some sections of the main text and figure legends, it is not entirely clear which sequencing experiments were conducted by the authors, which analyses were conducted by the authors on publicly available sequencing data, and which analyses were conducted on their mouse sequencing data.

      Thank you for the valuable feedback from the reviewer. To further clarify the source of the sequencing data, we have clearly indicated the data source in both the results section and the figure legends. 

      (4) In Figure 3H, the images showing SA-beta-gal staining on LPS-treated fibroblasts do not show convincingly the difference between treatments that are represented in the graph.

      Thank you for the reviewer's suggestion. To further clearly show the differences between treatments, we have enlarged the partial image of SA-β-gal staining shown in Figure 2-figure supplement 2 of the revised manuscripts. 

      (5) The choice of colors for Figure 4K is far from ideal as it is very difficult to tell apart red from purple channels and thus to visualize triple positive cells. A different LUT should be chosen, and separate individual channels should be shown to clearly identify triple-positive cells from others. Arrows also do not currently point at triple-positive cells.

      Thank you for the reviewer's suggestion. In the revised manuscript, we have included separated staining for p16, VIM, and CD81, and the triple-positive cells are marked with white arrows shown in Figure 5-figure supplement 1 of the revised manuscripts.  

      (6) The authors state that treatment with metformin "alleviated.... inflammatory cell infiltration (Figure 2C), and collagen degradation (Figure 2D) as observed through H&E and Masson staining." However, I cannot find a description of how the "relative fraction of collagen" in Figure 2Gc was calculated and how the H&E image they provide shows evidence of a reduction in inflammatory cells at that magnification.

      Thank you for the reviewer's suggestion. In the revised manuscript, we have added details in the methods section regarding the calculation of the "relative fraction of collagen" (Page 19, Line 539-541). Specifically, the collagen volume fractions (stained blue) for individual sections were measured using ImageJ2 software. Additionally, we have marked the infiltrating inflammatory cells in the gingiva in the H&E images with black arrows shown in Figure 7-figure supplement 1B of the revised manuscripts.

      (7) It appears that the in vivo experiment for metformin treatment was conducted with 6 animals per group, but this is not clear in the figures, main text, and methods.

      Thank you for the reviewer's suggestion. In the revised manuscript, we have included the number of mice in each group for the in vivo experiments, specifying that there are 6 mice per group in the figures, main text, and methods sections.

      (8) The methodology described for the bulk RNA-sequencing experiment in mice should describe the sequencing library characteristics and some reference to quality control thresholds that were implemented (mapped and aligned reads, sequencing depth and coverage, etc.).

      In the bulk RNA-sequencing experiment, the sequencing library characteristics and quality control thresholds were listed as follows:

      Sequencing Library Characteristics: We utilized the Illumina TruSeq RNA Library Construction Kit, generating libraries with an insert fragment length of approximately 400-500 bp.

      Quality Control Standards include the following:

      Alignment and Mapping Rates: The read data for all samples underwent preliminary quality control using FastQC (v0.11.9) and were aligned using HISAT2 (v2.2.1). The average mapping rate for each sample was over 90%.

      Sequencing Depth and Coverage: Each sample had a sequencing depth of 30M-40M paired reads to ensure sufficient transcript coverage. Detailed alignment statistics have been provided in the supplementary materials.

      Other Quality Control Measures: During the analysis, we also utilized RSeQC (v3.0.1) to evaluate the transcript coverage and GC bias of the sequencing data.

      The corresponding method description and reference were added in the Page 19-20, Line 546-558 of the revised manuscripts.
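
      As a rough illustration of how the figures quoted above could be applied as pass/fail checks, the sketch below flags samples whose mapping rate is not above 90% or whose depth falls below 30M paired reads. Treating these reported values as hard cutoffs is an assumption, and the function name, sample names, and statistics are hypothetical; they are not output from FastQC, HISAT2, or RSeQC.

```python
def qc_failures(stats, min_mapping_rate=0.90, min_paired_reads=30_000_000):
    """Return the names of samples that miss either threshold.

    `stats` maps a sample name to (mapping_rate, paired_reads).
    A sample fails if its mapping rate is not above `min_mapping_rate`
    or its paired-read count is below `min_paired_reads`.
    """
    return [
        name
        for name, (mapping_rate, paired_reads) in stats.items()
        if mapping_rate <= min_mapping_rate or paired_reads < min_paired_reads
    ]


# Hypothetical per-sample alignment statistics (illustrative values only).
alignment_stats = {
    "healthy_1": (0.95, 34_000_000),
    "healthy_2": (0.93, 31_500_000),
    "periodontitis_1": (0.88, 36_000_000),  # mapping rate too low
    "periodontitis_2": (0.94, 22_000_000),  # too few paired reads
}

failed = qc_failures(alignment_stats)
print(failed)
```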

      (9) Patients with periodontitis are labeled as diagnosed with "chronic periodontitis". I would like to know how the authors defined this chronic state of the disease in their inclusion criteria.

      Thank you very much for the reviewer’s question, which gives us the opportunity to further clarify the definition and diagnosis of chronic periodontitis. The diagnostic criteria for patients with chronic periodontitis in this study are based on the 1999 International Workshop for a Classification of Periodontal Diseases and Conditions (Reference 1). Chronic periodontitis is a type of periodontal disease distinct from aggressive periodontitis, and it is not diagnosed based on the rate of disease progression. Clinically, clinical attachment loss (CAL) ≥ 4 mm or probing depth (PD) ≥ 5 mm serves as one of the primary criteria for diagnosing chronic periodontitis.

      Reference

      (1) Armitage G. C. (2000). Development of a classification system for periodontal diseases and conditions. Northwest dentistry, 79(6), 31–35.

      (10) There is no detail about the age and sex of the donors for the healthy gingival fibroblast experiments. Are they some of the patients mentioned in Supplementary Table 1? Please clarify the source and number of independent primary cultures.

      Thank you very much to the reviewer for allowing us to further clarify the source and number of independent primary cultures. In the cell experiments, we used gingival fibroblasts derived from gingival tissue of two healthy volunteers and two patients with periodontitis as experimental subjects. This information has been listed in the Supplementary Table 1. 

      (11) Can the authors explain why their age inclusion criteria were different for the healthy and periodontitis groups according to their methods (healthy 18-50 years old: periodontitis 18-35 years old?)

      Thank you very much to the reviewer for pointing this out. We noticed that there was an error in the age range indicated for the healthy and periodontitis groups in the inclusion criteria. Based on the original inclusion criteria information, we have corrected the age range of the included population. Individuals aged 18-65 years were included in both the healthy and periodontitis groups. (Page 14-15, Line 396-404 in the revised manuscripts)

      (12) The methodology for inclusion is confusing and does not reflect the actual information of the recruited patients and samples thus analyzed. In the text, the healthy group appears to have included 8 young adult individuals and 8 middle-aged individuals. However, the list of recruited patients shows all healthy patients were in the young adult range (below 35 years of age) while all chronic periodontitis patients were middle-aged (above 50 years of age). Please clarify.

      Thank you very much to the reviewer for pointing out the issues in the article. This study included 8 periodontally healthy individuals and 8 patients with periodontitis (Page 14, Line 396-398 and Supplementary Table 1 in the revised manuscripts). Since periodontitis has a higher prevalence in middle-aged and elderly populations, the periodontitis samples included in this study were mostly from this demographic. In contrast, the healthy gingival samples were sourced from patients undergoing wisdom tooth extraction, which primarily involves younger individuals. Therefore, due to the limited sample size, we could not enforce strict age matching. To address this, we repeated the relevant experiments in more consistent mouse models, which confirmed the increase in senescent cells in periodontal tissues (Figure 1D in the revised manuscripts). In summary, although the clinical samples were limited, the experimental results from the mouse models still support our conclusions.

      (13) The number of biological replicates for each group used in the bulk RNA-sequencing experiment is unclear. The methods state:" For those with biological duplication, we used DESeq2 [8] (version: 1.34.0) to screen differentially expressed gene sets between two biological conditions; for those without biological duplication, we used edgeR". Please clarify the number of mouse samples sequenced and the description of the groups.

      Thank you very much to the reviewer for pointing out the errors in the article. In the transcriptome sequencing, we collected gingival tissues from 3 healthy mice and gingival tissues from 3 ligature-induced periodontitis mice. Therefore, we used the DESeq2 (version: 1.34.0) method to filter for differentially expressed genes. The corresponding descriptions were revised in Page 20, Line 554-555 in the revised manuscripts.
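
      The selection rule quoted from the methods (DESeq2 when biological replicates are available, edgeR otherwise) can be written as a small decision function. This is a sketch only: the function name is hypothetical, and reading "biological duplication" as at least two samples in every group is an assumption. It returns the tool name as a string and does not call either R package.

```python
def choose_de_tool(samples_per_group):
    """Pick a differential-expression tool from per-group sample counts.

    `samples_per_group` maps a group label to its number of biological
    samples. Assumption: replication means >= 2 samples in every group.
    """
    if not samples_per_group or min(samples_per_group.values()) < 1:
        raise ValueError("every group needs at least one sample")
    has_replicates = min(samples_per_group.values()) >= 2
    return "DESeq2" if has_replicates else "edgeR"


# Design described in the response: 3 healthy vs 3 ligature-induced mice.
tool = choose_de_tool({"healthy": 3, "periodontitis": 3})
print(tool)
```

      With three mice per group, the rule selects DESeq2, which matches the choice stated in the response.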

      (14) Cluster group labels are misaligned in Figure 4C.

      Thank you very much for the reviewer's suggestion. The cluster group labels in Figure 3C of the revised manuscripts have been aligned.

      Reviewer #3 (Recommendations For The Authors):

      Major Comments for the Authors:

      (1) I do not find the immunohistochemical staining of p16 and p21 shown in Figures 2E and F to be particularly compelling. Especially as other stains of these markers used later in the manuscript are of higher quality (i.e. Figures 3F and G). Can this staining be improved to better reflect the quantifications in Figure 2G?

      Thank you very much for the reviewer's suggestion. In the revised manuscript, we have provided more representative images in Figure 7C in the revised manuscripts to reflect the effect of metformin treatment on the number of p16-positive cells in periodontitis. In Figure 7-figure supplement 1D of the revised manuscripts, we have marked p21-positive cells with black arrows to help readers better identify the p21-positive cells. Additionally, we have also assessed the H3K9me3 marker, which is more specific, and the results similarly indicate that metformin treatment can alleviate the formation of senescent cells in periodontitis (Figure 7-figure supplement 1E of the revised manuscript).

      (2) On line 140, Supplementary Figure 2C, D is quoted to show "...an increase in senescence characteristics of fibroblasts with the severity of periodontitis." This figure panel does not appear to support this statement. Please revise.

      Thank you very much for pointing out the errors in the manuscript. In the revised version, we have corrected this part of the description and added that “The results showed a decline in fibroblast proportion along with increasing disease severity (Figure 2-figure supplement 1C and D)” (Page 6, Line 153-154 of the revised manuscript)

      (3) I do not find the Western Blot experiment in Figure 4L to be particularly convincing. The text states that p21, p16, and CD81 increase in a context-dependent manner upon LPS stimulation, which doesn't appear to be very evident. I recommend repeating this experiment and showing both a representative blot alongside a blot density quantification where the bars have the error shown between experiments.

      Thank you very much for the reviewer’s suggestion regarding this result. During subsequent repeated experiments, we found that the result was not reproducible, and we have removed the related results.

      (4) The results state that metabolic profiling of senescent fibroblasts shows an increase in the biosynthesis of Linoleic acid, linolenic acid, arachidonic acid, and steroid. However, in Figure 5B only arachidonic acid and steroid biosynthesis appear to be elevated in CD81+ Fibroblasts, while Linoleic and linolenic acid appear to be decreased. Can the authors comment on this discrepancy? Moreover, in Figure 5C steroid biosynthesis is unchanged between healthy and periodontitis samples, contrary to the claimed increased trend in the results text. Please revise this section. Also, in Figures 5 B and C some of the terms are highlighted in a red or blue box. This is not discussed in the figure legend. Could the significance of this be explained or could these highlights be removed from the figure?

      Thank you very much for the reviewer’s correction regarding the errors in the manuscript. On Page 7-8, Line 186-194 of the revised manuscripts, the statement was rewritten as “Pathways related to fatty acid biosynthesis, arachidonic acid metabolism, and steroid biosynthesis were significantly upregulated in CD81+ fibroblasts (Figure 4-figure supplement 1A)”. Moreover, we have removed the results from Figure 5C, and the highlights in Figures 5B and C of the previous manuscripts. Since the mechanism by which cellular metabolism regulates cellular senescence is not the core focus of this manuscript, we have moved the results of the metabolic analysis from the sc-RNA sequencing data to the figure supplement (Figure 4-figure supplement 1) and revised the related statements in the revised manuscript (Page 7-8, Line 186-194).

      (5) The authors state that arachidonic acid can be converted to prostaglandins and leukotrienes through COXs (which are expressed in their CD81+ Fibroblasts), accentuating inflammatory responses. Have the authors profiled for the expression of prostaglandins and leukotrienes in their CD81+ Fibroblasts or between healthy and periodontitis samples? Such data would be a great inclusion in the manuscript.

      Thank you very much for the reviewer’s suggestion. Our results indicated that CD81+ gingival fibroblasts expressed higher levels of PTGS1 and PTGS2 compared to other fibroblast subpopulations. These genes encode COX-1 and COX-2, which are key enzymes in prostaglandin biosynthesis (Figure 4-figure supplement 1 of the revised manuscript). Additionally, previous studies have reported high levels of prostaglandins and leukotrienes in periodontal tissues, and these pro-inflammatory mediators contribute to tissue destruction in periodontitis (Reference 1 and 2).

      Reference

      (1) Van Dyke, T. E., & Serhan, C. N. (2003). Resolution of inflammation: a new paradigm for the pathogenesis of periodontal diseases. Journal of dental research, 82(2), 82–90.

      (2) Hikiji, H., Takato, T., Shimizu, T., & Ishii, S. (2008). The roles of prostanoids, leukotrienes, and platelet-activating factor in bone metabolism and disease. Progress in lipid research, 47(2), 107–126.

      (6) Lines 199 and 200 state "...the cellular senescence of CD81+ fibroblasts could be attributed to disturbances in lipid metabolism". While altered lipid metabolic profiles are shown in Figure 5 to correlate with senescent fibroblasts/periodontitis tissue, no evidence is shown to suggest that they are the driver or cause of fibroblast senescence. Could this sentence be amended to better reflect the conclusions that can be drawn from the data presented?

      Thank you very much for the reviewer’s suggestion. We have revised the related statement to read “lipid metabolism might play a role in cellular senescence of the gingival fibroblasts” on Page 7, Line 189 of the revised manuscripts.

      Minor Comments for the Authors:

      (1) There are some sentences without references that I feel would warrant referencing: - Line 112 - "Metformin, an anti-aging drug has shown potential in inhibiting cell senescence in various disease models (REFERENCE)."

      Thank you for the reviewer's suggestion. We have included the relevant references on Page 10, Line 267-271 of the revised manuscripts.

      Reference

      (1) Soukas, A. A., Hao, H., & Wu, L. (2019). Metformin as Anti-Aging Therapy: Is It for Everyone?. Trends in endocrinology and metabolism: TEM, 30(10), 745–755.

      (2) Kodali, M., Attaluri, S., Madhu, L. N., Shuai, B., Upadhya, R., Gonzalez, J. J., Rao, X., & Shetty, A. K. (2021). Metformin treatment in late middle age improves cognitive function with alleviation of microglial activation and enhancement of autophagy in the hippocampus. Aging cell, 20(2), e13277.

      - Line 210 - "Previous studies have demonstrated the importance of sustained neutrophil infiltration in the progression of periodontitis (REFERENCE)."

      Thank you for the reviewer's suggestion. We have included the relevant references in the Page 8, Line 211-214 of the revised manuscripts.

      Reference

      (1) Song, J., Zhang, Y., Bai, Y., Sun, X., Lu, Y., Guo, Y., He, Y., Gao, M., Chi, X., Heng, B. C., Zhang, X., Li, W., Xu, M., Wei, Y., You, F., Zhang, X., Lu, D., & Deng, X. (2023). The Deubiquitinase OTUD1 Suppresses Secretory Neutrophil Polarization And Ameliorates Immunopathology of Periodontitis. Advanced science (Weinheim, Baden-Wurttemberg, Germany), 10(30), e2303207.

      (2) Kim, T. S., Silva, L. M., Theofilou, V. I., Greenwell-Wild, T., Li, L., Williams, D. W., Ikeuchi, T., Brenchley, L., NIDCD/NIDCR Genomics and Computational Biology Core, Bugge, T. H., Diaz, P. I., Kaplan, M. J., Carmona-Rivera, C., & Moutsopoulos, N. M. (2023). Neutrophil extracellular traps and extracellular histones potentiate IL-17 inflammation in periodontitis. The Journal of experimental medicine, 220(9), e20221751.

      (3) Ando, Y., Tsukasaki, M., Huynh, N. C., Zang, S., Yan, M., Muro, R., Nakamura, K., Komagamine, M., Komatsu, N., Okamoto, K., Nakano, K., Okamura, T., Yamaguchi, A., Ishihara, K., & Takayanagi, H. (2024). The neutrophil-osteogenic cell axis promotes bone destruction in periodontitis. International journal of oral science, 16(1), 18.

      (2) To improve the quality of several of the authors' claims I would recommend some further quantification of their experimental analyses. Namely:

      - Figures 3 F and G

      - Figures 4 I, J and K

      - Figures 6 F and G

      - Supplementary Figures 4 A, B, and C

      Thank you for the reviewer's suggestion. We have supplemented the quantitative analysis results for some images based on the reviewer's recommendations, specifically in Figure 2G, Figure 3G, Figure 5-figure supplement 1A, B, Figure 5-figure supplement 2A and Figure 7-figure supplement 3A-D in the revised manuscripts.

      (3) Figure 1L has missing x-axis annotation.

      Thank you for the reminder from the reviewer. The X-axis label has been added in Figure 1-figure supplement 1D for the GO term annotation. 

      (4) Line 117 is missing a reference for the experimental schematic shown in Figure 2A.

      Thank you for the reminder from the reviewer. The experimental schematic shown in Figure 7A has been referenced in Page 10, Line 275-277.

      (5) The "BV/TV ratio" and "CEJ-ABC distance" should be briefly explained in the results text (Lines 118 and 119).

      Thank you for the reviewer's suggestion. We have added the explanation of "BV/TV ratio" and "CEJ-ABC distance" on Page 10-11, Line 279-281 in the revised manuscripts.

      (6) Figure 2 could be improved by having some annotation for the anatomical regions shown.

      Thank you for the reviewer’s valuable suggestion. We have labeled the relevant anatomical structures to enhance clarity in Figure 7 in the revised manuscripts. 

      (7) The positive signal for p16 and p21 is difficult to interpret in Figure 2. Could the clarity of this be improved either by using more evident images or annotation with arrowheads indicating positive cells?

      Thank you for the reviewer's suggestion. In the revised manuscript, we have provided more representative images in Figure 7C to reflect the effect of metformin treatment on the number of p16-positive cells in periodontitis. In Figure 7-figure supplement 1D of the revised manuscripts, we have marked p21-positive cells with black arrows to help readers better identify the p21-positive cells. Additionally, we have also assessed the H3K9me3 marker, which is more specific, and the results similarly indicate that metformin treatment can alleviate the formation of senescent cells in periodontitis (Figure 7-figure supplement 1E of the revised manuscript).

      (8) Figure 2Gc, d, and e are not mentioned in the results text. Please include references to these panels at the appropriate points.

      Thank you for the reminder. In the revised manuscripts, Figures 2G c, d, and e in the previous manuscripts have been mentioned in the text in the Page 11, Line 284-289 of the revised manuscript. 

      (9) Scale bars are missing in Supplementary Figure 2E.

      Thank you for the suggestion. The scale bar has been added in the Figure 7-figure supplement 2B in the revised manuscripts. 

      (10) The order of the figure panels is not always mentioned in the order they are referred to in the text. For example, Figure 3 is presented in the order of A, B, D then C. Could this be changed to reflect the order in the results text?

      Thank you for the feedback. We have renumbered the figures according to the order mentioned in the original manuscript (Page 6, Line 146-149, Figure 2 in the revised manuscripts).

      (11) To improve reader clarity it would be good to briefly introduce the gene expression datasets analysed, such as GSE152042. I.e. what the experimental condition is from which it is derived.

      Thank you for the suggestion. We have included a brief description of the information and sources of the samples from GSE152042 in Page 6, Line 140-142 of the revised manuscripts. 

      (12) To improve reader clarity I would recommend signifying clearly in the figure if the data shown is from mouse or human samples. For example in Figure 3F and G.

      Thank you for the suggestion. We have moved all the results from the mouse experiments to the figures supplement (Figure 5-figure supplement 1 and 2 in the revised manuscripts).

      (13) The images shown in Figure 3H for SA-beta-Gal do not seem very convincing. Could this be improved?

      Thank you for the suggestion. To further illustrate the differences in SA-beta-Gal results between the groups, we have provided images at higher magnification in the Figure 2-figure supplement 2 of the revised manuscripts.  

      (14) Supplementary Figure 2E would benefit from small experimental schematics that would allow the reader to appreciate the timings of the treatment for this experiment.

      Thank you for the suggestion. We have added a schematic diagram in Figure 7-figure supplement 2A of the revised manuscripts to illustrate the LPS treatment, metformin treatment, and the timing of the assessments. 

      (15) Figure 4K would benefit from showing the merged image and single channels of each of the stains to better assess the degree of colocalisation.

      Thank you for the suggestion. We have included each individual fluorescence channel in Figure 5-figure supplement 1C of the revised manuscripts. 

      (16) The writing on the X-axis of Figure 6B is almost illegible to me, although this may just be a compression artefact. This makes the interpretation of the data quite difficult. Also, for Figures 6 B and C, the meaning of the (H) and (P) annotations should be clear on either the figure or figure legend. I surmise that they represent "Healthy" and "Periodontic" samples respectively.

      Thank you for the suggestion. In the revised manuscript, we have enlarged Figure 6B in the previous manuscripts to better display the X-axis as shown in the Figure 5B of the revised manuscripts. Additionally, we have fully labeled "Healthy" and "Periodontitis" in Figure 5C of the revised manuscripts.

      (17) MPO-positive cells are introduced on line 216, however, no explanation is provided for what population or state the expression of this protein marks. I surmise the authors are using it to detect Neutrophil populations. If so, could the authors briefly state this the first time it is used?

      Thank you for the suggestion. In the revised manuscript, we have added an introduction to MPO. MPO, or myeloperoxidase, is considered one of the markers for neutrophils. (Page 9, Line 240-242 of the revised manuscripts)

      (18) Supplementary Figure 3D does not appear to be mentioned or discussed in the results text.

      Thank you for the reminder. We have referenced Supplementary Figure 3D in the previous manuscripts in Page 9, Line 240-242 shown as Figure 5-figure supplement 2C of the revised manuscript.  

      (19) Figure 6E showing increased C3 expression in periodontic samples is not very convincing and differences in expression are not evident. Can the authors provide an image that more convincingly matches their quantification?

      Thank you for the suggestion. In the revised manuscript, we have provided more representative images shown in Figure 5E of the revised manuscript.

      (20) Figure 6I shows the expression of CD81 and SOD2 in healthy and periodontic tissue. The associated results texts (Lines 220 to 223) discuss the spatial coincidence of CD81 and MPO. Can the authors address this discrepancy in either the results text or the figure panel? Moreover, can Figure 6H and I be annotated to show the location of the gingival lamina propria to improve clarity?

      Thank you for the reminder. We have revised the relevant statements in the text: "Interestingly, spatial transcriptomic analysis of gingival tissue revealed that the regions expressing CD81 and SOD2, a neutrophil marker, in periodontitis overlapped in the gingival lamina propria, showing a high spatial correlation" in Page 9, Line 223-226 of the revised manuscripts. Additionally, we have labeled the gingival lamina propria (LP) in Figure 5H of the revised manuscripts.

      (21) I am confused about the purpose of Supplementary Figure 3E and what evidence it provides. Can the authors comment on this?

      Thank you for the reminder. To avoid any potential misunderstanding by readers, we have deleted the Supplementary Figure 3 image in the revised manuscripts.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Wang et al. show that differentiated peridermal cells of the zebrafish epidermis extend cytoneme-like protrusions toward the less differentiated, intermediate layer below. They present evidence that expression of a dominant-negative cdc42 inhibits cytoneme formation and leads to elevated expression of a marker of undifferentiated keratinocytes, krtt1c19e, in the periderm layer. Data are presented suggesting the involvement of Delta-Notch signaling in keratinocyte differentiation. Finally, changes in expression of the inflammatory cytokine IL-17 and its receptors are shown to affect cytoneme number and periderm structure in a manner similar to Notch and cdc42 perturbations.

      Strengths:

      Overall, the idea that differentiated cells signal to underlying undifferentiated cells via membrane protrusions in skin keratinocytes is interesting and novel, and it is clear that periderm cells send out thin membrane protrusions that contain a Notch ligand. Further, perturbations that affect cytoneme number, Notch signaling, and IL-17 expression clearly lead to changes in periderm structure and gene expression.

      Weaknesses:

      More work is needed to determine whether the effects on keratinocyte differentiation are due to a loss of cytonemes themselves, or to broader effects of inhibiting cdc42. Moreover, more evidence is needed to support the claim that periderm cytonemes deliver Delta ligands to induce Notch signaling below. Without these aspects of the study being solidified, understanding how IL-17 affects these processes seems premature.

      Reviewer #2 (Public Review):

      Summary:

      The aim of the study was to understand how cells of the skin communicate across dermal layers. The research group has previously demonstrated that cellular connections called airinemes contribute to this communication. The current work builds upon this knowledge by showing that differentiated keratinocytes also use cytonemes, specialized signaling filopodia, to communicate with undifferentiated keratinocytes. They show that cytonemes are the more abundant type of cellular extension used for communication between the differentiated keratinocyte layer and the undifferentiated keratinocytes. Disruption of cytoneme formation led to the expansion of the undifferentiated keratinocytes into the periderm, mimicking skin diseases like psoriasis. The authors go on to show that disruption of cytonemes results in perturbations in Notch signaling between the differentiated keratinocytes of the periderm and the underlying proliferating undifferentiated keratinocytes. Further, the authors show that Interleukin-17, also known to drive psoriasis, can restrict the formation of periderm cytonemes, possibly through the inhibition of Cdc42 expression. This work suggests that cytoneme-mediated Notch signaling plays a central role in normal epidermal regulation. The authors propose that disruption of cytoneme function may be an underlying cause of various human skin diseases.

      Strengths:

      The authors provide strong evidence that periderm keratinocytes cytonemes contain the notch ligand DeltaC to promote Notch activation in the underlying intermediate layer to regulate accurate epidermal maintenance.

      Weaknesses:

      The impact of the study would be increased if the mechanism by which Interleukin-17 and Cdc42 collaborate to regulate cytonemes was defined. Experiments measuring Cdc42 activity, rather than just measuring expression, would strengthen the conclusions.

      Reviewer #3 (Public Review):

      Summary:

      Leveraging zebrafish as a research model, Wang et al identified "cytoneme-like structures" as a mechanism for mediating cell-cell communications among skin epidermal cells. The authors further demonstrated that the "cytoneme-like structures" can mediate Notch signaling, and the "cytoneme-like structures" are influenced by IL17 signaling.

      Strengths:

      Elegant zebrafish genetics, reporters, and live imaging.

      Weaknesses: (minor)

      This paper focused on characterizing the "cytoneme-like structures" between different layers and the NOTCH signaling. However, these "cytoneme-like structures" observed in undifferentiated KC (Figure 2B), although at a slightly lower frequency, were not interpreted. In addition, it is unclear if these "cytoneme-like structures" can mediate other signaling pathways than NOTCH.

      The role of cytoneme-like protrusions extended from undifferentiated keratinocytes is still under investigation. We believe that addressing the function of undifferentiated keratinocyte cytonemes and exploring whether peridermal cytonemes can mediate other signaling pathways is beyond the scope of the current manuscript. However, we hope to publish our discoveries about them soon. It is worth noting that cytonemes mediate other morphogenetic signals, such as Hh, Wnt, Fgf, and TGFbeta, in other contexts.

      Overall, this is a solid paper with convincing data reporting the "cytoneme-like structures" in vivo, and with compelling data demonstrating the roles in NOTCH signaling and the regulation by IL17.

      These findings provide a foundation for future work exploring the "cytoneme-like structures" in the mammalian system and other epithelial tissue types. This paper also suggests a potential connection between the "cytoneme-like structures" and psoriasis, which needs to be further explored in clinical samples.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major points

      - In general, representative images from each experiment should accompany the graphs shown. The inclusion of still frames from time-lapse imaging experiments in the main figures would help the reader understand the morphology and dynamics of these protrusions in control, cdc42, and IL-17 manipulations.

      Thank you for the comments. We appreciate your suggestion to include representative images alongside the graphs to better illustrate the morphology and dynamics of these protrusions.

      In response, we have made the following additions to our main figures.

      Figure 3A now includes still images from time-lapse movies for both control and cdc42 manipulations.

      Figures 5A and 6A,C now include still images for il17 manipulations.

      - Data in Figure 3 is crucial as it demonstrates that cdc42DN selectively impairs cytoneme extensions without affecting other actin-based structures. It also shows that cdc42DN leads to upregulation of krtt1c19e in periderm. Therefore, these data should be presented in a comprehensive way. Still frames of high-magnification views of time-lapse images from control and cdc42DN should be included in the figure. Similarly, a counter label (E-Cadherin, perhaps) showing the presence of all three layers and goblet cells at different focal planes capturing the different layers of the skin should be included. It is stated that the goblet cell number is unaffected, but they seem to be absent in the image shown in Figure 3B.

      In this revised version, we have included magnified cross-sectional views. In addition to the images of the peridermal layer from the original version, we have now included the underlying intermediate and basal stem cell layers (Figure 3C-C”). We hope these data convincingly show that peridermal keratinocytes in cytoneme inhibited animals co-express krt4 and krtt1c19e markers, suggesting that peridermal keratinocytes are not fully differentiated.

      We agree that goblet cells appear largely absent in this particular image of the experimental group. However, when we quantified many animals, the number of goblet cells was not significantly different between control and experimental groups (Figure S2).

      - The effects on periderm architecture upon broad cdc42 inhibition may not be directly due to a loss of cytonemes. Performing this experiment in a mosaic manner to determine if the effects are local and in the range of cytoneme protrusion would strengthen the conclusions. Adding a secondary perturbation to inhibit cytoneme formation in periderm cells would also strengthen the conclusions that defects are not related specifically to cdc42 inhibition, but cytonemes themselves.

      Thank you for the suggestion. We confirmed that mosaic expression of cdc42DN in peridermal keratinocytes elicited local disorganization and elevated krtt1c19e expression, as we saw in the transgenic lines. In addition, the cdc42DN-expressing cells exhibited a significantly lower cytoneme extension frequency.

      In addition, we found that, like cdc42DN, rac1DN-expressing keratinocytes exhibited a significant decrease in cytoneme extension frequency, whereas rhoabDN showed no effect (new Figure S3). These data suggest that cytoneme extension is regulated by cdc42 and rac1 but not rhoab. Although further investigation is required, these data at least suggest that the effects we observe are likely due to the loss of cytonemes rather than being specific to cdc42 inhibition.

      - Figure 4. The inclusion of an endogenous reporter of Notch activity, like Hes or Hey immunofluorescence, would strengthen the conclusion that the intermediate layer is Notch responsive.

      Thank you for the suggestion. In this revised version, we have included immunostaining data in Figure 4D demonstrating that Her6 (the ortholog of human HES1) protein is expressed in the intermediate layer.

      - It is not clear where along a differentiation trajectory Notch signaling and cytonemes are needed. What happens to the intermediate layer when Notch signaling or cdc42 is inhibited? Do the cells become more basal-like? Or failing to become periderm? Meaning - is Notch promoting the basal to intermediate fate transition, or the intermediate to periderm transition? A more comprehensive characterization of basal, intermediate, and periderm differentiation with markers selective to each layer would help define which step in the process is being altered.

      Notch signaling is known to regulate keratinocyte terminal differentiation. Thus, it is required for the intermediate-to-peridermal transition. We observed that peridermal keratinocytes still strongly express krt19, suggesting that their terminal differentiation is inhibited when cytoneme-mediated Notch signaling is compromised.

      As seen in Figure 3C”, peridermal keratinocytes express both krt4 and krtt1c19e markers and are located in the peridermal layer, suggesting that they are not fully differentiated keratinocytes. In the newly included images of the intermediate and basal layers, we do not observe any noticeable defects in basal stem cells or complete depletion of intermediate keratinocytes (Fig 3C-C”). These observations suggest that Notch signaling, activated by cytonemes, is required for the differentiation of undifferentiated intermediate keratinocytes into peridermal keratinocytes.

      We included this interpretation in the main text.

      - A number of times in the text it is suggested that cytonemes, Notch, and IL-17 signaling are essential for keratinocyte differentiation and proliferation, but proliferation (% cells in S-phase and M-phase) is not measured. Also, #of keratinocytes @ periderm is not an accurate way to report the number of cells in the periderm unless every cell in the larvae has been counted. It should be # cells/unit area.

      In this revised version, we confirmed that the number of EdU+ cells among peridermal keratinocytes is significantly increased when cytonemes are inhibited (Figure 3F-G). Also, as indicated in the methods section, we counted the cells in a 290 µm × 200 µm area. We believe that both datasets sufficiently show that the number of keratinocytes in the periderm is significantly increased due to the lack of proper cytoneme-mediated signaling.
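
      The reviewer's request for cells per unit area can be illustrated with a short calculation. The sketch below is only an illustration: it assumes the 290 µm × 200 µm counting window quoted in the response, and the function name, the mm² unit, and the example count are hypothetical rather than taken from the manuscript.

```python
# Counting window quoted in the response (290 um x 200 um).
COUNT_WIDTH_UM = 290.0
COUNT_HEIGHT_UM = 200.0


def cells_per_mm2(cell_count, width_um=COUNT_WIDTH_UM, height_um=COUNT_HEIGHT_UM):
    """Convert a raw cell count in a rectangular window to cells per mm^2.

    Hypothetical helper for illustration; not the authors' analysis code.
    """
    area_mm2 = (width_um * height_um) / 1_000_000.0  # um^2 -> mm^2
    return cell_count / area_mm2


if __name__ == "__main__":
    # Illustrative count only; not a value reported in the manuscript.
    print(round(cells_per_mm2(116), 1))
```

      Reporting a density in this way makes counts comparable across images even if the counting window differs between experiments.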

      - If the model is correct that Delta ligands from the periderm signal to intermediate cells to promote their differentiation and inhibit their proliferation, then depletion of Delta from Krt4 expressing cells should recapitulate the periderm phenotype.

      This is a great suggestion. However, zebrafish skin expresses multiple Delta ligands, and we do not know which specific combination of Deltas is delivered via cytonemes. In this manuscript we identified that Dlc is expressed along the cytonemes and in krt4+ cells (revised Figure S4); however, we are unsure whether other Delta ligands are involved in Notch activation. Nevertheless, cytoneme inhibition is performed specifically in krt4+ cells, and the downregulation of Notch activation is observed in krtt1c19e+ undifferentiated keratinocytes. In this revised version, we found that the Notch-responsive protein Her6 is exclusively expressed in the cytoneme target keratinocytes, and that cytoneme-extending cells (krt4+) do not express Notch receptors.

      - rtPCR data in Figure S3 is not properly controlled. Each gene should be tested in both krt4 and krtt1c19e expressing cells to determine their relative expression levels in different skin layers that are proposed to signal to one another. Are Notch ligands present in basal cells? These could be activating Notch in the intermediate layer.

      Our intention was merely to confirm that the Notch signaling components are expressed in cytoneme-extending and receiving cells. Based on the new panel of RT-PCRs for Notch signaling components, we confirmed again that dlc is expressed in cytoneme-extending cells but not in receiving cells. Basal cells are also krtt1c19e+, but we did not detect dlc in them. Interestingly, we found that notch 2 is expressed exclusively in krtt1c19e+ cells and not in krt4+ cytoneme-extending cells (new Figure S4).

      - It is not intuitive why NICD (activation) and SuHDN (inhibition) of Notch signaling should result in a similar effect on the periderm. What is the effect of NICD expression on the TP1:H2BGFP reporter? Does it hyperactivate as expected?

      We agree with the reviewer’s concerns. It is well documented that psoriasis patients exhibit either loss or gain of Notch signaling (Ota et al., 2014 Acta Histochem Cytochem; Abdou et al., 2012 Annals of Diagnostic Pathology). However, the underlying mechanisms remain unknown. We merely intended to show that our zebrafish experimental manipulations recapitulate the situation in human patients. However, we believe these data are not required for drawing the overall conclusion and need further investigation to be explained. Thus, if the reviewers agree, we would like to omit them from this manuscript and leave them for future studies.

      - Due to the involvement of immune signaling in hyperproliferative skin diseases the paper then investigates the role of IL-17 on cytoneme formation by overexpressing two IL-17 receptors in the periderm. Fewer cytonemes were present in the receptor over-expressing periderm cells. The rationale for overexpressing the receptors was unclear. If relevant to endogenous cytokine signaling, the periderm would be expected to express IL-17 receptors normally and respond to elevated levels of IL-17.

      Our rationale for overexpressing the IL-17 receptors was to test whether the effect is autonomous to krt4+ peridermal cells. There is a debate as to whether the onset of psoriasis is autonomous to keratinocytes or a non-autonomous effect of immune malfunction. In addition to the overexpression of IL-17 receptors, we showed that IL-17 ligand overexpression has the same effect on cytoneme extension (Fig. 6A-B).

      - Experiments overexpressing IL-17 in macrophages are also suggested to limit cytoneme number whereas heterozygous deletion elevates them. Representative images and movies should be included to support the data. Western blots or immunofluorescence showing that IL-17 and its receptors are indeed overexpressed in the relevant layers/cell types should also be included as controls. Knockout of IL-17 protein in the new Crispr deletion mutant should also be shown.

      In response to the reviewer’s comments, we have included representative images of peridermal keratinocytes in IL-17 ligand overexpressed and il17 CRISPR KO animals (Fig. 6A,C).

      We have confirmed the overexpression of Il17rd, Il17ra1a and Il17a in the transgenic animals. For the il17 receptors, we FACS-sorted differentiated keratinocytes and performed qRT-PCR. Similarly, for the il17 ligand, we isolated skin tissue and conducted qRT-PCR (new Figure S7).

      Additionally, we confirmed that IL-17 protein expression is undetectable in il17a CRISPR KO fish (Fig. S8C).

      - Evidence that the effect of IL-17 upregulation on periderm architecture is via cytonemes is suggestive but not conclusive. Can the phenotype be rescued by a constitutively active cdc42?

      We appreciate the reviewer’s suggestion. We are unsure whether constitutively active cdc42 expression can rescue the IL-17 overexpression-mediated reduction of cytoneme extension frequency. cdc42CA would be expected to stabilize actin polymerization and in turn produce more cytonemes. However, it is also known that sustained cdc42 activation can paradoxically lead to actin depolymerization. Thus, we are concerned that the result would likely be uninterpretable. Also, we would need to generate a new transgenic line for this experiment, and the baseline control experiments and validations would take a substantial amount of time and effort with no assurance of success.

      We and others believe that cdc42 is a final effector molecule regulating cytoneme extension, given its role in actin polymerization. We provided evidence that IL-17 overexpression significantly reduced cdc42 and rac1 expression (Figure 6E) and that co-manipulation with IL17 overexpression and cdc42DN led to further down-regulation of cytoneme extension frequency in peridermal keratinocytes (Figure 6H).

      - In a final experiment, the authors mutate a psoriasis-associated gene, clint1a, and show an effect on cytonemes, Notch output, and periderm structure. More information about what this gene encodes, where the mRNA is expressed, and where in the cell the protein should localize would help place this result in context for the reader.

      In this revised manuscript we included more information about clint1.

      “The clathrin interactor 1 (clint1), also referred to as enthoprotin and epsinR, functions as an adaptor molecule that binds SNARE proteins and plays a role in clathrin-mediated vesicular transport (Wasiak, 2002). It has also been reported that clint1 is expressed in the epidermis and plays an important role in epidermal homeostasis and development in zebrafish (Dodd et al., 2009)”.

      Minor points

      - The architecture of zebrafish skin is notably distinct from that of humans and other mammals and whether parallels can be drawn with regards to cytoneme mediated signaling requires further investigation. For this reason, I believe the title should include the words 'in zebrafish skin'.

      In this version, we changed the title to ‘Cytoneme-mediated intercellular signaling in keratinocytes essential for epidermal remodeling in zebrafish’.

      - More details about the timing of cdc42 inhibition should be given in the main text to interpret the data. How many hours or days are the larvae treated? How does this compare to the rate of division and differentiation in the zebrafish larval epidermis?

      We apologize for omitting the detailed experimental conditions for cytoneme inhibition. We have revised the main text as follows “Although the cytoneme inhibition is evident after overnight treatment with the inducing drugs, noticeable epidermal phenotypes begin to appear after 3 days of treatment. This reflects the higher cytoneme extension frequency and their potential role during metamorphic stages, which takes a couple of weeks (Figure 1C)”

      - What are the genotypes of animals in Figure 4B where 'Notch expression' is being measured upon cdc42DN inhibition? Is this the TP1:H2B-GFP reporter? Again, details of the timing of this experiment are needed to evaluate the results.

      We now cite the relevant supplementary figure for the Notch activity measurement in the legend of Figure S4, and we added the following sentence to the main text: “Similar to the effects on the epidermis after cytoneme inhibition (Figure 3), it takes 3 days to observe a significant reduction in Notch signal in the undifferentiated keratinocytes.”

      Reviewer #2 (Recommendations For The Authors):

      - Figure 2B: the authors indicate that the undifferentiated keratinocytes (krtt1c19e+) do extend some cytonemes. Although this behavior is not a focus of the study, it would be helpful to see an image of krtt1c19e:lyn-tdTomato cytonemes. The discussion ends with an interesting statement about downward pointed protrusions coming off the undifferentiated keratinocytes. A representative image of this should be included in Figure 2.

      In this revised version, we included an image of a krtt1c19e-positive cell that extends cytonemes in Figure 2C.

      - The evidence for hyperproliferation of the undifferentiated keratinocytes would be strengthened by quantifying proliferation. Most experiments result in increased expression of krtt1c19e in the periderm layer, but it is unclear whether this is invasion, remodeling, or incomplete differentiation of the cells. Notch suppression with krtt1c19e:SuHDN and overactivation with krtt1c19e:NICD phenocopy each other. Are there differences in proliferation vs differentiation rates in these two genotypes that result in a similar phenotype?

      We appreciate the reviewer’s comments. In response to the feedback, we included EdU experiments that show increased cell proliferation of keratinocytes in the periderm in the experimental groups. Additionally, we observed co-expression of both the differentiated marker krt4 and the undifferentiated marker krtt1c19e in keratinocytes in the periderm. Since we did not observe depletion of the intermediate layer, we believe it is reasonable to conclude that the phenotype represents incomplete differentiation (new Figure 3). For the krtt1c19e:NICD question, please refer to our response to Reviewer #1’s comment.

      - Do Cdc42DN and il17rd or il17ra1a work in parallel or in a hierarchy of signaling events to regulate cytoneme formation?

      Cdc42 is widely recognized as a final effector in cytoneme extension, given its well-established role in actin polymerization, which is critical for cytoneme extension. Our data support a model where il17 signaling acts upstream of cdc42. We showed that the overexpression of il17rd or il17ra1a significantly reduced the expression of Cdc42 (Figure 6E). In double transgenic fish overexpressing il17rd and cdc42DN, we observed a more marked decrease in cytoneme extension compared to single transgenic (Figure 6H). These results collectively indicate that, at least partially, Cdc42 functions downstream of il17 signaling in the context of cytoneme formation. However, we acknowledge that additional regulatory mechanisms may be involved, given the complexity of cellular signaling networks.  

      - Figure 6C: Are the effects of overexpression of il17rd specific to Cdc42, or are other Rho family GTPases like Rac and Rho also affected? Is the microridge defect (Figure 6D) also present in Tg(krt4:TetGBDTRE-v2a-cdc42DN) when induced, or could this be regulated by Rho/Rac?

      We used microridge formation as a readout to evaluate the effects of il17 receptor overexpression on actin polymerization. In this revision, we demonstrate that the expression of other small GTPases is also decreased in il17rd- or il17ra1a-overexpressing keratinocytes (Figure 6E). Also, we confirmed that microridges exhibit significantly shorter branch length when cdc42DN or rac1DN is overexpressed (new Figure S9). Note that we have shown that the effects on cytonemes are regulated by cdc42 and rac1 (new Figure S3).

      - Please change the color of the individual data points from black to grey or another color so readers may better visualize the mean and error bars.

      We agree with this comment, and in response, we have revised the figures by changing the color of the individual data points to empty circles and now the error bars are better visualized.

      - Figure 1: What were the parameters used to identify an extension as a cytoneme? Please include the minimal length and max-width used in the analysis in the methods.

      Thank you for the comments. We have now included a description of how we defined and measured cytonemes, as follows. In zebrafish keratinocytes, lamellipodial extensions are the dominant extension type, and most filopodial extensions are less than 1 µm in length; neither is easily visible at the confocal resolution we used for this study. Thus, it is easy to distinguish filopodia from cytonemes, as cytonemes have a minimum length of 4.36 µm in our observations. We did not use a width parameter since there are no other protrusions except cytonemes. We calculated the cytoneme extension frequency by counting how many cytonemes extended from a cell per hour. We analyzed movies with 3-minute intervals over a total of 10 hours, as described in the section above.
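
      The frequency measure described above (cytonemes extended per cell per hour) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' analysis pipeline: the `Protrusion` record and the function name are hypothetical, and treating the 4.36 µm minimum observed length as a cutoff is an assumption made here for illustration. The 10-hour movie duration is taken from the response.

```python
from dataclasses import dataclass

MOVIE_DURATION_H = 10.0        # from the response: 10 hours of imaging in total
MIN_CYTONEME_LENGTH_UM = 4.36  # shortest cytoneme observed; used here as an ASSUMED cutoff


@dataclass
class Protrusion:
    """One tracked protrusion (hypothetical record layout)."""
    cell_id: str
    max_length_um: float


def extension_frequency(protrusions, duration_h=MOVIE_DURATION_H,
                        min_length_um=MIN_CYTONEME_LENGTH_UM):
    """Return cytoneme extensions per hour for each cell.

    Protrusions shorter than min_length_um are not counted as cytonemes,
    but their cell still appears in the result (with frequency 0.0 if it
    extended no cytonemes).
    """
    counts = {}
    for p in protrusions:
        counts.setdefault(p.cell_id, 0)
        if p.max_length_um >= min_length_um:
            counts[p.cell_id] += 1
    return {cell: n / duration_h for cell, n in counts.items()}


if __name__ == "__main__":
    # Illustrative data only; not measurements from the study.
    tracked = [
        Protrusion("cell_a", 5.0),
        Protrusion("cell_a", 0.8),   # filopodium-sized, ignored
        Protrusion("cell_a", 6.2),
        Protrusion("cell_b", 0.5),   # filopodium-sized, ignored
    ]
    print(extension_frequency(tracked))
```

      Normalizing by the imaging duration rather than by frame count keeps the measure independent of the 3-minute acquisition interval.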

      - Line 149-150, (Figure S1) ML141 is a Cdc42 inhibitor, please correct the wording. Would the use of an actin polymerization inhibitor like Cytochalasin B or a depolymerizing agent (Latrunculin) increase the reduction in cytoneme formation?

      Thank you for pointing this out. We have revised the wording in this version. We tried Cytochalasin B and Latrunculin, but the treatments killed the animals.

      - Figure 2: What is the depth of the Z-axis images? Does the scale bar apply to the cross-sectional images as well? It may be beneficial to readers to expand the Z scale of the cross-section images for Figure 2C.

      Sure, we enlarged the cross-sectional images. Yes, the scale bar should apply to the cross-sectional images.

      - Figure 3B-B' cross-section images should be added to confirm images shown represent the periderm layer. Are there folds in the epidermis due to cdc42DN expression or are differentiated keratinocytes absent?

      In response, we have included z-stack images in the revised figure 3. We found that the epidermal tissue is not flat as compared to controls, presumably due to broad cdc42DN expression (Figure 3C”).

      - Figure S3: Do the EGFP+ and tdTomato+ cells have noticeable differential gene expression? The inclusion of RT-PCR analysis of all genes analyzed for both cell populations would bolster statements on lines 230-231 and 254-256.

      We agree with the reviewer’s comment, and we have revised the RT-PCR panel in this revised version (Figure S4).

      - Figure 4D-D', Please include cross-section images to indicate the focal plane for analysis.

      We included cross-section images in this revised version (Figure 4E-E”).

      - Figure 5B: Complementary images visualizing the reduction of Notch would be helpful.

      We apologize for not including the data. In this revised version, we included Notch reporter expression data comparing WT, Tg(krt4:il17rd), and Tg(krt4:il17ra1a) in Figure S5E.

      - Line 432-433: "Moreover, we have demonstrated that IL-17 can influence cytoneme extension by regulating Cdc42 GTPases, ultimately affecting actin polymerization." This claim would be strengthened by assaying for Cdc42 activity.

      This is a great idea, and we tried to address this issue. However, we realized that measuring activity with biosensors, especially in vivo, would require a substantial amount of time, effort, and validation, with no assurance that it would work in our hands. In addition, the current methods, which work for in vitro samples, still have many limitations, such as sensitivity issues. Although we agree that measuring cdc42 activity would bolster our findings, it seems very challenging to apply to the zebrafish in vivo system.

      - Line 445-447: "Clint1(Clathrin Interactor 1) plays an important role in vesicle trafficking, and it is well established that endocytic pathways are critical for multiple steps in cytoneme-mediated morphogen delivery (Kalthoff et al., 2002)." Please add references to the "endocytic pathways are critical for multiple steps in cytoneme-mediated morphogen delivery" portion of the sentence.

      We revised the sentence: “it is well established” was changed to “it is suggested”, and we added a reference (Daly et al., 2022).

      Reviewer #3 (Recommendations For The Authors):

      The details of the "cytoneme inhibition" experiments need to be better clarified. How long was the dox treatment? How soon did the cells start to show "disorganization"? How soon did the KC in the periderm start to show increased proliferation?

      Thank you for the valuable comment and in response, we have revised the main text as follows “Although the cytoneme inhibition is evident after overnight treatment with the inducing drugs, noticeable epidermal phenotypes begin to appear after 3 days of treatment. This reflects the higher cytoneme extension frequency and their potential role during metamorphic stages, which takes a couple of weeks (Figure 1C)”

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents a practical modification of the orthogonal hybridization chain reaction (HCR) technique, a promising yet underutilized method with broad potential for future applications across various fields. The authors advance this technique by integrating peptide ligation technology and nanobody-based antibody mimetics - cost-effective and scalable alternatives to conventional antibodies - into a DNA-immunoassay framework that merges oligonucleotide-based detection with immunoassay methodologies. Notably, they demonstrate that this approach facilitates a modified ELISA platform capable of simultaneously quantifying multiple target protein expression levels within a single protein mixture sample.

      Strengths:

      The hybridization chain reaction (HCR) technique was initially developed to enable the simultaneous detection of multiple mRNA expression levels within the same tissue. This method has since evolved into immuno-HCR, which extends its application to protein detection by utilizing antibodies. A key requirement of immuno-HCR is the coupling of oligonucleotides to antibodies, a process that can be challenging due to the inherent difficulties in expressing and purifying conventional antibodies.

      In this study, the authors present an innovative approach that circumvents these limitations by employing nanobody-based antibody mimetics, which recognize antibodies, instead of directly coupling oligonucleotides to conventional antibodies. This strategy facilitates oligonucleotide conjugation - designed to target the initiator hairpin oligonucleotide of HCR - through peptide ligation and click chemistry.

      Weaknesses:

      The sandwich-format technique presented in this study, which employs a nanobody that recognizes primary IgG antibodies, may have limited scalability compared to existing methods that directly couple oligonucleotides to primary antibodies. This limitation arises because the C-region types of primary antibodies are relatively restricted, meaning that the use of nanobody-based detection may constrain the number of target proteins that can be analyzed simultaneously. In contrast, the conventional approach of directly conjugating oligonucleotides to primary antibodies allows for a broader range of protein targets to be analyzed in parallel.

      We would like to clarify that MaMBA was specifically designed to address and overcome the limitations imposed by relying on primary antibodies’ Fc types for multiplexing. MaMBA utilizes DNA oligo-conjugated nanobodies that selectively and monovalently bind to the Fc region of IgG. This key feature allows us to barcode primary IgGs targeting different antigens independently. These barcoded IgGs can then be pooled together after barcoding, effectively minimizing the potential for cross-reactivity or crossover. Therefore, IgGs barcoded using MaMBA are functionally equivalent to those barcoded via conventional direct conjugation approaches with respect to multiplexing capability.

      Additionally, in the context of HCR-based protein detection, the number of proteins that can be analyzed simultaneously is inherently constrained by fluorescence wavelength overlap in microscopy, which limits its multiplexing capability. By comparison, direct coupling of oligonucleotides to primary antibodies can facilitate the simultaneous measurement of a significantly greater number of protein targets than the sandwich-based nanobody approach in the barcode-ELISA/NGS-based technique.

      As we have responded above, MaMBA barcoding of primary IgGs that target various antigens can be conducted separately. Once barcoded, these IgGs can then be combined into a single pool. Therefore, for BLISA (i.e., the barcode-ELISA/NGS-based technique), IgGs barcoded through MaMBA offer the same multiplexing capability as those barcoded using traditional direct conjugation methods.

      In in situ protein imaging, spectral overlap can indeed limit the throughput of multiplexed HCR fluorescent imaging. There are two strategies to address this challenge. As demonstrated in this work with _mis_HCR and _mis_HCRn, removing the HCR amplifiers allows for multiplexed detection using a limited number of fluorescence wavelengths. This is achieved through sequential rounds of HCR amplification and imaging. Alternatively, recent computational approaches offer promising solutions for “one-shot” multiplexed imaging. These include combinatorial multiplexing (PMID: 40133518) and spectral unmixing (PMID: 35513404), which can be applied to _mis_HCR to deconvolute overlapping spectra and increase multiplexing capacity in a single imaging acquisition.

      Reviewer #1 (Recommendations for the authors):

      (1) The introduction of nanobody and peptide ligation technology is a key highlight of this study. To strengthen the manuscript, the authors should provide a more detailed discussion of the principles and applications of HCR in the Introduction or Discussion sections.

      We have added a brief discussion of the HCR reaction to the revised manuscript.

      (2) It would also be beneficial to include results and/or discussion on how the affinity of nanobody binding to IgG influences the success and accuracy of the technique.

      We have added a brief discussion of the IgG nanobodies we used in MaMBA to the revised manuscript.

      (3) Additionally, a more detailed explanation of the recognition specificity of the AEP peptide ligase used in this study should be included in the Discussion section. Prior studies have reported on the specificity of amino acid residues positioned at the C-terminus of target A (-5 to -1) and the N-terminus of target B (1 to 3) in AEP-mediated ligation, and integrating this context would enhance clarity.

      We have added a brief discussion of the AEP-mediated ligation to the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment 

      The authors utilize a valuable computational approach to exploring the mechanisms of memory-dependent klinotaxis, with a hypothesis that is both plausible and testable. Although they provide a solid hypothesis of circuit function based on an established model, the model's lack of integration of newer experimental findings, its reliance on predefined synaptic states, and oversimplified sensory dynamics, make the investigation incomplete for both memory and internal-state modulation of taxis.

      We would like to express our gratitude to the editor for the assessment of our work. However, we respectfully disagree with the assessment that our investigation is incomplete, if that assessment rests primarily on the reported effect of AIY interneuron ablation on the chemotaxis index (CI) in Reference [1]. It is crucial to acknowledge that the experimentally determined CI incorporates contributions from both klinokinesis and klinotaxis [1]. It is therefore plausible that the impact of AIY ablation was not adequately reflected in the CI value, and the experimental observation does not necessarily diminish the role of AIY in klinotaxis. Anatomical evidence provided by the database (http://ims.dse.ibaraki.ac.jp/ccep-tool/) shows that ASE sensory neurons and AIZ interneurons, which have been demonstrated to play a crucial role in klinotaxis [Matsumoto et al., PNAS 121 (5) e2310735121], have a much higher number of synaptic connections with AIY interneurons. These findings support the validity of the presented minimal neural network responsible for salt klinotaxis.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This research focuses on C. elegans klinotaxis, a chemotactic behavior characterized by gradual turning, aiming to uncover the neural circuit mechanism responsible for the context-dependent reversal of salt concentration preference. The phenomenon observed is that the preferred salt concentration depends on the difference between the pre-assay cultivation conditions and the current environmental salt levels. 

      We would like to express our gratitude for the time and consideration you have dedicated to reviewing our manuscript.

      The authors propose that a synaptic-reversal plasticity mechanism at the primary sensory neuron, ASER, is critical for this memory- and context-dependent switching of preference. They build on prior findings regarding synaptic reversal between ASER and AIB, as well as the receptor composition of AIY neurons, to hypothesize that similar "plasticity" between ASER and AIY underpins salt preference behavior in klinotaxis. This plasticity differs conceptually from the classical one as it does not rely on any structural changes but rather synaptic transmission is modulated by the basal level of glutamate, and can switch from inhibitory to excitatory. 

      To test this hypothesis, the study employs a previously established neuroanatomically grounded model [4] and demonstrates that reversing the ASER-AIY synapse sign in the model agent reproduces the observed reversal in salt preference. The model is parameterized using a computational search technique (evolutionary algorithm) to optimize unknown electrophysiological parameters for chemotaxis performance. Experimental validity is ensured by incorporating constraints derived from published findings, confirming the plausibility of the proposed mechanism. 

      Finally, the circuit mechanism allowing C. elegans to switch behaviour to an exploration run when starved is also investigated. This extension highlights how internal states, such as hunger, can dynamically reshape sensory-motor programs to drive context-appropriate behaviors.

      We would like to thank the reviewer for the appropriate summary of our work. 

      Strengths and weaknesses: 

      The authors' approach of integrating prior knowledge of receptor composition and synaptic reversal with the repurposing of a published neuroanatomical model [4] is a significant strength. This methodology not only ensures biological plausibility but also leverages a solid, reproducible modeling foundation to explore and test novel hypotheses effectively.

      The evidence produced that the original model has been successfully reproduced is convincing.

      The writing of the manuscript needs revision as it makes comprehension difficult.  

      We would like to thank the reviewer for recognizing the usefulness of our approach. In the revised version, we improved the explanation according to your suggestions.  

      One major weakness is that the model does not incorporate key findings that have emerged since the original model's publication in 2013, limiting the support for the proposed mechanism. In particular, ablation studies indicate that AIY is not critical for chemotaxis, and other interneurons may play partially overlapping roles in positive versus negative chemotaxis. These findings challenge the centrality of AIY and suggest the model oversimplifies the circuit involved in klinotaxis.

      We would like to express our gratitude for the constructive feedback we have received. We concur with some of your assertions. In fact, our model is the minimal network for salt klinotaxis, which includes solely the interneurons that are connected to each other via the highest number of synaptic connections. It is important to note that our model does not consider redundant interneurons that exhibit overlapping roles. Consequently, the model is not applicable to the study of the impact of interneuron ablation. In the reference [1], the influence of interneuron ablations on the chemotaxis index (CI) has been investigated. The experimentally determined CI value incorporates the contributions from both klinokinesis and klinotaxis. Consequently, it is plausible that the impact of AIY ablation was not significantly reflected in the CI value. The experimental observation does not necessarily diminish the role of AIY in klinotaxis. 

      Reference [1] also shows that ASER neurons exhibit complex, memory- and context-dependent responses, which are not accounted for in the model and may have a significant impact on chemotactic model behaviour. 

      As the reviewer has noted, our model does not incorporate the context-dependent response of the ASER. Instead, the present study examined in detail the impact of the salt-concentration-dependent glutamate release from the ASER [S. Hiroki et al. Nat Commun 13, 2928 (2022)], which results from the ASER responses.

      The hypothesis of synaptic reversal between ASER and AIY is not explicitly modeled in terms of receptor-specific dynamics or glutamate basal levels. Instead, the ASER-to-AIY connection is predefined as inhibitory or excitatory in separate models. This approach limits the model's ability to test the full range of mechanisms hypothesized to drive behavioral switching.  

      We would like to express our gratitude to the reviewer for their constructive feedback. As you correctly noted, the hypothesized synaptic reversal between ASER and AIY is not explicitly modeled in terms of the sensitivity of the receptors in the AIY and the basal glutamate levels set by the ASER. Instead, taking into account the substantial difference in sensitivity between the two glutamate receptors on the AIY, we sought to elucidate the impact of salt-concentration-dependent basal glutamate levels on klinotaxis. To this end, we examined the full range of gradual change in the ASER-to-AIY connection from inhibitory to excitatory, as illustrated in Figures S4 and S5.

      While the main results - such as response dependence on step inputs at different phases of the oscillator - are consistent with those observed in chemotaxis models with explicit neural dynamics (e.g., Reference [2]), the lack of richer neural dynamics could overlook critical effects. For example, the authors highlight the influence of gap junctions on turning sensitivity but do not sufficiently analyze the underlying mechanisms driving these effects. The role of gap junctions in the model may be oversimplified because, as in the original model [4], the oscillator dynamics are not intrinsically generated by an oscillator circuit but are instead externally imposed via $z_\text{osc}$. This simplification should be carefully considered when interpreting the contributions of specific connections to network dynamics. Lastly, the complex and context-dependent responses of ASER [1] might interact with circuit dynamics in ways that are not captured by the current simplified implementation. These simplifications could limit the model's ability to account for the interplay between sensory encoding and motor responses in C. elegans chemotaxis.

      We may not have fully understood the substance of your comment. We do acknowledge that the oscillator dynamics in our model are not intrinsically generated by an explicitly modeled oscillator circuit. The present study instead focuses on how the sensory input and the resulting interneuron dynamics regulate the oscillatory behavior of SMB motor neurons to generate klinotaxis. The neuron dynamics via gap junctions result from the equilibration of the membrane potential y<sub>i</sub> of the two neurons connected by gap junctions rather than from the z<sub>i</sub>. We added this explanation to the revised manuscript as follows.

      “The hyperpolarization signals in the AIZL are transmitted to the AIZR via the gap junction (Figs. S1d and S1f and Fig. 3d). This is because the neuron dynamics via gap junctions results from the equilibration of the membrane potential y<sub>i</sub> of two neurons connected by gap junctions rather than the z<sub>i</sub>.”
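      The equilibration argument above can be illustrated with a minimal sketch. It assumes a leaky-integrator neuron in which the gap-junction current is proportional to the difference in membrane potential, tau * dy_i/dt = -y_i + g_gap * (y_j - y_i) + I_i; the function name, parameter values, and the omission of chemical synapses are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def simulate_gap_pair(g_gap, tau=0.1, dt=0.001, steps=2000):
    """Euler-integrate two leaky neurons coupled only by a gap junction.

    Assumed (hypothetical) dynamics for illustration:
        tau * dy_i/dt = -y_i + g_gap * (y_j - y_i) + I_i
    Neuron 0 stands in for AIZL, neuron 1 for AIZR.
    """
    y = np.zeros(2)
    drive = np.array([-1.0, 0.0])  # hyperpolarizing input to neuron 0 only
    for _ in range(steps):
        # Gap-junction current depends on the potential difference y_j - y_i,
        # not on the sigmoidal output z of either neuron.
        coupling = g_gap * (y[::-1] - y)
        y = y + dt / tau * (-y + coupling + drive)
    return y

uncoupled = simulate_gap_pair(g_gap=0.0)
coupled = simulate_gap_pair(g_gap=1.0)
print("without gap junction:", uncoupled)
print("with gap junction:   ", coupled)
```

      In this toy setting the undriven neuron stays at rest without coupling and is pulled toward the hyperpolarized neuron once the gap junction is present, which is the qualitative behavior described for AIZL and AIZR.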

      In the limitation, we added the following sentence:

      “In the present study, the oscillator components of the SMB are not intrinsically generated by an oscillator circuit but are instead externally imposed via 𝑧<sub>i</sub><sup>OSC</sup>. Furthermore, the complex and context-dependent responses of ASER {Luo:2014et} were not taken into consideration. It should be acknowledged as a limitation of this study that these omitted factors may interact with circuit dynamics in ways that are not captured by the current simplified implementation.”

      Appraisal: 

      The authors show that their model can reproduce memory-dependent reversal of preference in klinotaxis, demonstrating that the ASER-to-AIY synapse plays a key role in switching chemotactic preferences. By switching the ASER-AIY connection from excitatory to inhibitory they indeed show that salt preference reverses. They also show that the curving/turn rate underlying the preference change is gradual and depends on the weight between ASER-AIY. They further support their claim by showing that curving rates also depend on cultivated (set-point).  

      We would like to thank the reviewer for assessing our work.

      Thus within the constraints of the hypothesis and the framework, the model operates as expected and aligns with some experimental findings. However, significant omissions of key experimental evidence raise questions on whether the proposed neural mechanisms are sufficient for reversal in salt-preference chemotaxis.  

      We agree with your opinion. The present hypothesis should be verified by experiments.

      Previous work [1] has shown that individually ablating the AIZ or AIY interneurons has essentially no effect on the Chemotactic Index (CI) toward the set point ([1] Figure 6). Furthermore, in [1] the authors report that different postsynaptic neurons are required for movement above or below the set point. The manuscript should address how this evidence fits with their model by attempting similar ablations. It is possible that the CI is rescued by klinokinesis but this needs to be tested on an extension of this model to provide a more compelling argument.  

      We would like to express our gratitude for the constructive feedback we have received. Reference [1] investigated the influence of interneuron ablations on the chemotaxis index (CI). It is important to acknowledge that the experimentally determined CI value encompasses the contributions of both klinokinesis and klinotaxis. It is plausible that the impact of AIY ablation was not reflected in the CI value. Consequently, these experimental observations do not necessarily diminish the role of AIY in klinotaxis. The neural circuit model employed in the present study constitutes a minimal network for salt klinotaxis, encompassing solely interneurons that are connected to each other via the highest number of synaptic connections. Anatomical evidence provided by the database (http://ims.dse.ibaraki.ac.jp/ccep-tool/) shows that ASE sensory neurons and AIZ interneurons, which have been demonstrated to play a crucial role in klinotaxis [Matsumoto et al., PNAS 121 (5) e2310735121], have a much higher number of synaptic connections with AIY interneurons. Our model does not take into account redundant interneurons with overlapping roles and is therefore not applicable to the study of the effects of interneuron ablation.

      The investigation of dispersal behaviour in starved individuals is rather limited to testing by imposing inhibition of the SMB neurons. Although a circuit is proposed for how hunger states modulate taxis in the absence of food, this circuit hypothesis is not explicitly modelled to test the theory or provide novel insights.  

      As the reviewer noted, the experimentally identified neural circuit that inhibits the SMB motor neurons in starved individuals is not incorporated in our model. Instead of incorporating this circuit explicitly, we examined whether our minimal network model could reproduce dispersal behavior under starvation conditions solely due to the experimentally demonstrated inhibitory effect of SMB motor neurons.

      Impact: 

      This research underscores the value of an embodied approach to understanding chemotaxis, addressing an important memory mechanism that enables adaptive behavior in the sensorimotor circuits supporting C. elegans chemotaxis. The principle of operation - the dependence of motor responses to sensory inputs on the phase of oscillation - appears to be a convergent solution to taxis. Similar mechanisms have been proposed in Drosophila larvae chemotaxis [2], zebrafish phototaxis [3], and other systems. Consequently, the proposed mechanism has broader implications for understanding how adaptive behaviors are embedded within sensorimotor systems and how experience shapes these circuits across species.

      We would like to express our gratitude for this useful suggestion. We added this argument to the Discussion of the revised manuscript as follows.

      “The principle of operation, in which the dependence of motor responses to sensory inputs on the phase of motor oscillation, appears to be a convergent solution for taxis and navigation across species. In fact, analogous mechanisms have been postulated in the context of chemotaxis in Drosophila larvae chemotaxis {Wystrach:2016bt} and phototaxis in zebrafish {Wolf:2017ei}. Consequently, the synaptic reversal mechanism highlighted in this study offers the framework for understanding how the behaviors that are adaptive to the environment are embedded within sensorimotor systems and how experience shapes these neural circuits across species.”
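      The phase dependence described above can be made concrete with a toy sketch. It assumes a sinusoidal motor oscillation whose amplitude is transiently increased while a brief sensory pulse is active; the function name, the gain, and the pulse shape are hypothetical choices for illustration and are not taken from the authors' model or from the cited studies.

```python
import math

def mean_turning_bias(pulse_phase, pulse_width=0.5, gain=1.0, n=10000):
    """Average motor output over one oscillation cycle (toy model).

    A sensory pulse of length pulse_width (radians) starting at pulse_phase
    scales the oscillation by (1 + gain) while it is active. All names and
    values here are illustrative assumptions.
    """
    total = 0.0
    for k in range(n):
        phase = 2.0 * math.pi * k / n
        osc = math.sin(phase)  # positive half-cycle = one side, negative = the other
        offset = (phase - pulse_phase) % (2.0 * math.pi)
        sensory = 1.0 if offset < pulse_width else 0.0
        total += osc * (1.0 + gain * sensory)
    return total / n

# Same pulse, centered on opposite half-cycles of the oscillation.
bias_first_half = mean_turning_bias(pulse_phase=math.pi / 2 - 0.25)
bias_second_half = mean_turning_bias(pulse_phase=3 * math.pi / 2 - 0.25)
print(bias_first_half, bias_second_half)
```

      In this sketch an identical sensory input biases the cycle-averaged output in opposite directions depending on the half-cycle in which it arrives, while the unperturbed oscillation averages to zero.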

      Although the reported reversal of synaptic connection from excitatory to inhibitory is an exciting phenomenon of broad interest, it is not entirely new, as the authors acknowledge similar reversals have been reported in ASER-to-AIB signaling for klinokinesis (Hiroki et al., 2022). The proposed reversal of the ASER-to-AIY synaptic connection from inhibitory to excitatory is a novel contribution in the specific context of klinotaxis. While the ASER's role in gradient sensing and memory encoding has been previously identified, the current paper mechanistically models these processes, introducing a hypothesis for synaptic plasticity as the basis for bidirectional salt preference in klinotaxis.

      The research also highlights how internal states, such as hunger, can dynamically reshape sensory-motor programs to drive context-appropriate behaviors.  

      The methodology of parameter search on a neural model of a connectome used here yielded the valuable insight that connectome information alone does not provide enough constraints to reproduce the neural circuits for behaviour. It demonstrates that additional neurophysiological constraints are required.  

      We would like to acknowledge the appropriate recognition of our work.

      Additional Context 

      Oscillators with stimulus-driven perturbations appear to be a convergent solution for taxis and navigation across species. Similar mechanisms have been studied in zebrafish phototaxis [3], Drosophila larvae chemotaxis [2], and have even been proposed to underlie search runs in ants. The modulation of taxis by context and memory is a ubiquitous requirement, with parallels across species. For example, Drosophila larvae modulate taxis based on current food availability and predicted rewards associated with odors, though the underlying mechanism remains elusive. The synaptic reversal mechanism highlighted in this study offers a compelling framework for understanding how taxis circuits integrate context-related memory retrieval more broadly.  

      We would like to express our gratitude for the insightful commentary. In the revised manuscript, we incorporated the argument that the similar oscillator mechanism with stimulus-driven perturbations has been observed for zebrafish phototaxis [3] and Drosophila larvae chemotaxis [2] into Discussion.

      As a side note, an interesting difference emerges when comparing C. elegans and Drosophila larvae chemotaxis. In Drosophila larvae, oscillatory mechanisms are hypothesized to underlie all chemotactic reorientations, ranging from large turns to smaller directional biases (weathervaning). By contrast, in C. elegans, weathervaning and pirouettes are treated as distinct strategies, often attributed to separate neural mechanisms. This raises the possibility that their motor execution could share a common oscillator-based framework. Re-examining their overlap might reveal deeper insights into the neural principles underlying these maneuvers. 

      We would like to acknowledge your thoughtfully articulated comment. As the reviewer pointed out, the anatomical database (http://ims.dse.ibaraki.ac.jp/ccep-tool/) shows that the neural circuits underlying weathervaning and pirouettes in C. elegans are predominantly distinct but exhibit partial overlap. When we restrict our search to the neurons that are connected to each other with the highest number of synaptic connections, we identify projections from the weathervaning circuit to the pirouette circuit; however, we observed no projections in the reverse direction. This finding suggests that the weathervaning circuit, namely our minimal neural network, is unlikely to be affected by the pirouette circuit, which consists of the AIB interneurons and the interneurons and motor neurons downstream of them.

      (1) Luo, L., Wen, Q., Ren, J., Hendricks, M., Gershow, M., Qin, Y., Greenwood, J., Soucy, E.R., Klein, M., Smith-Parker, H.K., & Calvo, A.C. (2014). Dynamic encoding of perception, memory, and movement in a C. elegans chemotaxis circuit. Neuron, 82(5), 1115-1128. 

      (2) Antoine Wystrach, Konstantinos Lagogiannis, Barbara Webb (2016) Continuous lateral oscillations as a core mechanism for taxis in Drosophila larvae eLife 5:e15504. 

      (3) Wolf, S., Dubreuil, A.M., Bertoni, T. et al. Sensorimotor computation underlying phototaxis in zebrafish. Nat Commun 8, 651 (2017). 

      (4) Izquierdo, E.J. and Beer, R.D., 2013. Connecting a connectome to behavior: an ensemble of neuroanatomical models of C. elegans klinotaxis. PLoS computational biology, 9(2), p.e1002890. 

      Reviewer #2 (Public review): 

      Summary: 

      This study explores how a simple sensorimotor circuit in the nematode C. elegans enables it to navigate salt gradients based on past experiences. Using computational simulations and previously described neural connections, the study demonstrates how a single neuron, ASER, can change its signaling behavior in response to different salt conditions, with which the worm is able to "remember" prior environments and adjust its navigation toward "preferred" salinity accordingly.  

      We would like to express our gratitude for the time and consideration the reviewer has dedicated to reviewing our manuscript.

      Strengths: 

      The key novelty and strength of this paper is the explicit demonstration of computational neurobehavioral modeling and evolutionary algorithms to elucidate the synaptic plasticity in a minimal neural circuit that is sufficient to replicate memory-based chemotaxis. In particular, with changes in ASER's glutamate release and sensitivity of downstream neurons, the ASER neuron adjusts its output to be either excitatory or inhibitory depending on ambient salt concentration, enabling the worm to navigate toward or away from salt gradients based on prior exposure to salt concentration.

      We would like to thank the reviewer for appreciating our research. 

      Weaknesses: 

      While the model successfully replicates some behaviors observed in previous experiments, many key assumptions lack direct biological validation. As to the model output readouts, the model considers only endpoint behaviors (chemotaxis index) rather than the full dynamics of navigation, which limits its predictive power. Moreover, some results presented in the paper lack interpretation, and many descriptions in the main text are overly technical and require clearer definitions.  

      We would like to thank the reviewer for the constructive feedback. As the reviewer noted, the fundamental assumptions posited in the study have yet to be substantiated by biological validation, and consequently, these assumptions must be directly assessed by biological experimentation. The model performance for salt klinotaxis has been evaluated by multiple factors, including not only a chemotaxis index but also the curving rate vs. bearing (Fig. 4a, the bearing is defined in Fig. A3) and the curving rate vs. normal gradient (Fig. 4c). These two parameters work to characterize the trajectory during salt klinotaxis. In the revised version, we meticulously revised the manuscript according to the reviewer’s suggestions. We would like to express our sincere gratitude for your insightful review of our work.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors): 

      An interesting and engaging methodology combining theoretical and computational approaches. Overall I found the manuscript up to discussion a difficult read, and I would suggest revising it. I would also recommend introducing the general operating principle of the oscillator with sensory perturbations before jumping into the implementation details of signal propagation specific to C. elegans.

      In order to elucidate the relation between the general operating principle of the oscillator with sensory perturbations and the results shown by the two graphs from the bottom in Fig. 3d, the following statement was added on page 12.

      “It is remarkable that this regulatory mechanism derived via the optimization of the CI has been observed in the context of chemotaxis in Drosophila larvae chemotaxis {Wystrach:2016bt} and phototaxis in zebrafish {Wolf:2017ei}. The principle of operation, in which the dependence of motor responses to sensory inputs on the phase of motor oscillation, therefore, may serve as a convergent solution for taxis and navigation across species.”

      The abstract could benefit from a clarification of terms to benefit a broader audience:  The term "salt klinotaxis" is used without prior introduction or definition. It would be beneficial to briefly explain this term, as it may not be familiar to all readers. 

      Owing to the word limit of the abstract, an explanation of salt klinotaxis could not be included.

      Although ASER is introduced as a right-side head sensory neuron, AIY neurons are not similarly introduced. It may also benefit to introduce here that ASER integrates memory with current salt gradients, tuning its output to produce context-appropriate behaviour.  

      Owing to the word limit of the abstract, we could not add further explanation.

      "it can be anticipated that the ASER-AIY synaptic transmission will undergo a reversal due to alterations in the basal glutamate Release": Where is this expectation drawn from? Is it derived from biophysical or is it a functional expectation to explain the network's output constraints?  

      As delineated before this sentence, it is derived from a comprehensive consideration of the sensitivity of excitatory/inhibitory glutamate receptors expressed on the postsynaptic AIY interneurons, in conjunction with varying the basal level of glutamate transmission from ASER.

      The statement that the model "revealed the modular neural circuit function downstream of ASE" could be more explicit. What specific insights about the downstream circuit were uncovered? Highlighting one or two key findings would strengthen the impact.

      Owing to the word limit of the abstract, no further details could be added here, but the sentence was revised to “revealed that the circuit downstream of ASE functions as a module that is responsible for salt klinotaxis.” This is because the salt-concentration-dependent behaviors in klinotaxis can be reproduced through the modulation of the ASER-AIY synaptic connections alone, without any alteration of the neural circuit downstream of AIY.

      I believe the authors should cite Luo et al. 2014, which also studies how chemotactic behaviours arise from neural circuit dynamics, including the dynamic encoding of salt concentration by ASER, and the crucial downstream interaction with AIY for chemotactic actions. 

      We would like to express our gratitude for this useful suggestion. We cited Luo et al. 2014 in the discussion of the limitations of our work.

      The introduction could also be improved for clarity. Specifically in the last paragraph authors should clarify how the observed synchrony of ASER excitation to the AIZ (Matsumoto et al., 2024), validates the resulting network.  

      We would like to express our gratitude for this useful suggestion. We added the following explanation to the last paragraph of the introduction.

      “Specifically, the synchrony of the excitation of the ASER and AIZ {Matsumoto:2024ig} taken together with the experimentally identified inhibitory synaptic transmission between the AIY and AIZ revealed that the ASER-AIY synaptic connections should be inhibitory, which was consistent with the network obtained from the most evolved model.”

      In addition, we added the following explanation after “It was then hypothesized that the ASER-AIY inhibitory synaptic connections are altered to become excitatory due to a decrease in the baseline release of glutamate from the ASER when individuals are cultured under C<sub>cult</sub> < C<sub>test</sub>.”

      This is due to the substantial difference in the sensitivity of excitatory/inhibitory glutamate receptors expressed on the postsynaptic AIY interneurons.

      I would also strongly recommend replacing the term "evolved model", with "Optimized Model" or "Best-Performing Model" to clarify this is a computational optimization process with limitations - optimization through GAs does not guarantee finding global optima.  

      We revised "evolved model" as "optimized model" in the main and SI text.

      The text overall would benefit from editing for clarity and expression.  

      According to the revisions mentioned above, we revised “best optimized model” as “most optimized model” in the main and SI text.

      The font size on the plot axis in Figures 3 c&d should be increased for readability on the printed page. Label the left/right panel to indicate unconstrained / constrained evolution.  

      As you noted, the font size of the subscript on the vertical axis in Figs 3c and 3d was too small. We have revised the font size of the subscript in Figs. 3c and 3d and also in Fig. 5e. At your suggestion, “unconstrained” and “constrained” have been added as labels to the left and right panels in Fig. 3.

      There is no input/transmission to AIYR to step input in either model shown in Figure 3? 

      As shown in Figs. S1e and S1f, there are transmissions to the AIYR from the ASEL and ASER.

      Supplementary Figure 1 attempts to explain the interactions. There are inconsistent symbols used for inhibition and excitation between network schema (colours) and the z response plots (arrows vs circles), combined with different meanings for red/blue making it very confusing. 

      We could not address the inconsistency in the color of arrows and lines with an ending between Figs. S1c and S1d and Figs. S1a and S1b. On the other hand, Figs. S1e and S1f were revised so that the consistent symbols were used for inhibition, excitation, and electrical gap connections in Figs. S1c-S1f. The same revisions were made for Fig. S7c-S7f.

      Model parameters are given to 15 decimal precision, which seems excessive. Is model performance sensitive to that order? We would expect robustness around those values. The authors should identify relevant orders and truncate parameters accordingly. 

      We examined how truncating the parameters affects the trajectory and concluded that four decimal places are sufficient. We revised Table A4 accordingly.
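      A check of this kind could be sketched roughly as follows. This is only an illustration under stated assumptions: `simulate_trajectory` is a hypothetical toy stand-in (a point moving at unit speed with an oscillating heading), not the klinotaxis model from the paper, the parameter values are invented, and rounding is used in place of strict truncation.

```python
import math

def simulate_trajectory(params, steps=200, dt=0.1):
    """Hypothetical toy stand-in for the model (NOT the authors' model).

    A point moves at unit speed; its heading changes with a
    parameter-dependent amplitude, frequency, and bias.
    """
    amplitude, frequency, bias = params
    x, y, heading = 0.0, 0.0, 0.0
    trajectory = [(x, y)]
    for i in range(steps):
        t = i * dt
        heading += dt * (amplitude * math.cos(frequency * t) + bias)
        x += dt * math.cos(heading)
        y += dt * math.sin(heading)
        trajectory.append((x, y))
    return trajectory

def max_deviation(traj_a, traj_b):
    """Largest point-wise distance between two equally long trajectories."""
    return max(math.dist(p, q) for p, q in zip(traj_a, traj_b))

def truncation_sensitivity(params, decimals_to_test):
    """Map each tested precision to the trajectory deviation it causes."""
    reference = simulate_trajectory(params)
    result = {}
    for decimals in decimals_to_test:
        rounded = [round(p, decimals) for p in params]
        result[decimals] = max_deviation(reference, simulate_trajectory(rounded))
    return result

# Invented 15-decimal parameters, for illustration only.
full_precision = [1.234567890123456, 0.987654321098765, 0.012345678901234]
sensitivity = truncation_sensitivity(full_precision, [1, 2, 4, 8])
for decimals, deviation in sensitivity.items():
    print(f"{decimals} decimals: max deviation = {deviation:.2e}")
```

      Comparing the deviation against a tolerance that matters for the behavior of interest (for example, the spatial scale of the salt gradient) gives one way to decide how many decimal places to report.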

      Figure 3 caption typo "step changes I the salt concentration".  

      The typo in the caption of Fig. 3 has been corrected.

      Reviewer #2 (Recommendations for the authors): 

      (1) Overall, the language of the paper is not properly organized, making the paper's logic and purpose hard to follow. In the Results Section, many observations or findings lack explicit interpretation. To address this issue, the authors should consider (1) adopting the context-content-conclusion scheme, (2) optimizing the logic flow by clearly identifying the context and goals prior to discussing their results and findings, (3) more explicitly interpreting their results, especially in a biological context.  

      We would like to express our gratitude for the helpful suggestions. Following the suggestions listed below, we revised the main and SI texts.

      (2) In Figure 2, trajectories from the model with AIY-AIZ constraints show a faster convergence than those from the constraint-free model. However, in the corresponding texts in the Results section, the authors claimed no significant difference. It seems that the authors made this argument only based on CI (Chemotaxis Index). Therefore, in order to address such inconsistency, the authors need more explanation on why only relying on CI, which is an endpoint metric, instead of the whole navigation.  

      Thank you for the helpful comment. In the present study, we characterized klinotaxis behavior using not only the CI but also the curving rate shown in Fig. 4.

      According to your comments, we revised the related description in the main text as follows:

      “The difference between these CI values is slight, while the model optimized with the constraints exhibits a marginally accelerated attainment of the salt concentration peak, as shown by the trajectories. The slightly higher chemotaxis performance observed in the constrained model is not essentially attributed to the introduction of the AIY-AIZ synaptic constraints but rather depends on the specific individuals selected from the optimized individuals obtained from the evolutionary algorithm. In fact, even when the AIY-AIZ constraints are taken into consideration, the model retains a significant degree of freedom to reproduce salt klinotaxis due to the presence of a substantial parameter space. Consequently, the impact of the AIY-AIZ constraints on the optimization of the CI is expected to be negligible.”

      (3) In Figures 3a and b, some inter-neuron connections are relatively weak (e.g., AIYR to AIZR in Figure 3a) - thus it is unclear whether the polarity of such synapses would significantly influence the behavioral outcome or not. The authors could consider plotting the change of the connection strengths between neurons over the course of model optimization to get a sense of confidence in each inter-neuron connection. 

      In the evolutionary algorithm, the parameters of individuals vary discontinuously under the influence of selection, crossover, and mutation. Because this variation is non-systematic, it is not straightforward to extract information about parameter optimization from the parameter changes.

      (4) In Figure 3, the order of individual figure panels is incorrect: in the main text, Figure 3 a and b were mentioned after c and d. Also, the caption of Figure 3c "negative step changes I the" should be "in".  

      The main text underwent revision, with the description of Figures 3a and 3b being presented prior to that of Figures 3c and 3d. The typo was revised.

      (5) In Figure 4, the order of individual figure panels is messed up: in the main text, Figure 4 a was mentioned after b.  

      The main text underwent revision, with the description of Figure 4a being presented prior to that of Figure 4b.

      (6) Also in Figure 4, the authors need to provide a definition/explanation of "Bearing" and "Translational Gradient". In Figure 4d, the definition of positive and negative components is not clear.  

      We now refer to the subsection “Normal and Translational Salt Concentration Gradient” in METHOD for the definition and explanation of the bearing and the translational gradient. We also added the following explanation of the positive and negative components.

      “The positive and negative components of the curving rate are respectively sampled from the trajectory during leftward turns (as illustrated in Fig. 4b) and rightward turns, respectively.”

      (7) Figure 5: the authors need to explain why c has an error bar and how they were calculated, as this result is from a computational model. Figure 5d is experimental results - the authors need to add error bars to the data points and provide a sample size. 

      As explained in Analysis of the Salt Preference Behavior in Klinotaxis in METHOD, the ensemble average of these quantities was determined by performing 100,000 sets of the simulation with randomized initial orientation for a simulation time of T_sim=200 sec. The error bars for the experimental data were added in Figs. 5c, 6a, and S9a.
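      To illustrate why a computational result can carry error bars, the ensemble averaging described above can be sketched as follows. This is a minimal sketch under assumptions: `run_simulation` is a hypothetical toy function rather than the authors' model, the ensemble is much smaller than the 100,000 runs mentioned in the response, and whether the reported error bars are standard deviations or standard errors is not stated here, so both are computed.

```python
import math
import random
import statistics

def run_simulation(initial_orientation, rng):
    """Hypothetical stand-in for one simulation run (NOT the authors' model).

    Returns a scalar behavioral index clipped to [-1, 1] that depends
    weakly on the initial orientation plus run-to-run noise.
    """
    noise = rng.gauss(0.0, 0.1)
    value = 0.8 + 0.1 * math.cos(initial_orientation) + noise
    return max(-1.0, min(1.0, value))

def ensemble_statistics(n_runs, seed=0):
    """Mean, standard deviation, and standard error over randomized runs."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_runs):
        # Randomize the initial orientation uniformly over the full circle.
        orientation = rng.uniform(0.0, 2.0 * math.pi)
        values.append(run_simulation(orientation, rng))
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    sem = sd / math.sqrt(n_runs)
    return mean, sd, sem

mean, sd, sem = ensemble_statistics(n_runs=1000)
print(f"mean = {mean:.3f}, SD = {sd:.3f}, SEM = {sem:.4f}")
```

      The spread across runs comes from the randomized initial conditions, so the error bar reflects variability of the simulated behavior rather than measurement noise.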

      (8) On Page 14, the authors said, "To this end, this end, we used the best evolved network with the constraints, in which we varied the synaptic connections between ASER and AIY from inhibitory to excitatory." How did the model change the ASER-AIY signaling specifically? The authors should provide more explanation or at least refer to the Methods Section.  

      We now refer to the caption of Fig. S4 for a detailed explanation of the method.

      (9) Page 15: "a subset a subset exhibited a slight curve...". This observation from the model simulation is contradictory to experiments. However, their explanation of that is hard to understand.  

      I would like to thank you for the helpful comment. To improve this, we added the following explanation:

      “In the case of step increases in 𝑧OFF as illustrated in the second right panel from the bottom in Fig.3d, the turning angle φ is increased from its ideal oscillatory component to a value close to zero, causing the model worm to deviate from the ideal sinusoidal trajectory and gradually turn toward lower salt concentrations. On the other hand, in the case of step increases in 𝑧ON as illustrated in the second left panel from the bottom in Fig.3d, the turning angle φ is again increased from its ideal oscillatory component to a value close to zero, causing the model worm to deviate from the ideal sinusoidal trajectory and gradually turn toward higher salt concentrations. The behaviors that are consistent with these analyses are observed in the trajectory illustrated in Fig. S8b.”

      (10) Last result session: inhibited SMB in starved worms is due to a mechanism unrelated to their neural network model upstream to SMB. Therefore, their results recapitulating the worms' dispersal behaviors cannot strengthen the validity of their model.  

      We agree with your opinion. We think that the findings from the study of starved worms do not provide evidence to validate the neural network model upstream of SMB.   

      (11) Discussion: "in contrast, the remaining neurons...". This argument lacks evidence or references.  

      This argument is based on the results obtained from the present study. This sentence was revised as follows:

      “This regulatory process enables the reproduction of salt concentration memory-dependent reversal of preference behavior in klinotaxis, despite the remaining neurons further downstream of the ASER not undergoing alterations and simply functioning as a modular circuit to transmit the received signals to the motor systems. Consequently, the sensorimotor circuit allows a simple and efficient bidirectional regulation of salt preference behavior in klinotaxis.”

      (12) To increase the predictive power of their model, can the authors perform simulations on mutant worms, like those with altered glutamate basal level expression in ASER?  

      We would like to express our gratitude for the useful suggestion. The simulations in which the weight of the ASER-AIY synaptic connection is increased from negative (inhibitory connection) to positive (excitatory connection), as illustrated in Figure S4, provide valuable insights into the relationship between varying basal glutamate levels from ASER and klinotaxis behavior, as measured for example by the chemotaxis index.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In the present study, Chen et al. investigate the role of Endophilin A1 in regulating GABAergic synapse formation and function. To this end, the authors use constitutive or conditional knockout of Endophilin A1 (EEN1) to assess the consequences on GABAergic synapse composition and function, as well as the outcome for PTZ-induced seizure susceptibility. The authors show that EEN1 KO mice show a higher susceptibility to PTZ-induced seizures, accompanied by a reduction in the GABAergic synaptic scaffolding protein gephyrin as well as specific GABAAR subunits and eIPSCs. The authors then investigate the underlying mechanisms, demonstrating that Endophilin A1 binds directly to gephyrin and GABAAR subunits, and identifying the subdomains of Endophilin A1 that contribute to this effect. Overall, the authors state that their study places Endophilin A1 as a new regulator of GABAergic synapse function.

      Strengths:

      Overall, the topic of this manuscript is very timely, since there has been substantial recent interest in describing the mechanisms governing inhibitory synaptic transmission at GABAergic synapses. The study will therefore be of interest to a wide audience of neuroscientists studying synaptic transmission and its role in disease. The manuscript is well-written and contains a substantial quantity of data.

      Weaknesses:

      A number of questions remain to be answered in order to be able to fully evaluate the quality and conclusions of the study. In particular, a key concern throughout the manuscript regards the way that the number of samples for statistical analysis is defined, which may affect the validity of the data analysed. Addressing this weakness will be essential to providing conclusive results that support the authors' claims.

      We would like to thank the reviewer for appreciating the value of our study and for the careful critique, which helps us improve the manuscript. We will correct the way that the number of samples for statistical analysis is defined throughout the manuscript as suggested and update figures, figure legends, and Materials and Methods accordingly. For example, we will average the values for all dendritic segments from one neuron, so that each data point represents one neuron in the graphs.

      Reviewer #2 (Public review):

      Summary:

      The function of neural circuits relies heavily on the balance of excitatory and inhibitory inputs. Particularly, inhibitory inputs are understudied when compared to their excitatory counterparts due to the diversity of inhibitory neurons, their synaptic molecular heterogeneity, and their elusive signature. Thus, insights into these aspects of inhibitory inputs can inform us largely on the functions of neural circuits and the brain.

      Endophilin A1, an endocytic protein heavily expressed in neurons, has been implicated in numerous pre- and postsynaptic functions, however largely at excitatory synapses. Thus, whether this crucial protein plays any role in inhibitory synapse, and whether this regulates functions at the synaptic, circuit, or brain level remains to be determined.

      New Findings:

      (1) Endophilin A1 interacts with the postsynaptic scaffolding protein gephyrin at inhibitory postsynaptic densities within excitatory neurons.

      (2) Endophilin A1 promotes the organization of the inhibitory postsynaptic density and the subsequent recruitment/stabilization of GABA A receptors via Endophilin A1's membrane binding and actin polymerization activities.

      (3) Loss of Endophilin A1 in CA1 mouse hippocampal pyramidal neurons weakens inhibitory input and leads to susceptibility to epilepsy.

      (4) Thus the authors propose that via its role as a component of the inhibitory postsynaptic density within excitatory neurons, Endophilin A1 supports the organization, stability, and efficacy of inhibitory input to maintain the excitatory/inhibitory balance critical for brain function.

      (5) The conclusion of the manuscript is well supported by the data but will be strengthened by addressing our list of concerns and experiment suggestions.

      We would like to thank the reviewer for their favorable impression of the manuscript. We also appreciate the excellent experimental suggestions, which help us improve the manuscript.

      Weaknesses:

      Technical concerns:

      (1) Figure 1F and Figure 1H, Figures 7H,J:

      Can the authors justify using a paired-pulse interval of 50 ms for eEPSCs and an interval of 200 ms for eIPSCs? Otherwise, experiments should be repeated using the same paired pulse interval.

      We apologize for the confusion. As illustrated by the schematic current traces, the decay time constants of eEPSCs and eIPSCs in hippocampal CA1 neurons are different. The eEPSCs exhibit a faster channel closing rate, corresponding to a smaller time constant Tau. Thus, a shorter inter-stimulus interval (50 ms) was chosen for paired-pulse ratio recordings. In contrast, the eIPSCs display a slower channel closing rate, with a Tau value larger than that of eEPSCs, so a longer inter-stimulus interval (200 ms) was used for PPR. This protocol has been long-established and adopted in previous studies (please see below for examples).

      Contractor, A., Swanson, G. & Heinemann, S. F. Kainate receptors are involved in short- and long-term plasticity at mossy fiber synapses in the hippocampus. Neuron 29, 209-216, doi:10.1016/s0896-6273(01)00191-x (2001).

      Babiec, W. E., Jami, S. A., Guglietta, R., Chen, P. B. & O'Dell, T. J. Differential Regulation of NMDA Receptor-Mediated Transmission by SK Channels Underlies Dorsal-Ventral Differences in Dynamics of Schaffer Collateral Synaptic Function. Journal of neuroscience 37, 1950-1964, doi:10.1523/JNEUROSCI.3196-16.2017 (2017).
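      The reasoning in the response to technical concern (1) can be made concrete with a short sketch. The paired-pulse ratio is the amplitude of the second response divided by that of the first. The decay time constants below are hypothetical values chosen only for illustration (they are not taken from the manuscript), and a mono-exponential decay is assumed.

```python
import math

def paired_pulse_ratio(first_amplitude, second_amplitude):
    """Paired-pulse ratio: second response amplitude over the first."""
    if first_amplitude == 0:
        raise ValueError("first response amplitude must be non-zero")
    return second_amplitude / first_amplitude

def residual_fraction(interval_ms, tau_ms):
    """Fraction of the first current remaining at the second pulse,
    assuming a mono-exponential decay with time constant tau_ms."""
    return math.exp(-interval_ms / tau_ms)

# Hypothetical decay time constants, for illustration only.
epsc_tau_ms = 10.0   # faster-decaying excitatory current
ipsc_tau_ms = 40.0   # slower-decaying inhibitory current

print(f"eEPSC residual at 50 ms:  {residual_fraction(50.0, epsc_tau_ms):.4f}")
print(f"eIPSC residual at 200 ms: {residual_fraction(200.0, ipsc_tau_ms):.4f}")
print(f"eIPSC residual at 50 ms:  {residual_fraction(50.0, ipsc_tau_ms):.4f}")
print(f"Example PPR: {paired_pulse_ratio(100.0, 80.0):.2f}")
```

      With these assumed values, the slower current has decayed to the same fraction at 200 ms as the faster current has at 50 ms, whereas at 50 ms a substantial part of the slower current would still overlap with the second response.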

      (2) Figures 3G,H,I:

      While 3D representations of proteins of interest bolster claims made by superresolution microscopy, SIM resolution is unreliable when deciphering the localization of proteins at the subsynaptic level given the small size of these structures (<1 micrometer). In order to determine the actual location of Endophilin A1, especially given the known presynaptic localization of this protein, the authors should complete SIM experiments with a presynaptic marker, perhaps an active zone protein, so that the relative localization of Endophilin A1 can be gleaned. Currently, overlapping signals could stem from the presynapse given the poor resolution of SIM in this context.

      Thanks for your suggestions. It is certainly preferable to investigate the relative localization of endophilin A1 using both presynaptic and postsynaptic markers. For SIM imaging in Figure 3G-I, to visualize neuronal morphology, we immunostained GFP as cell fill, leaving two other channels for detection of immunofluorescent signals of endophilin A1 and another protein. We will try co-immunostaining of endophilin A1, the active zone protein bassoon (presynaptic marker) and gephyrin without morphology labeling. Alternatively, we will do co-staining of endophilin A1 and bassoon in GFP-expressing neurons. We agree that overlapping signals or proximal localization of presynaptic endophilin A1 with gephyrin or GABA<sub>A</sub>R γ2 cannot be ruled out. Of note, if image resolution is improved with the use of a more advanced imaging system, the overlap between two proteins will become smaller or even disappear. With the ~110 nm lateral resolution of SIM, the degree of overlap between the two proteins of interest is much lower than in confocal microscopy. Given the presynaptic localization of endophilin, most likely we will observe a small overlap (presynaptic) or proximal localization (postsynaptic) of endophilin A1 with bassoon. Nevertheless, we will complete the SIM experiments as suggested to improve the manuscript.

      Manuscript consistency:

      (1) Figure 2:

      The authors looked at VGAT and noticed a reduction of signals in hippocampal regions in their P21 slices, indicating that the proposed postsynaptic organization/stabilization functions of Endophilin A1 extend to the inhibitory presynapse, perhaps via Neuroligin 2-Neurexin. Simultaneously, hippocampal regions in P21 slices showed a reduction in PSD-95 signals, indicating that excitatory synapses are also affected. It would be crucial to also look at excitatory presynapses, via VGLUT staining, to assess whether EndoA1 -/- also affects presynapses. Given the extensive roles of Endophilin A1 in presynapses, especially in excitatory presynapses, this should be investigated.

      Thanks for the thoughtful comments. Given that both VGAT and PSD95 signals are reduced in hippocampal regions in P21 slices, it is conceivable that the proposed postsynaptic organization/stabilization functions of endophilin A1 extend to the inhibitory presynapse via Neuroligin-2-Neurexin and the excitatory presynapse as well during development. Of note, endophilin A1 knockout did not impair the distribution of Neuroligin-2 in inhibitory postsynapses (immunoisolated with anti-GABA<sub>A</sub>R α1) in mature mice (Figure 3K), and endophilin A1 did not bind to Neuroligin-2 (Figure 4D), suggesting that endophilin A1 might function via other mechanisms. Nevertheless, as functions of endophilin A family members at the presynaptic site are well-established, the reduction of presynaptic signals in developing hippocampal regions of EndoA1<sup>-/-</sup> mice might result from the depletion of presynaptic endophilin A1. The presynaptic deficits may be compensated by other mechanisms as neurons mature. Certainly, we will do VGLUT staining of EndoA1<sup>-/-</sup> brain slices as suggested to assess the role of endophilin A1 in excitatory presynapses in vivo.

      (2) Figure 7C:

      The authors do not assess whether p140Cap overexpression rescues GABAAR receptor loss exhibited in Endophilin A1 KO, as they did for Gephyrin. This would be an important data point to show, as p140Cap may somehow rescue receptor loss by another pathway. In fact, it is mentioned in the text that this experiment was done, "Consistently, neither p140Cap nor the endophilin A1 loss-of-function mutants could rescue the GABAAR clustering phenotype in EEN1 KO neurons (Figure 7C, D)" yet the data for p140Cap overexpression seem to be missing. This should be remedied.

      Thanks a lot for the thoughtful comment. We will determine whether p140Cap overexpression also rescues the GABA<sub>A</sub>R clustering phenotype in EndoA1<sup>-/-</sup> neurons by surface GABA<sub>A</sub>R γ2 staining in our revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      Chen et al. identify endophilin A1 as a novel component of the inhibitory postsynaptic scaffold. Their data show impaired evoked inhibitory synaptic transmission in CA1 neurons of mice lacking endophilin A1, and an increased susceptibility to seizures. Endophilin can interact with the postsynaptic scaffold protein gephyrin and promote assembly of the inhibitory postsynaptic element. Endophilin A1 is known to play a role in presynaptic terminals and in dendritic spines, but a role for endophilin A1 at inhibitory postsynaptic densities has not yet been described.

      Strengths:

      The authors used a broad array of experimental approaches to investigate this, including tests of seizure susceptibility, electrophysiology, biochemistry, neuronal culture, and image analysis.

      Weaknesses:

      Many results are difficult to interpret, and the data quality is not always convincing, unfortunately. The basic premise of the study, that gephyrin and endophilin A1 interact, requires a more robust analysis to be convincing.

      We greatly appreciate the positive comment on our study and the very valuable feedback, which helps us improve the manuscript. We will conduct additional experiments to improve our data quality and strengthen our evidence according to these constructive suggestions. To obtain strong evidence for the interaction between endophilin A1 and gephyrin, we will perform an in vitro pull-down assay with recombinant proteins from a bacterial expression system.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) For all of the electrophysiology experiments, only the number of neurons recorded is stated, but not the number of independent animals that these neurons were obtained from. The number of independent animals used should be stated for each panel. At least 3 independent animals should be used in each group, otherwise, more data needs to be added.

      We apologize for missing the information in the original manuscript. For all electrophysiological experiments, data were obtained from more than 3 experimental animals. The figure legends were updated to include the number of independent animals used for each panel.

      (2) For the cell culture experiments analyzing dendritic puncta at GABAergic synapses, the number of data points analysed appears to be the number of dendritic segments quantified, regardless of whether they originate from the same neuron or not. This analysis method is not valid, since dendritic segments from the same neuron cannot be counted as statistically independent samples. The authors need to average the values for all dendritic segments from one neuron, such that one neuron equals one data point. This alteration should be made for Figures 2B, 2D, 4H, 4J, 5B, 5C, 5E, 5J, 5L, 6B, 6D, 6F, 6H, 6J, 6K,7B, and 7D. In addition, the number of independent cultures from which the neurons were obtained should be stated for each panel. At least 3 independent cultures should be used in each group, otherwise, more data need to be added.

      Thanks for the criticism. We reanalyzed the data throughout the manuscript as suggested and updated the figure legends accordingly. Moreover, we increased the number of neurons from independent experiments to further confirm the results in our revised manuscript.

      In the revised manuscript, we averaged the values for all dendritic segments from a single neuron and updated the data in Figures 3B, 3D, 4H, 4J, 5B, 5C, 5E, 5K, 5M, 6B, 6D, 6F, 6H, 6J, 6K, 7B, and 7D.

      Neurons analyzed in each group were derived from at least 3 independent cultures. Owing to the very low efficiency of sparse transfection in primary cultured hippocampal neurons, multiple experimental repetitions were necessary to obtain a sufficient number of neurons for analysis. We described the statistical analysis in the “Material and Methods” section of the original manuscript as follows:

      “For all biochemical, cell biological and electrophysiological recordings, at least three independent experiments were performed (independent cultures, transfections or different mice).”
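      The per-neuron averaging described in the response to point (2) can be sketched as follows. The neuron identifiers and puncta values are invented for illustration, and the function name is hypothetical; the point is only that several dendritic segments from one neuron collapse into a single data point before statistical testing.

```python
from collections import defaultdict
from statistics import fmean

def average_per_neuron(segment_measurements):
    """Collapse segment-level values into one mean value per neuron.

    segment_measurements: iterable of (neuron_id, value) pairs, where
    several segments may share the same neuron_id.
    """
    by_neuron = defaultdict(list)
    for neuron_id, value in segment_measurements:
        by_neuron[neuron_id].append(value)
    return {neuron_id: fmean(values) for neuron_id, values in by_neuron.items()}

# Invented example: six dendritic segments from three neurons.
segments = [
    ("neuron_1", 4.0), ("neuron_1", 6.0),
    ("neuron_2", 10.0), ("neuron_2", 12.0), ("neuron_2", 14.0),
    ("neuron_3", 7.0),
]
per_neuron = average_per_neuron(segments)
print(per_neuron)
print(f"n = {len(per_neuron)} neurons (from {len(segments)} segments)")
```

      In this example the sample size used for statistics is 3 (neurons), not 6 (segments), which avoids treating segments from the same neuron as independent samples.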

      (3) Individual data points should be shown on all graphs, particularly in Figures 2C, 2F, 2I, 3F, 3K, and 3L.

      Thank you for the suggestion. We replaced the original graphs with scatterplots and mean ± S.E.M. in new Figures.

      (4) For each experiment, the authors should state explicitly in the methods section whether that experiment was conducted blind to genotype.

      Thank you for the suggestion. We have revised the description of blinding for each experiment in the Methods section of the revised manuscript, for example: “Seizure susceptibility was measured blindly by rating seizures on a scale of 0 to 7 as follows…”, “Quantification of immunostaining were carried out blindly…”.

      (5) For each experiment, the authors should state whether they used male or female mice, and what age the mice were at the time of the experiment

      Thanks a lot for the suggestion. We used both male and female mice for neuron cultures and behavioral tests. We observed no sex-related differences in PTZ-induced behaviors, so the results were pooled.

      Regarding mouse ages, P0 pups were used for hippocampal neuron cultures and for virus injection in electrophysiological recording assays or FingR probe assays. P14-21 mice were used for electrophysiological recording, immunofluorescent staining and FingR probe detection in brain slices, while adult mice (P60) were used for behavioral tests, immunofluorescent staining in brain slices and biochemical assays. We have revised the description of the sex and age of the mice in the Methods section of the revised manuscript as follows: “To evaluate seizure susceptibility, 8-10-week-old male and female EndoA1<sup>+/+</sup> or EndoA1<sup>-/-</sup> littermates or EndoA1<sup>fl/fl</sup> littermates were intraperitoneally administered… ”, “For virus injection, 8-9-week-old naive male and female littermates were anesthetized…”, “Male and female littermates (P21 or P60) were anesthetized and immediately perfused…”, “Hippocampi of female or male pups (P0) were rapidly dissected under sterile conditions…”, “PSD fractions from adult mouse brain were prepared as previously described…”, “Newborn EndoA1<sup>fl/fl</sup> littermates (male or female) were anesthetized on ice for 4-5 min…”.

      (6) For each experiment involving WT and KO mice, please state whether WTs and KOs were bred as littermates from heterozygous breeders

      Sorry for the confusion. In our study, EndoA1<sup>+/+</sup> and EndoA1<sup>-/-</sup> mice were bred as littermates from heterozygous breeders. We added the information in methods section as follows in our revised manuscript, “EndoA1<sup>+/+</sup> and EndoA1<sup>-/-</sup> mice were bred as littermates from heterozygous breeders…”, “To evaluate seizure susceptibility, 8-10-week-old male and female EndoA1<sup>+/+</sup> or EndoA1<sup>-/-</sup> littermates or EndoA1<sup>fl/fl</sup> littermates…”, “For virus injection, 8-9-week-old naive male and female littermates were anesthetized…”, “Male and female littermates (P21 or P60) were anesthetized and immediately perfused…”, “For co-IP from brain lysates, the whole brain from 8-10-week-old WT and KO littermates were dissected…”, “Newborn EndoA1<sup>fl/fl</sup> littermates (male or female) were anesthetized on ice for 4-5 min…”.

      (7) For experiments comparing three or more groups, the authors claim in the methods section to have used a one-way ANOVA for statistical analysis. However, no ANOVA values are given, only the post-hoc tests. Please add the ANOVA values for each experiment before stating the values of the post-hoc analysis.

      Sorry for the missing information. We used one-way ANOVA for comparing three or more groups in the original manuscript and have changed to two-way ANOVA for behavior data analysis in our revised manuscript as suggested in Recommendations (18). We added the ANOVA values (F & p values) for each experiment in new figures. For example, see Figure 1C.
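      For readers unfamiliar with the omnibus values requested in point (7), the F statistic of a one-way ANOVA can be computed as sketched below. The group values are invented, the function name is hypothetical, and this is not the authors' analysis code; the p value would additionally require the F distribution (statistical packages such as SciPy provide this, for example through `scipy.stats.f_oneway`).

```python
from statistics import fmean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists of observations, one list per group.
    Raises ZeroDivisionError if there is no within-group variance.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    group_means = [fmean(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Invented data: three groups of three observations each.
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(example_groups)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

      Reporting the result in the form F(df_between, df_within) together with its p value, followed by the post-hoc comparisons, is the convention the reviewer asks for.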

      (8) In Figure 1A-C, seizure susceptibility was compared in EEN+/+ and EEN-/- mice, but the methods section states that seizure susceptibility was evaluated in 8-10-week-old male C57BL/6N mice (line 513). Was this meant to indicate that the EEN+/+ and EEN-/- mice were on a C57BL/6N background? How does this match with the statement that EEN1 -/- mice were generated on a C57BL/6J background (line 467)?

      We apologize for the mistake. In our study, EEN1<sup>-/-</sup> mice were generated on a C57BL/6J background, as stated in our previously published papers (Yang et al., 2021; Yang et al., 2018) and in “Animals” in Material and Methods of our original manuscript. We have corrected the statement to “To evaluate seizure susceptibility, 8-10-week-old male and female EndoA1<sup>+/+</sup> or EndoA1<sup>-/-</sup> littermates…” in Material and Methods of the revised manuscript.

      (9) In the electrophysiology experiments in Figure 1E-O, it is not clear to me which neurons were recorded in the control group. The methods section states that "Whole-cell recordings were performed on an AAV-infected neuron and a neighboring uninfected neuron" (line 736). However, the figure legends states that recordings were obtained from "10 control (Ctrl, mCherry alone) and 10 EEN1 KO (mCherry and Cre) pyramidal neurons" (line 1079), which would indicate that the controls are not uninfected neurons from the same animal, but AAV-mCherry infected neurons from a different animal. Please clarify which of the two descriptions is accurate.

      Thanks for catching the error! In all electrophysiological experiments, a neighboring uninfected neuron was used as the control in Figure 1E-O. This was incorrectly stated in the figure legend of the original manuscript. In the revised manuscript, the information has been corrected in figure legends of new Figure 1 (E-F).

      (10) The authors show that in Endophilin A1 KO animals, eIPSCs are reduced, but mIPSC frequency and amplitude are unaltered. How do they explain this finding in the context of the fact that gephyrin and GABAAR1.

      We apologize for the confusion about the electrophysiological recording data. eIPSCs are recorded in response to electrically evoked action potentials, which elicit substantial neurotransmitter release. In contrast, mIPSCs are small, spontaneous currents recorded in the presence of TTX during patch-clamp experiments; they result from the release of neurotransmitters from presynaptic terminals in the absence of action potentials. The amplitude of mIPSCs typically reflects the quantal release of neurotransmitters, while their frequency can vary depending on synaptic activity and the state of the neuron.

      A number of molecules fine-tune presynaptic neurotransmitter release and the functions of inhibitory postsynaptic receptors. In our study, inhibitory postsynapses were partially affected in endophilin A1 knockout neurons, while presynaptic endophilin A1 remained intact during electrophysiological recordings. Conceivably, the observed deficits in endophilin A1 knockout mice were mild. Following endophilin A1 depletion, inhibitory postsynaptic receptors appeared sufficient to respond to spontaneous neurotransmitter release but may be inadequate to respond to the large amounts of neurotransmitter released upon evoked action potentials. Meanwhile, spontaneous synaptic activity and the state of the neuron were not obviously affected under basal conditions by endophilin A1 depletion during postnatal stages. Consequently, mIPSC frequency and amplitude remained unaltered, whereas eIPSCs were reduced compared with control neurons. This finding is consistent with the behavioral experiments, in which endophilin A1 knockout mice showed aggressive epileptic behaviors upon PTZ induction rather than spontaneous epilepsy.

      (11) Distribution of gephyrin, VGAT, and GABAARg2 differs substantially between the different layers of hippocampal area CA1, and the same goes for the other regions of the hippocampus. However, in Figure 2, it is not clear to me from the sample images which layers of each subregion the authors quantified, or indeed whether they paid attention to which layers they included in their analysis. This can lead to a substantial skewing of the data if different layers were preferentially included in the two genotypes. Please clarify which layers were analysed, and how comparability between WTs and KOs was ensured. This is particularly important given the authors' claim that Endophilin A1 acts equally at all subtypes of GABAergic synapses (lines 373- 376).

      Thank you for raising this point. We distinguished each hippocampal subregion based on its anatomical structure in brain slices. Quantification of the mean fluorescence intensity of each synaptic protein across all layers of each subregion, as shown in new Figure 2 and Figure S2A-F, revealed that GABAergic synaptic proteins were impaired in both P21 and P60 KO mice.

      We further analyzed the fluorescent signal of core postsynaptic component, gephyrin, in individual layers of each subregion in the hippocampus of mature WT and KO mice, as presented in new Figures S2G-H. Our findings demonstrated a decrease in gephyrin levels across all layers of each subregion in KO mice. Additionally, we examined gephyrin clustering across the soma, axon initial segment (AIS), and dendrites in cultured mature endophilin A1 knockout hippocampal neurons, as shown in new Figure S5E-H. The results showed that gephyrin was affected in all subcellular regions following endophilin A1 knockout.

      Collectively, these data suggest that endophilin A1 functions across all subtypes of GABAergic postsynapses.

      (12) In Figure 3E-F, the authors state that there was no change in the total level of synaptic proteins in EEN1 KO neurons (line 188). However, there is no quantification of the total level of synaptic proteins shown, and based on the immunoblot in Figure 3E, it looks like there is a substantial reduction in NR1, NL2, and g2. The authors should present a quantification of the total levels of these proteins and adjust their statement accordingly if necessary.

      Thanks a lot for your comments. We quantified the total protein levels in Figure 3E and added the result to new Figure 3F, showing that total protein levels were not obviously affected in cultured KO neurons. When normalized to total protein levels, the surface levels of GABA<sub>A</sub> receptors were significantly compromised compared to surface GluN1 and NL2. Furthermore, the total protein levels were not affected in brains of KO mice, as shown in Figures 3K (input) and 3L (S1). Collectively, there was no change in the total level of synaptic proteins in KO neurons.

      (13) In Figure 3G-I, the authors claim, based on super-resolution images as presented here, that Endophilin A1 colocalizes with gephyrin and g2. However, no quantification of this colocalization is presented. The authors should add this quantification to support their claim and indicate how many GABAergic synapses contain Endophilin A1.

      Thank you for the thoughtful comments. Super-resolution microscopy significantly improves image resolution. As a result, the apparent overlap between two proteins becomes smaller or even disappears: since no two proteins can occupy the same physical space, they show lower colocalization and instead exhibit proximal localization. As expected, in Figures 3G and 3H, we observed only small overlap or proximal localization of endophilin A1 with gephyrin or GABA<sub>A</sub>R γ2. To further confirm the localization of endophilin A1 in inhibitory synapses, we co-stained endophilin A1 with the pre- and post-synaptic proteins Bassoon and gephyrin. We then quantified the colocalization of endophilin A1 with gephyrin or with Bassoon using a method designed for super-resolution images (McCall AD. Colocalization by cross-correlation, a new method of colocalization suited for super-resolution microscopy. BMC Bioinformatics (2024) 25:55). The percentage of gephyrin or Bassoon puncta in close proximity to endophilin A1 was also calculated, as shown in new Video 5 and new Figure S4B-G. These data have been added to the revised manuscript as follows: “We further detected the localization of endophilin A1 to inhibitory synapses by co-immunostaining with both pre- and post-synaptic markers (Figure S4B and Video 5). Quantitative analysis of super-resolution localization maps revealed that ~47% of gephyrin or Bassoon puncta were proximal to endophilin A1 (Figure S4G, n = 14), with a mean distance between endophilin A1- and gephyrin-positive pixels of ∼120 nm, or between endophilin A1- and Bassoon-positive pixels of ∼130 nm (Figure S4C-F).”

      (14) In the quantification shown in Figure 3K-L, there are no error bars in the WT data sets. This presumably means that all values were normalized to WT. However, since this artificially eliminates the variance in the WT group, a t-test is no longer valid, since this assumes a normal distribution and normal variance, which are no longer given. The authors should either change the way they normalize their data to maintain the variance in the WT group or perform a different statistical test that can account for the artificial lack of variance in one of the groups.

      Thank you for the suggestions! We modified our analysis approach. Specifically, we normalized all values to the mean of the WT group, which preserves the variance within the WT group, and performed unpaired t-tests to assess statistical significance in Figure 3K-L. Additionally, we replaced the bar graphs with graphs showing individual data points. Please see Response to Recommendation (12).
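
      The normalization described above can be illustrated with a minimal sketch. The intensity values and function names below are hypothetical and invented for illustration; this is not the data or the analysis code used in the study. Dividing every value by the WT group mean sets the WT mean to 1 while keeping the spread of the WT group, so an unpaired t statistic can still be computed.

```python
from math import sqrt
from statistics import mean, stdev


def normalize_to_wt_mean(wt, ko):
    """Divide every value by the WT group mean.

    The WT group then averages 1.0 but keeps its sample-to-sample spread.
    """
    wt_mean = mean(wt)
    return [v / wt_mean for v in wt], [v / wt_mean for v in ko]


def unpaired_t(a, b):
    """Student's unpaired t statistic with pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


# Hypothetical band intensities (arbitrary units), invented for illustration only.
wt_raw = [10.0, 12.0, 11.0, 9.0]
ko_raw = [6.0, 7.0, 5.0, 6.0]

wt_norm, ko_norm = normalize_to_wt_mean(wt_raw, ko_raw)
t_stat = unpaired_t(wt_norm, ko_norm)
```

      Because both groups are divided by the same constant, the t statistic is identical to the one computed on the raw values; only the scale of the plotted data changes.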

      (15) What is the difference between the coIP experiment in Figure 4E and 3J, right panel? In both cases, an Endophilin A1 IP is performed, and gephyrin, GABAARg2, and GABAARa1 are assessed. However, Figure 3J's right panel indicates that Endophilin A1 does interact with the GABAAR subunits, whereas Figure 4E shows that it does not. How do the authors explain this discrepancy? Were these experiments performed more than once?

      Sorry for the confusion. Figure 3J and Figure 4E show data from immunoisolation assay and conventional co-immunoprecipitation (co-IP), respectively. Immunoisolation allows for the rapid and efficient separation of subcellular membrane compartments using antibodies conjugated to magnetic beads. In Figure 3J, we used antibodies against GABA<sub>A</sub>R α1 subunit or endophilin A1 to isolate the inhibitory postsynaptic membranes or endophilin A1-associated membranous compartments. In contrast, co-immunoprecipitation detects direct protein-protein interactions in detergent-solubilized lysates. For Figure 4E, we applied antibodies against endophilin A1 to precipitate its interaction partners. The results in Figure 3J and Figure 4E demonstrate that endophilin A1 is localized in the inhibitory postsynaptic compartment and directly interacts with gephyrin, but not with GABA<sub>A</sub>Rs. Detailed information regarding the methods used for co-IP and immunoisolation can be found in “GST-pull down, co-immunoprecipitation (IP), and immunoisolation” in the “Material and Methods” section of original manuscript.

      These experiments were repeated multiple times to ensure reliability. Consistent results showing endophilin A1 localization in the inhibitory postsynaptic compartment are quantified in Figure 3K.

      (16) For the colocalization analysis in Figure 5A-C, what percentage of gephyrin puncta contain g2 in the WT and Endophilin A1 KO? Currently, only a correlation coefficient is provided, but not the degree of overlap. Please add this information to the figure.

      Thanks for the comments on the colocalization analysis. We analyzed the percentage of gephyrin puncta overlapping with GABA<sub>A</sub>R γ2 and added the graphs in new Figure 5C.

      (17) Figure 6 investigates how actin depolymerization affects GABAergic synapse function, but does not assess how Endophilin A1 contributes to this process. The authors then provide an extremely short statement in the discussion, stating that their data are contradictory to a previous study (lines 412 - 417). This section of the discussion should be expanded to address the specific role of Endophilin A1 in the consequences of actin depolymerization.

      Thanks a lot for the advice. In the original manuscript, we discussed the specific role of endophilin A1 at inhibitory postsynapses as follows in Discussion:

      “As membrane-binding and actin polymerization-promoting activities of endophilin A1 are both required for its function in enhancing iPSD formation and g2–containing GABA<sub>A</sub>R clustering to iPSD, we propose that membrane-bound endophilin A1 promotes postsynaptic assembly by coordinating the plasma membrane tethering of the postsynaptic protein complex and its stabilization with the actin cytomatrix”

      Following your advice, we added a statement to the revised manuscript addressing the role of endophilin A1 in actin polymerization at inhibitory postsynapses, as follows: “In the present study, the impaired clustering of gephyrin and GABA<sub>A</sub>R γ2 by F-actin depolymerization underscores the essential role of F-actin in the assembly and stabilization of the inhibitory postsynaptic machinery. Membrane-bound endophilin A1 promotes F-actin polymerization beneath the plasma membrane through its interaction with p140Cap, an F-actin regulatory protein, thereby facilitating and/or stabilizing the clustering of gephyrin and γ2-containing GABA<sub>A</sub> receptors at postsynapses.”

      (18) Which statistical analysis was conducted in Figure 7F? Given the nature of the data, a repeated measures ANOVA would be necessary to accurately assess the statistical accuracy.

      Sorry for the confusion. In the original Figure 7F, we conducted one-way ANOVA followed by a Tukey post hoc test at each time point. As suggested, new Figure 7F uses repeated measures ANOVA followed by a Tukey post hoc test. We also reanalyzed the data in new Figure 1C with the same method, and modified the description in “Statistical analysis” and the figure legends for new Figures 1C and 7F in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      Data presentation:

      (1) Figures 2A, B, D, E, G, H. Figures S2A, B, D:

      Add P21 or P60 labels to these figures so that the difference between similarly stained samples (e.g. Figures 2A, B) is obvious to the reader.

      Thanks! We added “P21” or “P60” labels in new Figure 2 and Figure S2 as suggested.

      (2) Figures 4C, D:

      The authors must make their coIP data annotation consistent. In Figure 4C, they use actual microgram amounts when, e.g., describing how much input was present, yet in Figure 4D they use + and -. The authors should pick one.

      Thanks for the comments. We made the data annotation consistent in new Figures 4C and 4D, and also changed the labels in 4F to match.

      (3) Figure 5A

      GFP is gray in this figure, but in all other figures, it is blue. Consider changing for presentation reasons.

      Thanks a lot for pointing out the problem. We replaced gray with blue color to indicate GFP in new Figure 5A.

      (4) Figures 6A, C, E, G

      Label graphs as either short-term or long-term drug treatment.

      Thanks for the suggestion. We labeled the graphs as 60 min for short-term or 120 min for long-term drug treatment in new Figure 6A, C, E, G for ease of reading.

      Annotation, grammar, spelling, typing errors:

      (1) Figure 4G:

      Merge and GFP labels are seemingly swapped.

      Thanks a lot for your sharp eye. We corrected the labels in new Figure 4G.

      (2) Fig 4I:

      The authors use "Gephryin" instead of GPN. They should be consistent and choose one.

      Sorry for the mistake. We changed the label in new Figure 4I to be consistent with the other figures and rearranged the images in the figures for a cleaner layout.

      (3) "One-hour or two-hour treatment of mature neurons with nocodazole..."

      Thanks for your advice. We modified the sentence to “Treatment of mature neurons with nocodazole, a microtubule depolymerizing reagent, for one hour (short-term) or two hours (long-term), caused…”.

      (4) The authors should indicate that one-hour is their short-term treatment and that two-hour is their long-term treatment so that when these terms are used later to describe LatA experiments, it is clearer to the reader.

      Thanks for your comments. We modified the statement as described in Response to Recommendation (3), so that it is clearer to the reader.

      (5) EEN1. The authors should use a more conventional term EndoA1 so that the manuscript can be searched easily.

      Thanks a lot for the suggestion. We replaced every instance of the term “EEN1” with “EndoA1” in the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      Major Points

      (1) The number of observations for the electrophysiology experiments in Figure 1 (dots are neurons) is very low and it is not clear whether the data shown is derived from different mice. The same criticism applies to the data shown in Figures 7G-K.

      We apologize for the low neuron number in the electrophysiology experiments. In the patch-clamp experiments, the number of neurons recorded was higher than what is shown in the figures. However, neurons with a membrane resistance (Rm) below 500 MΩ, indicating unstable seals or poor conditions, were excluded from the analysis. Additionally, we added the number of mice from which the data were derived for each group to the figure legends for Figures 1, 7 and S1. This point was also raised by Reviewer #1 (please see Response to Recommendation (1)).

      (2) Images in Figure 2 are shown at low magnification, statements on changes in intensity of inhibitory synaptic markers in the hippocampal region are impossible to interpret. Analysis of inhibitory synapses in vivo would require sparse neuronal labeling and 3D reconstruction, for instance using gephyrin-FingRs (Gross et al., Neuron 2013).

      Thanks for your insightful suggestion. We obtained the pCAG_PSD95.FingR-eGFP-CCR5TC and pCAG_GPN.FingR-eGFP-CCR5TC constructs from Addgene (plasmids #46295 and #46296). We attempted in utero electroporation (IUE) at E14.5 to introduce the DNA into cortical or hippocampal neurons, unfortunately without success. After many repeated attempts, we eventually obtained newborn pups from ICR mice after IUE. However, we failed to obtain any newborn pups from C57BL/6J mice because of abortion following the procedure. Furthermore, pregnant C57BL/6J mice (WT or KO) either did not survive or remained in poor health after surgery. Therefore, we were unable to analyze synapses through sparse labeling and 3D reconstruction by IUE. Alternatively, we obtained commercial AAVs carrying rAAV-EF1a-PSD95.FingR-eGFP-CCR5TC and rAAV-EF1a-mRuby2-Gephyrin.FingR-IL2RGTC and injected them into the CA1 region of EndoA1<sup>fl/fl</sup> mice at P0. Mice were fixed at P21, and fluorescent signals in the CA1 region were examined. Consistent with antibody immunostaining, decreased mRuby2-Gephyrin.FingR or PSD95.FingR-eGFP was observed in dendrites of KO neurons at P21, as shown in new Figure S3. In combination with electrophysiological recording, PSD fractionation and immunoisolation from brains, these data support our conclusion regarding the effects of endophilin A1 knockout on inhibitory synapses.

      Additionally, we transfected DIV12 cultured hippocampal neurons with pCAG_PSD95.FingR-eGFP-CCR5TC or pCAG_GPN.FingR-eGFP-CCR5TC and observed fluorescent signals on DIV16. Both the signal intensity and number of GPN.FingR-eGFP clusters were also significantly attenuated, with no obvious changes in PSD95.FingR-eGFP clusters in dendrites of mature neurons, as shown in new Figure S5A-D. We are very pleased that the result further strengthened our original conclusion. We have added the new pieces of data in our revised manuscript.

      (3) Figure 3: surface labeling of GluA1 or the GABAAR gamma 2 subunit is difficult to interpret: the patterns are noisy and the numerous puncta appear largely non-synaptic although this is difficult to judge in the absence of additional synaptic markers. It appears statistics are done on dendritic segments rather than the number of neurons. The legend does not mention how many independent cultures this data is derived from. In their previous study (Yang et al., Front Mol Neurosci 2018), the authors noted a decrease in surface GluA1 levels in the absence of endophilin A1. How do they explain the absence of an effect on surface GluA1 levels in the current study?

      Sorry for the concern and thanks for your comments. First, we assessed changes in the surface levels of excitatory and inhibitory receptors by co-immunostaining in cultured WT and KO hippocampal neurons. Given the very low transfection efficiency of neurons in high-density culture, numerous receptor puncta from adjacent non-transfected neurons were also detected. This may contribute to the noisy pattern observed in Figure 3A. In addition, z-stack projections of dendrites at higher magnification may have introduced higher background signals. We have now replaced the original images with images from the most recent replicate in new Figure 3A. Moreover, we confirmed a decrease in the surface expression of GABA<sub>A</sub>R γ2 by the biotinylation assay, as shown in Figure 3E. Indeed, we agree that some puncta in the surface labeling of receptors appeared to be non-synaptic. To assess the decrease in synaptic proteins at synapses, we isolated the PSD fraction biochemically and found that gephyrin and GABA<sub>A</sub>R γ2, two major inhibitory postsynaptic components, were reduced in the PSD fraction from KO brains, as shown in Figure 3L. Their colocalization was also attenuated in the absence of endophilin A1, as shown in Figure 5A-C. Combined with electrophysiological recording, these data from multiple assays indicate that, in the present study, GluA1 at synapses was not obviously affected but GABA<sub>A</sub>R γ2 at synapses was impaired in endophilin A1 KO neurons.

      We have corrected the way the number of samples is defined for statistical analysis, as suggested. This point was also raised by Reviewer #1 (Recommendation (2)). We averaged the values from all dendritic segments of a single neuron, such that one neuron equals one data point. We have replaced the original Figures 3B and 3D (please see Response to Recommendation (2) by Reviewer #1). Additionally, we added the number of independent cultures from which these data were derived to the figure legends in the revised manuscript.
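
      As a minimal sketch of the per-neuron averaging described above (the neuron identifiers and values are hypothetical and invented for illustration, not taken from the study), segment-level measurements can be collapsed so that each neuron contributes a single data point:

```python
from collections import defaultdict
from statistics import mean


def per_neuron_means(segments):
    """Collapse (neuron_id, value) segment measurements into one mean per neuron."""
    by_neuron = defaultdict(list)
    for neuron_id, value in segments:
        by_neuron[neuron_id].append(value)
    return {neuron_id: mean(values) for neuron_id, values in by_neuron.items()}


# Hypothetical surface-intensity values for dendritic segments, keyed by neuron.
segments = [
    ("n1", 1.0), ("n1", 3.0),
    ("n2", 2.0), ("n2", 4.0), ("n2", 6.0),
    ("n3", 5.0),
]

neuron_means = per_neuron_means(segments)  # one value per neuron
n_for_statistics = len(neuron_means)       # neurons, not segments, define n
```

      In this toy example six segments reduce to three data points, so n reflects the number of neurons rather than the number of dendritic segments.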

      Previously, we observed a small decrease in surface GluA1 levels in spines under basal conditions and a more pronounced suppression of surface GluA1 accumulation in spines upon chemical LTP in endophilin A1 KO neurons from EndoA1<sup>-/-</sup> mice, in which endophilin A1 is knocked out from embryonic development onward (Figure 5C, H; Yang et al., Front Mol Neurosci, 2018). In Figure 3A and B of the current study, we analyzed surface receptor levels in GFP-positive dendrites, rather than spines, under basal conditions when endophilin A1 was depleted at a later developmental stage. We found a decrease in surface GABA<sub>A</sub>R γ2 levels but no significant effects on surface GluA1 levels in dendrites. These findings indicate that endophilin A1 primarily affects excitatory synaptic proteins in spines during synaptic plasticity and inhibitory synaptic proteins in dendrites under basal conditions in mature neurons.

      (4) Super-resolution images in Figure 3G, H, I: endophilin A1 puncta look different in panel 3I compared to 3G and 3H, which are very noisy. It is difficult to interpret how specific these EEN1 puncta are. Previous images showing EEN1 distribution in dendrites look different (Yang et al., Front Mol Neurosci 2018); is the same KO-verified antibody being used here? Colocalization of EEN1 with gephyrin or the GABAAR gamma 2 subunit is difficult to interpret; gephyrin mostly does not seem to colocalize with EEN1 in the example shown.

      We apologize for the concern. As stated previously in Major Points (3), transfection efficiency was very low in cultured neurons and our cultured neurons were at relatively high density. As a result, numerous protein puncta located in adjacent non-transfected neurons were also detected, which may contribute to the noisy signals observed in Figure 3G-I.

      In our previous paper, we confirmed the specificity of the antibody against endophilin A1 (Figure 5A, B; Yang et al., Front Mol Neurosci, 2018). We used the same antibody (rabbit anti-endophilin A1, Synaptic Systems GmbH, Germany) in the current study. While the previous images were obtained using confocal microscopy, the current images in Figures 3G, H, and I were acquired using super-resolution microscopy (SIM). The different patterns observed in the dendrites may be attributed to differences in image resolution, antibody dilution and reaction time.

      Reviewer #1 also raised the quantification of colocalization of gephyrin and GABA<sub>A</sub>R γ2 with endophilin A1. Please see Response to Recommendation (13) by Reviewer #1.

      (5) The interaction of gephyrin and endophilin A1 is based on coIP experiments in cells and brain tissue. To convincingly demonstrate that these proteins interact, biophysical experiments with purified proteins are necessary.

      Thanks a lot for your great suggestions on the interaction of endophilin A1 with gephyrin. To convincingly demonstrate their interaction, we performed pull-down assay with purified recombinant proteins and the result shows that both G and E domains of gephyrin were involved in the interaction with endophilin A1. The data has been added to the revised manuscript as new Figure 5I. We also modified the statement about the data and figure legends in the revised manuscript.

      (6) Figure 4G: the gephyrin images are not convincing; the inhibitory postsynaptic element typically looks somewhat elongated; these puncta are very noisy and do not appear to represent iPSDs. The same criticism applies to the images shown in Figures 5 and 7.

      Thanks for the comment. The gephyrin puncta in our images exhibited heterogeneous shapes and sizes, with some appearing somewhat elongated. To address this, we compared the puncta pattern of gephyrin with that shown in the reference. As illustrated in the figure from the reference, gephyrin puncta also displayed distinct shapes and sizes (Figure 3A-F, Neuron 78, 971–985, June 19, 2013). Please note that the images were z-stack projections at higher magnification, as described in the "Materials and Methods" section. This approach may introduce higher background signals and may contribute to the much more heterogeneous appearance of the puncta in Figures 4, 5, and 7. As mentioned previously, the numerous gephyrin puncta located in adjacent non-transfected neurons may also contribute to some of the noisy signals observed. We have replaced the original images with new images in new Figures 4G, 5 and 7.

      Moreover, to confirm the effects of endophilin A1 KO on gephyrin clustering, we also detected the endogenous clusters of gephyrin or PSD95 visualized by GPN.FingR-eGFP or PSD95.FingR-eGFP in cultured mature neurons. The results were consistent with immunostaining with antibodies against gephyrin. Please see Response to Recommendation (2).

      (7) Figure 7E, F: the rescue (Cre + WT) appears to perform better than the control (mCherry + GFP) in the PTZ condition; how do the authors explain this? Mixes of viral vectors were injected, would this approach achieve full rescue?

      Thanks for the thoughtful comment. Mixed viruses were injected bilaterally into the hippocampal CA1 regions. The results showed a full rescue effect by WT endophilin A1 in knockout mice during the early days, and a slightly better effect than in the control group on later days under the PTZ condition, as shown in Figures 7E and 7F. In the current study, overexpression of endophilin A1 increased the clustering of gephyrin and GABA<sub>A</sub>R γ2 in cultured neurons, as shown in Figures 4I-J and 5D-E. Presumably, the slightly better rescue effect observed in the behavioral tests is attributable to the enhanced clustering and/or stabilization of gephyrin/GABA<sub>A</sub>R γ2 by WT endophilin A1 expression in KO neurons in vivo. Moreover, the electrophysiological recording also showed full rescue effects on eIPSCs by WT endophilin A1 in KO neurons (Figure 7G-K).

      Minor Points

      (1) The authors mention that they previously found a decrease in eEPSC amplitude in EEN1 KO mice (Yang et al., Front Mol Neurosci 2018). The data in Fig. 1E suggests a decrease in eEPSC amplitude but is not significant here, likely due to the small number of observations. If both eEPSC and eIPSC amplitudes are reduced in the absence of EEN1, would the E/I ratio still be significantly changed?

      We apologize for the confusion. In our previous study, AMPAR-mediated excitatory postsynaptic currents (eEPSCs) were found to be slightly but significantly reduced compared to the control group, while NMDAR-mediated excitatory postsynaptic currents showed no significant difference (Figure 4N, O; Yang et al., Front Mol Neurosci, 2018). In the current study, we adopted a different recording protocol, simultaneously measuring eEPSCs and eIPSCs from the same neuron to calculate the E/I ratio. Unlike in the previous study, we did not use inhibitors to suppress GABA receptor activity. As a result, the recorded signals reflect total eEPSCs and do not distinguish AMPAR-mediated from NMDAR-mediated excitatory postsynaptic currents, which may explain why the reduction relative to control neurons was not significant in this study.

      It is possible that the eEPSC amplitude would show a significant reduction if a larger number of neurons were recorded. Nevertheless, the larger suppression of eIPSCs in the absence of endophilin A1 indicates that the E/I ratio is significantly altered.

      (2) Page 7: the authors mention they aim to exclude effects on presynaptic terminals of deleting endophilin A1 in cultured neurons, is this because of a sparse transfection approach?

      Please clarify.

      Sorry for the confusion. In cultured neurons, we always observed sparse transfection due to the very low transfection efficiency (~0.5%). Therefore, we could examine the effects of endophilin A1 knockout specifically in the CamKIIa promoter-driven Cre-expressing postsynaptic neurons, while endophilin A1 remained intact in the non-transfected presynaptic neurons.

      (3) The representative blot of the surface biotinylation experiment (Figure 3E) suggests that loss of endophilin A1 also affects GluN1 and Nlgn2 levels, and error bars in panel 3F (lacking individual data points) suggest these experiments were highly variable.

      Sorry for the confusion. Reviewer #1 also raised this question, and we quantified the total levels of GluN1 and NL2 in Figure 3E. We also replaced the original graphs with scatterplots showing means ± S.E.M. Please see the Response to Recommendations (3) and (12) by Reviewer #1.

      (4) Have other studies analyzing inhibitory synapse composition identified endophilin A1 as a component? The rationale for this study seems to be primarily based on the presence of epileptic seizures and E/I imbalance.

      Thank you for your questions. To date, no other studies have investigated endophilin A1 as an inhibitory postsynaptic component. We observed the proximal localization of endophilin A1 with inhibitory postsynaptic proteins using super-resolution microscopy (SIM), and quantification showed that ~47% of gephyrin puncta were proximal to endophilin A1 (Figure 3G-I and S4B-G). We further immunoisolated the inhibitory postsynaptic fraction using GABA<sub>A</sub> receptors and found that endophilin A1 was present in the isolated fraction, and vice versa (Figure 3J). Additionally, we demonstrated that endophilin A1 directly interacted with gephyrin through co-IP and pull-down assays (Figure 5J-I). Together with data from immunolabeling, biochemical assays, electrophysiological recordings, and behavioral tests, these results identified endophilin A1 as an inhibitory postsynaptic component.

      (5) Figure 3J: what are S100 and P100 labels? Is Nlgn2 part of the EEN1 complex? If it is, why are Nlgn2 surface levels not affected by EEN1 loss (Figure 3E, F, K)? Why does EEN1 not interact with Nlgn2 in HEK cells (Figure 4D)?

      Sorry for the confusion. The detailed information regarding S100 and P100 can be found in the “GST-pull down, co-immunoprecipitation (IP), and immunoisolation” in the “Materials and Methods” section. S100 contains soluble proteins, while P100 refers to the membrane fraction after high-speed (100,000 × g) centrifugation.

      Figures 3J-K and 4C-F showed the data from immunoisolation and conventional co-immunoprecipitation assays, respectively. Immunoisolation, which uses antibodies coupled to magnetic beads, allows for the rapid and efficient separation of subcellular membrane compartments. In Figure 3J-K, we used antibodies against GABA<sub>A</sub>R α1 to isolate membrane protein complexes from the inhibitory postsynaptic fraction. In contrast, co-immunoprecipitation typically detects direct interactions between proteins solubilized by detergent treatment. For Figure 4C-F, FLAG beads were used in HEK293 lysates, or antibodies against endophilin A1 were employed in brain lysates to precipitate direct interaction partners. Combined with the results from Figure 3J-L, the data in 4C-F indicated that endophilin A1 was localized in the inhibitory postsynaptic compartment and directly bound to gephyrin but not to either GABA<sub>A</sub> receptors or Nlgn2 (NL2). This binding promoted the clustering of gephyrin and GABA<sub>A</sub>R γ2 at synapses, facilitating GABA<sub>A</sub>R assembly.

      Nlgn2 (NL2) is a key inhibitory postsynaptic component but does not directly bind to endophilin A1. Consequently, endophilin A1 failed to co-immunoprecipitate with NL2 in the presence of detergent in HEK293 cell lysates (Figure 4D). Furthermore, the surface levels of NL2 or its distribution in PSD fraction were unaffected by the loss of endophilin A1 (Figure 3E, F, K, L). This suggests that mechanisms independent of endophilin A1 orchestrate the surface expression and synaptic distribution of NL2.

      (6) How do the authors interpret the finding that endophilin A1, but not A2 or A3, binds gephyrin? What could explain these differences?

      Thanks for the thoughtful comment. Endophilin As contain BAR and SH3 domains. While the amino acid sequences in the BAR and SH3 domains are highly conserved, the intrinsically disordered loop region between BAR and SH3 domains is highly variable. A study by the Verstreken lab revealed that a human mutation in the unstructured loop region of endophilin A1 increases the risk of Parkinson's disease. They also demonstrated that the disordered loop region controls protein flexibility, which fine-tunes protein-protein and protein-membrane interactions critical for endophilin A1 function (Bademosi et al., Neuron 111, 1402–1422, May 3, 2023). Our previous study showed that endophilin A1 and A3, but not A2, bind to p140Cap through their SH3 domains, despite the high sequence homology in the SH3 domains among these proteins (Figure 2A, B; Yang et al., Cell Research, 2015). These findings indicate that each endophilin A likely interacts with specific partners due to distinct key amino acids.

      Additionally, endophilin A1 is expressed at much higher levels than A2 and A3 in neurons, and the three proteins show distinct distributions across brain regions. Our lab demonstrated that the function of A1 at postsynapses (both excitatory and inhibitory synapses) cannot be compensated by A2 or A3. Therefore, it is reasonable that endophilin A1, rather than A2 or A3, binds to gephyrin, even though the underlying mechanisms remain unclear.

      (7) Figure 4G: panels are mislabeled (GFP vs merge).

      Thanks for the careful reading and sorry for the mistake. We corrected the label in new Figure 4G. Please see Response to Annotation, grammar, spelling, typing errors: (1) by Reviewer #2.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Ross, Miscik, and others describes an intriguing series of observations made when investigating the requirement for podxl during hepatic development in zebrafish. Podxl morphants and CRISPants display a reduced number of hepatic stellate cells (HSCs), while mutants are either phenotypically wild type or display an increased number of HSCs.

      The absence of observable phenotypes in genetic mutants could indeed be attributed to genetic compensation, as the authors postulate. However, in my opinion, the evidence provided in the manuscript at this point is insufficient to draw a firm conclusion. Furthermore, the opposite phenotype observed in the two deletion mutants is not readily explainable by genetic compensation and invokes additional mechanisms.

      Major concerns:

      (1) Considering discrepancies in phenotypes, the phenotypes observed in podxl morphants and CRISPants need to be more thoroughly validated. To generate morphants, the authors use "well characterized and validated ATG Morpholino" (lines 373-374). However, published morphants, in addition to kidney malformations, display gross developmental defects including pericardial edema, yolk sac extension abnormalities, and body curvature at 2-3 dpf (reference 7 / PMID: 24224085). Were these gross developmental defects observed in the knockdown experiments performed in this paper? If yes, is it possible that the liver phenotype observed at 5 dpf is, to some extent, secondary to these preceding abnormalities? If not, why were they not observed? Did kidney malformations reproduce? On the CRISPant side, were these gross developmental defects also observed in sgRNA#1 and sgRNA#2 CRISPants? Considering that morphants and CRISPants show very similar effects on HSC development and assuming other phenotypes are specific as well, they would be expected to occur at similar frequencies. It would be helpful if full-size images of all relevant morphant and CRISPant embryos were displayed, as is done for tyr CRISPant in Figure S2. Finally, it is very important to thoroughly quantify the efficacy of podxl sgRNA#1 and sgRNA#2 in CRISPants. The HRMA data provided in Figure S1 is not quantitative in terms of the fraction of alleles with indels. Figure S3 indicates a very broad range of efficacies, averaging out at ~62% (line 100). Assuming random distribution of indels among cells and that even in-frame indels result in complete loss of function (possible for sgRNA#1 due to targeting the signal sequence), only ~38% (0.62*0.62) of all cells will be mutated bi-allelically. That does not seem sufficient to reliably induce loss-of-function phenotypes. My guess is that the capillary electrophoresis method used in Figure S3 underestimates the efficiency of mutagenesis, and that much higher mutagenesis rates would be observed if mutagenesis were assessed by amplicon sequencing (ideally NGS but Sanger followed by deconvolution analysis would suffice). This would strengthen the claim that CRISPant phenotypes are specific.

      The reviewer points out some excellent caveats regarding the morphant experiments. We agree that at least some of the effects of the podxl morpholino may be related to its effects on kidney development and/or gross developmental defects that impede liver development. Because of these limitations, we focused our experiments on analysis of CRISPant and mutant phenotypes, including showing that podxl (Ex1(p)_Ex7Δ) mutants are resistant to CRISPant effects on HSC number when injected with sgRNA#1. We did not observe any gross morphologic defects in podxl CRISPants. Liver size was not significantly altered in podxl CRISPants (Figure 2A). We will add brightfield images of podxl CRISPant larvae to the supplemental data for the revised manuscript.

      We agree with the reviewer that HRMA is not quantitative with respect to the fraction of alleles with indels and that capillary electrophoresis likely underestimates mutagenesis efficiency. Nonetheless, even with 100% mutation efficiency, podxl CRISPant knockdown, like most CRISPR knockdowns, would not represent complete loss of function: ~1/3 of alleles will contain in-frame mutations and likely retain at least some gene function, so ~1/3*1/3 = 1/9 of cells will have no out-of-frame indels and contain two copies of at least partially functional podxl and ~2*(1/3*2/3) = 4/9 of cells will have one out-of-frame indel and one copy of at least partially functional podxl. Thus, the decreased HSCs we observe with podxl CRISPant likely represents a partial loss-of-function phenotype in any case.
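
      The allele arithmetic in this exchange can be checked with a short sketch. It assumes, as both the reviewer and the authors do, that the two alleles in a cell are edited independently. The function name, its parameters, and the simplification that any allele without a disrupting indel stays functional are illustrative assumptions, not part of the manuscript.

```python
from fractions import Fraction


def genotype_fractions(edit_rate, disrupting_rate):
    """Return the fraction of cells with 0, 1, or 2 disrupted alleles.

    Simplifying assumption: each of the two alleles is disrupted
    independently with probability edit_rate * disrupting_rate.
    """
    p = Fraction(edit_rate) * Fraction(disrupting_rate)  # allele disrupted
    q = 1 - p                                            # allele still functional
    return {0: q * q, 1: 2 * p * q, 2: p * p}


# Authors' scenario: every allele edited, 2/3 of indels out of frame.
full = genotype_fractions(1, Fraction(2, 3))

# Reviewer's scenario: ~62% of alleles edited, every indel treated as disrupting.
partial = genotype_fractions(Fraction(62, 100), 1)

print({k: str(v) for k, v in full.items()})  # {0: '1/9', 1: '4/9', 2: '4/9'}
print(round(float(partial[2]), 3))           # 0.384
```

      Under these assumptions, only 4/9 of cells carry two out-of-frame alleles even at complete editing, and about 38% of cells are bi-allelically mutated at 62% editing, which matches the figures quoted above.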

      (2) In addition to confidence in morphant and CRISPant phenotypes, the authors' claim of genetic compensation rests on the observation that podxl (Ex1(p)_Ex7Δ) mutants are resistant to CRISPant effect when injected with sgRNA#1 (Figure 3L). Considering the issues raised in the paragraph above, this is insufficient. There is a very straightforward way to address both concerns, though. The described podxl(-194_Ex7Δ) and podxl(-319_ex1(p)Δ) deletions remove the binding site for the ATG morpholino. Therefore, deletion mutants should be refractory to the Morpholino (specificity assessment recommended in PMID: 29049395, see also PMID: 32958829). Furthermore, both deletion mutants should be refractory to sgRNA#1 CRISPant phenotypes, with the first being refractory to sgRNA#2 as well.

      The reviewer proposes elegant experiments to address the specificity of the morpholino. For the revision, we plan to perform additional morpholino studies, including morpholino injections of podxl mutants and assessment of tp53 and other immune response/cellular stress pathway genes in podxl morphants.

      Reviewer #2 (Public review):

      In this manuscript, Ross and Miscik et al. describe the phenotypic discrepancies between F0 zebrafish mosaic mutant ("CRISPants") and morpholino knockdown (Morphant) embryos versus a set of 5 different loss-of-function (LOF) stable mutants in one particular gene involved in hepatic stellate cell development: podxl. While transient LOF and mosaic mutants induced a decrease in hepatic stellate cell number, stable LOF zebrafish did not. The authors analyzed the molecular causes of these phenotypic differences and concluded that LOF mutants are genetically compensated through the upregulation of the expression of many genes. Additionally, they ruled out other better-known and described mechanisms such as the expression of redundant genes, protein feedback loops, or transcriptional adaptation.

      While the manuscript is clearly written and conclusions are, in general, properly supported, there are some aspects that need to be further clarified and studied.

      (1) It would be useful to apply a method that better quantifies potential loss-of-function mutations in the CRISPants. This would reveal not only the percentage of mutations in those embryos but also what fraction of them actually generate an out-of-frame mutation likely to drive gene loss of function (since deletions of 3-6 nucleotides removing 1-2 amino acids will likely not have an impact on protein activity, unless these amino acids are essential for the protein activity). With this, the authors could also correlate phenotype penetrance with the level of loss of function when quantifying embryo phenotypes, which would help to support their conclusions.

      Reviewer #2 raises an excellent point that is similar to Reviewer #1’s first concern. Please see our response above. In general, we agree that correlating phenotype penetrance with level of loss-of-function is a very good way to support conclusions regarding specificity in knockdown experiments. Unfortunately, because the phenotype we are examining (HSC number) has a relatively large standard deviation even in control/wildtype larvae (for example, 63 ± 19 (mean ± standard deviation) HSCs per liver in uninjected control siblings in Figure 1) it would be technically very difficult to do this experiment for podxl.

      (2) It is unclear that 4.93 ng of morpholino per embryo is totally safe. The amount of morpholino causing undesired effects can differ depending on the morpholino used. I would suggest performing some sanity check experiments to demonstrate that morpholino KD is not triggering other molecular outcomes, such as upregulation of p53 or innate immune response.

      Reviewer #2 raises an excellent point that is similar to Reviewer #1’s second concern. Please see our response above. We acknowledge that some of the effects of the podxl morpholino may be non-specific. To address this concern in the revised manuscript, we plan to perform additional morpholino studies, including morpholino injections of podxl mutants and assessment of tp53 and other immune response/cellular stress pathway genes in podxl morphants.

      (3) Although the authors made a set of controls to demonstrate the specificity of the CRISPant phenotypes, I believe that a rescue experiment could be beneficial to support their conclusions. Injecting an mRNA with podxl ORF (ideally with a tag to follow protein levels up) together with the induction of CRISPants could be a robust manner to demonstrate the specificity of the approach. A rescue experiment with morphants would also be good to have, although these are a bit more complicated, to ultimately demonstrate the specificity of the approach.

      (4) In lines 314-316, the authors speculate on a correlation between decreased HSC and Podxl levels. It would be interesting to actually test this hypothesis and perform RT-qPCR upon CRISPant induction or, even better and if antibodies are available, western blot analysis.

      We appreciate the reviewer’s acknowledgement of the controls we performed to demonstrate the specificity of the CRISPant phenotypes. The proposed experiments (rescue, assessment of Podxl levels) would help bolster our conclusions but are technically difficult due to the relatively large standard deviation for the HSC number phenotype even in wildtype larvae and the lack of well-characterized zebrafish antibodies against Podxl.

      (5) Similarly, in lines 337-338 and 342-344, the authors discuss the possibility that genes near the podxl locus could be upregulated in the mutants. Since they already have transcriptomic data, this seems an easy analysis to do that can address their own hypothesis.

      Thank you for this suggestion. We were referring in these sections to genes that are near the podxl locus with respect to three-dimensional chromatin structure; such genes would not necessarily be near the podxl locus on chromosome 4. We will clarify the text in this paragraph for the revised manuscript. At the same time, we will examine our transcriptomic data to check expression of mkln1, cyb5r3, and other nearby genes on chromosome 4 as suggested and include this analysis in the revised manuscript.

      (6) Figures 4 and 5 would be easier to follow if panels B-F included what mutants are (beyond having them in the figure legend). Moreover, would it be more accurate and appropriate if the authors group all three WT and mutant data per panel instead of showing individual fish? Representing technical replicates does not demonstrate in vivo variability, which is actually meaningful in this context. Then, statistical analysis can be done between WT and mutant per panel and per set of primers using these three independent 3-month-old zebrafish.

      Thank you for this suggestion. We will modify these figures to clarify our results.

      Reviewer #3 (Public review):

      Summary:

      Ross et al. show that knockdown of zebrafish podocalyxin-like (podxl) by CRISPR/Cas or morpholino injection decreased the number of hepatic stellate cells (HSC). The authors then generated 5 different mutant alleles representing a range of lesions, including premature stop codons, in-frame deletion of the transmembrane domain, and deletions of the promoter region encompassing the transcription start site. However, unlike their knockdown experiment, HSC numbers did not decrease in podxl mutants; in fact, for two of the mutant alleles, the number of HSCs increased compared to the control. Injection of podxl CRISPR/Cas constructs into these mutants had no effect on HSC number, suggesting that the knockdown phenotype is not due to off-target effects but instead that the mutants are somehow compensating for the loss of podxl. The authors then present multiple lines of evidence suggesting that compensation is not exclusively due to transcriptional adaptation - evidence of mRNA instability and nonsense-mediated decay was observed in some but not all mutants; expression of the related gene endoglycan (endo) was unchanged in the mutants and endo knockdown had no effect on HSC numbers; and, expression profiling by RNA sequencing did not reveal changes in other genes that share sequence similarity with podxl. Instead, their RNA-seq data showed hundreds of differentially expressed genes, especially ECM-related genes, suggesting that compensation in podxl mutants is complex and multi-genic.

      Strengths:

      The data presented is impressively thorough, especially in its characterization of the 5 different podxl alleles and exploration of whether these mutants exhibit transcriptional adaptation.

      Thank you very much for appreciating the hard work that went into this manuscript.

      Weaknesses:

      RNA sequencing expression profiling was done on adult livers. However, compensation of HSC numbers is apparent by 6 dpf, suggesting compensatory mechanisms would be active at larval or even embryonic stages. Although possible, it's not clear that any compensatory changes in gene expression would persist to adulthood.

      This reviewer makes an excellent point. Our finding that the largest changes in gene expression were in extracellular matrix (ECM) genes, together with the fact that ECM modulation is a major function of HSCs, supports the hypothesis that genetic compensation is occurring in adults. Nonetheless, we agree that compensatory changes in adults may not fully reflect the compensatory changes during development, so it would bolster the conclusions of the paper to perform the RNA sequencing and qPCR experiments on zebrafish larval livers.

      We tried very hard to do this experiment proposed by Reviewer #3. In our hands, obtaining sufficient high-quality RNA for robust gene expression analysis typically requires pooling of ~10-15 larval livers. These larvae need to be obtained from a heterozygous in-cross in order to have matched wildtype sibling controls. Livers must be dissected from freshly euthanized (not fixed) zebrafish. Thus, this experiment requires genotyping live, individual larvae from a small amount of tissue (without sacrificing the larvae) before dissecting and pooling the livers. Unfortunately we were unable to confidently and reproducibly genotype individual live podxl larvae with these small amounts of tissue despite trying multiple approaches. Therefore we were not able to perform gene expression analysis on podxl mutant larval livers.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The study investigated how individuals living in urban slums in Salvador, Brazil, interact with environmental risk factors, particularly focusing on domestic rubbish piles, open sewers, and a central stream. The study applies step selection functions to telemetry data to estimate how likely individuals are to move towards these environmental features, differentiating among groups by gender, age, and leptospirosis serostatus. The results indicated that women tended to stay closer to the central stream while avoiding open sewers more than men. Furthermore, individuals who tested positive for leptospirosis tended to avoid open sewers, suggesting that behavioral patterns might influence exposure to risk factors for leptospirosis and could therefore inform more targeted interventions.
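
      As a rough illustration of the step selection idea summarized above, each observed step is compared with alternative steps the person could have taken, and a conditional logit weights the candidates by an environmental covariate. The sketch below assumes a single covariate (distance to the nearest open sewer, in metres) and a hypothetical coefficient; the function name, the values, and the one-covariate form are assumptions for illustration, not the authors' model or fitted estimates.

```python
import math


def selection_probabilities(covariates, beta):
    """Conditional-logit probability of each candidate step in one choice set."""
    weights = [math.exp(beta * x) for x in covariates]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical choice set: distance (m) from each candidate step's
# endpoint to the nearest open sewer.
distances = [5.0, 20.0, 40.0]

# Hypothetical coefficient: a positive value means steps ending farther
# from a sewer are more likely to be selected (avoidance).
beta = 0.05

probs = selection_probabilities(distances, beta)
for d, p in zip(distances, probs):
    print(f"{d:5.1f} m -> {p:.3f}")
```

      In a fitted model, the coefficient would be estimated from many such choice sets, and its sign and size would be compared between groups such as men and women.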

      Strengths:

      (1) The use of step selection functions to analyze human movement represents an innovative adaptation of a method typically used in animal ecology. This provides a robust quantitative framework for evaluating how people interact with environmental risk factors linked to infectious diseases (in this case, leptospirosis).

      (2) Detailed differentiation by gender and serological status allows for nuanced insights, which can help tailor targeted interventions and potentially improve public health measures in urban slum settings.

      (3) The integration of real-world telemetry data with epidemiological risk factors supports the development of predictive models that can be applied in future infectious disease research, helping to bridge the gap between environmental exposure and health outcomes.

      Weaknesses:

      (1) The sample size for the study was not calculated, although it was a nested cohort study.

      We thank Reviewer #1 for highlighting this weakness. We will make sure that this is explained in the next version of the manuscript. At the time of recruiting participants, we found no literature on how to perform a sample size calculation for movement studies involving GPS loggers and associated methods of analysis. Therefore, we aimed to recruit as many individuals as possible within the resource constraints of the study.

      (2) The step‐selection functions, though a novel method, may face challenges in fully capturing the complexity of human decision-making influenced by socio-cultural and economic factors that were not captured in the study.

      We agree with Reviewer #1 that this model may fail to capture the full breadth of human decision-making when it comes to moving through local environments. We included a section discussing the aspect of violence and how this influences residents’ choices, along with some possibilities on how to record and account for this. Although it is outside of the scope of this study, we believe that coupling these quantitative methods with qualitative studies would provide a comprehensive understanding of movement in these areas.

      (3) The study's context is limited to a specific urban slum in Salvador, Brazil, which may reduce the generalizability of its findings to other geographical areas or populations that experience different environmental or socio-economic conditions.

      (4) The reliance on self-reported or telemetry-based movement data might include some inaccuracies or biases that could affect the precision of the selection coefficients obtained, potentially limiting the study's predictive power.

      We agree that telemetry data has inherent inaccuracies, which we have tried to account for by using only those data points within the study areas. We would like to clarify that there is no self-reported movement data used in this study. All movement data was collected using GPS loggers.

      (5) Some participants with fewer than 50 relocations within the study area were excluded without clear justification, see line 149.

      We found that the SSF models would not run properly if there weren’t enough relocations. Therefore, we decided to remove these individuals from the analysis. They are also removed from any descriptive statistics presented.

      (6) Some figures are not clear (see Figure 4 A & B).

      We will be trying to improve the quality of this image in the next version of the manuscript.

      (7) No statement on conflict of interest was included, considering sponsorship of the study.

      The conflict-of-interest forms for each author were sent to eLife separately. I believe these should be made available upon publication, but please reach out if these need to be re-sent.

      Reviewer #2 (Public review):

      Summary:

      Pablo Ruiz Cuenca et al. conducted a GPS logger study with 124 adult participants across four different slum areas in Salvador, Brazil, recording GPS locations every 35 seconds for 48 hours. The aim of their study was to investigate step-selection models, a technique widely used in movement ecology to quantify contact with environmental risk factors for exposure to leptospires (open sewers, community streams, and rubbish piles). The authors built two different types of models based on distance and based on buffer areas to model human environmental exposure to risk factors. They show differences in movement/contact with these risk factors based on gender and seropositivity status. This study shows the existence of modest differences in contact with environmental risk factors for leptospirosis at small spatial scales based on socio-demographics and infection status.

      Strengths:

      The authors assembled a rich dataset by collecting human GPS logger data, combined with field-recorded locations of open sewers, community streams, and rubbish piles, and testing individuals for leptospirosis via serology. This study was able to capture fine-scale exposure dynamics within an urban environment and shows differences by gender and seropositive status, using a method novel to epidemiology (step selection).

      Weaknesses:

      Due to environmental data being limited to the study area, exposure elsewhere could not be captured, despite previous research by Owers et al. showing that the extent of movement was associated with infection risk. Limitations of step selection for use in studying human participants in an urban environment would need to be explicitly discussed.

      The environmental factors used in the study required research teams to visit the sites and map the locations. Given that individuals travelled throughout the city of Salvador, performing this task at a large scale would be unachievable. Therefore, we limited the data to only those points within the study area boundaries to avoid any biases from interactions with unrecorded environmental factors. We will be including a more explicit discussion of the limitations of SSF in urban environmental settings with human participants in the next version of the manuscript.

  2. Jun 2025
    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Diarrheal diseases represent an important public health issue. Among the many pathogens that contribute to this problem, Salmonella enterica serovar Typhimurium is an important one. Due to the rise in antimicrobial resistance and the problems associated with widespread antibiotic use, the discovery and development of new strategies to combat bacterial infections is urgently needed. The microbiome field is constantly providing us with various health-related properties elicited by the commensals that inhabit their mammalian hosts. Harnessing the potential of these commensals for knowledge about host-microbe interactions as well as useful properties with therapeutic implications will likely remain a fruitful field for decades to come. In this manuscript, Wang et al. use various methods, encompassing classic microbiology, genomics, chemical biology, and immunology, to identify a potent probiotic strain that protects nematode and murine hosts from S. enterica infection. Additionally, the authors identify gut metabolites that are correlated with protection, and show that a single metabolite can recapitulate the effects of probiotic administration.

      We gratefully appreciate your positive and professional comments.

      Strengths:

      The utilization of varied methods by the authors, together with the impressive amount of data generated, to support the claims and conclusions made in the manuscript is a major strength of the work. Also, the ability to move beyond simple identification of the active probiotic, also identifying compounds that are at least partially responsible for the protective effects, is commendable.

      We gratefully appreciate your positive and professional comments.

      Weaknesses:

      No major weaknesses noted.

      We gratefully appreciate your positive comments.

      Reviewer #2 (Public review):

      Summary:

      In this work, the investigators isolated one Lacticaseibacillus rhamnosus strain (P118), and determined this strain worked well against Salmonella Typhimurium infection. Then, further studies were performed to identify the mechanism of bacterial resistance, and a list of confirmatory assays were carried out to test the hypothesis.

      We gratefully appreciate your positive and professional comments.

      Strengths:

      The authors provided details regarding all assays performed in this work, and this reviewer trusted that the conclusion in this manuscript is solid. I appreciate the efforts of the authors to perform different types of in vivo and in vitro studies to confirm the hypothesis.

      We gratefully appreciate your positive and professional comments.

      Weaknesses:

      I have mainly two questions for this work.

      Main point-1:

      The authors provided the information below about the sources from which Lacticaseibacillus rhamnosus was isolated. More details are needed. What were the criteria for choosing these samples? Where did these samples originate? How many strains of bacteria were obtained from which types of samples?

      Lines 486-488: Lactic acid bacteria (LAB) and Enterococcus strains were isolated from the fermented yoghurts collected from families in multiple cities of China and the intestinal contents from healthy piglets without pathogen infection and diarrhoea by our lab.

      Sorry for the previously ambiguous and limited information. More details have been added to the Materials and Methods section of the revised manuscript (see Lines 482-493) (the manuscript with marked changes is under “Related Manuscript File” in the submission system). We gratefully appreciate your professional comments.

      Lines 482-493: “Lactic acid bacteria (LAB) and Enterococcus strains were isolated from 39 samples: 33 fermented yoghurt samples (collected from families in multiple cities of China, including Lanzhou, Urumqi, Guangzhou, Shenzhen, Shanghai, Hohhot, Nanjing, Yangling, Dali, Zhengzhou, Shangqiu, Harbin, Kunming, Puer), and 6 healthy piglet rectal content samples without pathogen infection and diarrhea from a pig farm in Zhejiang province (Table 1). Ten isolates were randomly selected from each sample. De Man-Rogosa-Sharpe (MRS) with 2.0% CaCO₃ (a selective culture medium that favors the luxuriant cultivation of Lactobacilli) and Brain heart infusion (BHI) broths (Huankai Microbial, Guangzhou, China) were used for bacteria isolation and cultivation. The Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS, Bruker Daltonik GmbH, Bremen, Germany) method was employed to identify bacterial species with a confidence level ≥ 90% (He et al., 2022).”

      Lines 129-133: A total of 290 bacterial strains were isolated and identified from 32 samples of the fermented yoghurt and piglet rectal contents collected across diverse regions within China using MRS and BHI medium, which consists of 63 Streptococcus strains, 158 Lactobacillus/Lacticaseibacillus/Limosilactobacillus strains and 69 Enterococcus strains.

      Sorry for the ambiguous information. We have carefully revised this section and added more details (see Lines 129-133). We gratefully appreciate your professional comments.

      Lines 129-133: “After identification by MALDI-TOF MS, a total of 290 bacterial isolates were obtained from 33 fermented yoghurt samples and 6 healthy piglet rectal content samples. Those isolates consist of 63 Streptococcus isolates, 158 Lactobacillus/Lacticaseibacillus/Limosilactobacillus isolates, and 69 Enterococcus isolates (Figure 1A, Table 1).”

      Main-point-2:

      As a probiotic, Lacticaseibacillus rhamnosus has been widely studied. In fact, there are many commercially available products, and Lacticaseibacillus rhamnosus is the main bacterium in these products. There are also ATCC type strains, such as 53103.

      I am sure the authors are also interested to know if P118 is better as a probiotic candidate than other commercially available strains. Also, would the mechanism described for P118 apply to other Lacticaseibacillus rhamnosus strains?

      It would be ideal if the authors could include one or two Lacticaseibacillus rhamnosus strains that are currently commercially used, or from the ATCC. Then, the authors can compare the efficacy and antibacterial mechanisms of their P118 with other strains. This would open the window for future work.

      We gratefully appreciate your professional comments and valuable suggestions. We fully agree that it would be better to include well-known, recognized, or commercial probiotics as a positive control to comprehensively evaluate the isolated P118 strain as a probiotic candidate, particularly in comparison to other well-established probiotics, and that this would also help assess whether the mechanisms described for P118 are applicable to other L. rhamnosus strains or lactic acid bacteria in general. These issues will be fully taken into consideration and included in future work. Nonetheless, the door for future research has been left open in the Conclusion section (see Lines 477-479): “Further investigations are needed to assess whether the mechanisms observed in P118 are strain-specific or broadly applicable to other L. rhamnosus strains, or LAB species in general.”

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      This reviewer appreciates the efforts from the authors to provide the details related to this work. In the meantime, the manuscript shall be written in a way which is easy for the readers to follow.

      We have tried our best to revise and improve the whole manuscript to make it easy for readers to follow (e.g., see Line 27-30, Line 115-120, Line 129-133, Line 140-143, Line 325-328, Line 482-493, Line 501-502, Line 663-667, Line 709-710, Line 1003-1143). We gratefully appreciate your valuable suggestions.

      For example, under the sections of Materials and Methods, there are 19 sub-titles. The authors could consider combining some sections, and/or cite other references for the standard procedures.

      We gratefully appreciate your professional comments and valuable suggestions. Some sections have been combined according to the reviewer’s suggestions (see Line 501-710).

      Another example: the figures have great resolution, but they are way too busy. The figures 1 and 2 have 14-18 panels. Figure 5 has 21 panels. Please consider separating into more figures, or condensing some panels.

      We fully agree with you that some of the submitted figures are too busy, but it is not easy for us to move some results into the supplementary information, because all of them are essential for fully supporting our hypothesis and conclusions. Nonetheless, some panels have been combined or condensed according to the reviewer’s suggestions (see Line 1003-1024, Line 1056-1075). We gratefully appreciate your professional comments and valuable suggestions.

      More minor comments:

      line 30: spell out "C." please.

      Done as requested (see Line 29, Line 31). We gratefully appreciate your valuable suggestions.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      Walton et al. set out to isolate new phages targeting the opportunistic pathogen Pseudomonas aeruginosa. Using a double ∆fliF ∆pilA mutant strain, they were able to isolate 4 new phages, CLEW-1, -3, -6, and -10, which were unable to infect the parental PAO1F Wt strain. Further experiments showed that the 4 phages were only able to infect a ∆fliF strain, indicating a role of the MS-protein in the flagellum complex. Through further mutational analysis of the flagellum apparatus, the authors were able to identify the involvement of c-di-GMP in phage infection. Depletion of c-di-GMP levels by an inducible phosphodiesterase renders the bacteria resistant to phage infection, while elevation of c-di-GMP through the Wsp system made the cells sensitive to infection by CLEW-1. Using TnSeq, the authors were able to not only reaffirm the involvement of c-di-GMP in phage infection but also identify the exopolysaccharide PSL as a downstream target for CLEW-1. C-di-GMP is a known regulator of PSL biosynthesis. The authors show that CLEW-1 binds directly to PSL on the cell surface and that deletion of the pslC gene resulted in complete phage resistance. The authors also provide evidence that the phage-PSL interaction happens during the biofilm mode of growth and that the addition of the CLEW-1 phage specifically resulted in a significant loss of biofilm biomass. Lastly, the authors set out to test if CLEW-1 could be used to resolve a biofilm infection using a mouse keratitis model. Unfortunately, while the authors noted a reduction in bacterial load assessed by GFP fluorescence, the keratitis did not resolve under the tested parameters.

      Strengths: 

      The experiments carried out in this manuscript are thoughtful and rational and sufficient explanation is provided for why the authors chose each specific set of experiments. The data presented strongly support their conclusions and they present compelling explanations for any deviation. The authors have not only developed a new technique for screening for phages targeting P. aeruginosa, but also highlight the importance of looking for phages during the biofilm mode of growth, as opposed to the more standard techniques involving planktonic cultures.

      Weaknesses: 

      While the paper is strong, I do feel that further discussions could have gone into the decision to focus on CLEW-1 for the majority of the paper. The paper also doesn't provide any detailed information on the genetic composition of the phages. It is unclear if the phages isolated are temperate or virulent. Many temperate phages enter the lytic cycle in response to QS signalling, and while the data as it is doesn't suggest that is the case, perhaps the paper would be strengthened by further elimination of this possibility. At the very least it might be worth mentioning in the discussion section. 

      Thank you for your review. The genomes of all Clew phages and Ocp-2 have been uploaded [Genbank accession# PQ790658.1, PQ790659.1, PQ790660.1, PQ790661.1, and PQ790662.1]. It turns out that the Clew phage are highly related, which is highlighted by the genomic comparison in the supplementary figure S1. It therefore made sense to focus our in-depth analysis on one of the phage. We have included a supplementary figure (S1A), demonstrating that the other Clew phage also require an intact psl locus for infection, to make that logic clearer. The phage are virulent (there is apparently a bit of a debate about this with regard to Bruynogheviruses, but we have not been able to isolate lysogens). This is now mentioned in the discussion.  

      Reviewer #2 (Public review): 

      This manuscript by Walton et al. suggests that they have identified a new bacteriophage that uses the exopolysaccharide Psl from Pseudomonas aeruginosa (PA) as a receptor. As Psl is an important component in biofilms, the authors suggest that this phage (and others similarly isolated) may be able to specifically target biofilm-growing bacteria. While an interesting suggestion, the manner in which this paper is written makes it difficult to draw this conclusion. Also, some of the results do not directly follow from the data as presented and some relevant controls seem to be missing. 

      Thank you for your review. We would argue that the combination of demonstrating Psl-dependent binding of Clew-1 to P. aeruginosa, as well as demonstration of direct binding of Clew-1 to affinity-purified Psl, indicates that the phage binds directly to Psl and uses it as a receptor. In looking at the recommendations, it appears that the remark about controls refers to not using the ∆pslC mutant alone (as opposed to the ∆fliF2 ∆pslC double mutant) as a control for some of the binding experiments. However, since the ∆fliF2 mutant is more permissive for phage infection, analyzing the effect of deleting pslC in the context of the ∆fliF2 mutant background is the more stringent test. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      First off, I would like to congratulate the authors on this study and manuscript. It is very well executed and the writing and flow of the paper are excellent. The findings are intriguing and I believe the paper will be very well received by both the phage, Pseudomonas, and biofilm communities. 

      Thank you for your kind review of our work!

      I have very little to critique about the paper but I have listed a few suggestions that I believe could strengthen the paper if corrected: 

      Comments and suggestions: 

      (1) The paper initially describes 4 isolated phages but no rationale is given for why they chose to continue with CLEW-1, as opposed to CLEW-3, -6, and -10. The paper would benefit from going into more detail with phage genomics and perhaps characterize the phage receptor binding to PSL. 

      Clew-1, -3, -6, and -10 are actually quite similar to one another. The genomes are now uploaded to Genbank [accession# PQ790658.1, PQ790659.1, PQ790660.1, and PQ790661.1]. They all require an intact Psl locus for infection; we have updated Fig. S1 to show this for the remaining Clew phage. In the end, it made sense to focus on one of these related phage and characterize it in depth.

      (2) PA14 was used in some experiments but not listed in the strain table. 

      Thank you, this has been added in the resubmission.

      (3) Would have been good to see more strains/isolates used.

      We are currently characterizing the host range of Clew-1. It appears to be pretty limited, but this will likely be included in another paper that will focus on host range, not only of Clew-1, but other biofilm-tropic phage that we have isolated since then.

      (4) Could purified PSL be added to make non-PSL strain (like PA14) susceptible? 

      We have tried adding purified Psl to a psl mutant strain, but this does not result in phage sensitivity. Further characterization of the Psl receptor is something we are currently working on, but it will likely be a much bigger story than can be easily accommodated in a revised manuscript.

      (5) No data on resistance development. 

      We have not done this as yet.

      (6) Alternative biofilm models. Both in vitro and in vivo. 

      We agree that exploring the interaction of Clew-1 with biofilms in greater detail is a logical next step. The revised manuscript does have data on the viability of P. aeruginosa biofilm bacteria after Clew-1 infection using either a bead biofilm model or LIVE/DEAD staining of static biofilms. However, expanding on this further (setting up flow-cell biofilms, developing reporters to monitor phage infection, etc.) is beyond the scope of this initial report and characterization of Clew-1.

      (7) There is a mistake in at least one reference. An unknown author is listed in reference 48. DA Garsin is not part of the paper. Might be worth looking into further mistakes in the reference list as I suspect this might be an issue related to the citation software.

      Thank you. Yes, odd how that extra author got snuck in. This has been corrected.

      (8) I don't seem to be able to locate a Genbank file or accession number. If it wasn't performed how was evolutionary relatedness data generated?

      The genomes of all Clew phages and Ocp-2 have been uploaded [Genbank accession# PQ790658.1, PQ790659.1, PQ790660.1, PQ790661.1, and PQ790662.1].

      (9) No genomic information about the isolated phages. Are they temperate or virulent? This would be important information as only strictly lytic phages are currently deemed appropriate for phage therapy. 

      These phage are virulent. We have only been able to isolate resistant bacteria from plaques, but they do not harbor the phage (as detected by PCR). This matches what other researchers have found for Bruynogheviruses.

      Reviewer #2 (Recommendations for the authors): 

      Others have used different PA mutants lacking known phage receptors to pan for new phages. However, it is not totally clear how the screen here was selected for the Psl-specific phage. The authors used flagella and pili mutants and found Clew-1, -3, -6, and -10. These were all Bruynogheviruses. They also isolated a phage that uses the O antigen as a receptor. The family of this latter phage and how it is known to use this as a receptor is not described. 

      Phage Ocp-2 is a Pbunavirus. We added new supplementary figure S3, addressing the O-antigen receptor.

      The authors focused on Clew-1, but the receptor for these other Clew phages is not presented. For Clew-1 the phage could plaque on the fliF deletion mutant but not the wild-type strain. The reason for this never appears to be addressed. The authors leap to consider the involvement of c-di-GMP, but how this relates to fliF appears to be lacking. 

      We have included a supplementary figure demonstrating that all the Clew phage require Psl for infection (Fig. S1A). As noted above, we have uploaded the genomic data that underpins the comparison in our supplementary figure. The phage are all closely related. It therefore made sense to focus on one of the phage for the analysis.  

      It is particularly unclear why this phage doesn't plaque on PAO1 as this strain does make Psl. Related to this, it actually looks like something is happening to PAO1 in Figure S4 (although what units are on the x-axis is not entirely clear).

      We hypothesize that the fraction of susceptible cells in the population dictates whether the phage can make overt plaques. The supplementary figure S4 indicates that a subpopulation of the wild-type culture is susceptible and this is borne out by the fraction of wild type cells that the phage can bind to (~50%). The fliF mutation increases this frequency of susceptible cells to 80-90% (Fig. 3).

      The Tnseq screen to identify receptors is clever and identifies additional phosphodiesterase genes, the deletion of which makes PAO1 susceptible. And the screen to find resistant fliF mutants identified genes involved in Psl. However, the link between the phosphodiesterase mutants and the amount of Psl produced never appears to be established. And the statement that Psl is required for infection (line 130) is never actually tested.

      The link between c-di-GMP and Psl production is well-established in the literature. I think the requirement for Psl in infection is demonstrated in multiple ways, including lack of plaque formation on psl mutant strains, lack of phage binding to strains that do not produce Psl, and direct binding of the phage to affinity-purified Psl.

      Figure 2C describes using a ∆fliF2 strain but how this is different (or if it is different) from ∆fliF described in the text is never explained.

      The difference in the deletions is explained in table S1, in the description for the deletion constructs used in their construction, pEXG2-∆fliF and pEXG2-∆fliF2 (∆fliF2 is smaller than ∆fliF and can be complemented completely with our complementing plasmid, pP37-fliF, which is the reason why we used the ∆fliF2 mutation going forward, rather than the ∆fliF mutation on which the phage was originally isolated).

      Similarly, there is a sentence (line 138) that "Attachment of Clew-1 is Psl-dependent" but this would appear to have no context.

      The relevant figure, Fig. 3, is cited in the next sentence and is the subject of the remaining paragraphs in this section of the manuscript.

      For Figure 3B, why wasn't the single ∆pslC mutant visualized in this analysis? Similar questions relate to the data in Figure 4.

      Analyzing the effect of the pslC deletion in the context of the ∆fliF2 mutant background, which is more permissive for phage infection, is the more stringent test.  

      The efficacy of Clew-1 in the mouse keratitis model is intriguing but it is unclear why the CFU/eye are so variable. The description of how the experiment was actually carried out is not clear. Was only one eye scratched or both? Were controls included with a scratch and no bacteria (± phage)?

      One eye was infected. We did not conduct a no-bacteria control (just scratching the cornea is not sufficient to cause disease). The revised manuscript has an updated animal experiment in which we carried the infection forward to 72h with two phage treatments. Following this regimen, there is a significant decrease in CFU, as well as corneal opacity (disease). Variability of the data is a fairly common feature in animal experiments. A number of factors could explain this variability, such as whether the mouse blinks and removes some of the inoculum shortly after deposition of the bacteria, or some of the phage after each treatment.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      The revised manuscript has gained much clarity and consistency. One previous criticism, however, has in my opinion not been properly addressed. I think the problem boils down to not clearly distinguishing between orthologs and paralogs/homologs. As this problem affects a main conclusion - the prevalence of deletions over insertions in the MTBC - it should be addressed, if not through additional analyses, then at least in the discussion.

      Insertions and deletions are now distinguished in the following way: "Accessory regions were further classified as a deletion if present in over 50% of the 192 sub-lineages or an insertion/duplication if present in less than 50% of sub-lineages." The outcome of this classification is suspicious: not a single accessory region was classified as an insertion/duplication. As a check of sanity, I'd expect at least some insertions of IS6110 to show up, which has produced lineage- or sublineage-specific insertions (Roychowdhury et al. 2015, Shitikov et al. 2019). Why, for example, wouldn't IS6110 insertions in the single L8 strain show up here?

      In a fully clonal organism, any insertion/duplication will be an insertion/duplication of an existing sequence, and thus produce a paralog. If I'm correctly understanding your methods section, paralogs are systematically excluded in the pangraph analysis. Genomic blocks are summarized at the sublineage levels as follows (l.184 ): "The DNA sequences from genomic blocks present in at least one sub-lineage but completely absent in others were extracted to look for long-term evolution patterns in the pangenome." I presume this is done using blastn, as in other steps of the analysis.

      So a sublineage-specific copy of IS6110 would be excluded here, because IS6110 is present somewhere in the genome in all sublineages. However, the appropriate category of comparison, at least for the discussion of genome reduction, is orthology rather than homology: is the same, orthologous copy of IS6110, at the same position in the genome, present or absent in other sublineages? The same considerations apply to potential sublineage-specific duplicates of PE, PPE, and Esx genes. These gene families play important roles in host-pathogen interactions, so I'd argue that the neglect of paralogs is not a finicky detail, but could be of broader biological relevance.

      Within the analysis we undertook we did look at paralogous blocks in pangraph, based on copy number per genome. However, this could have been clearer in the text and we will rectify this. We also focussed on duplicated/deleted blocks that were present in two or more sub-lineages. This is noted in the figure 4 legend but we will make this clearer in other sections of the manuscript.
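
      As a rough illustration of a copy-number check of this kind, the sketch below counts how often each block occurs along each genome and reports blocks seen more than once. It is not the code used in the study: the function name `duplicated_blocks`, the input layout (one list of block identifiers per genome) and the toy identifiers are all assumptions made for the example.

```python
from collections import Counter


def duplicated_blocks(paths):
    """Return {block: {genome: copy_number}} for blocks present more than once in a genome.

    paths: dict mapping a genome id to the ordered list of block ids along that genome
           (hypothetical layout, chosen for this sketch only).
    """
    duplicated = {}
    for genome, blocks in paths.items():
        for block, copies in Counter(blocks).items():
            if copies > 1:
                duplicated.setdefault(block, {})[genome] = copies
    return duplicated


# Toy example: block "b1" occurs twice in genome "g1" and once in "g2".
paths = {"g1": ["b1", "b2", "b1"], "g2": ["b1", "b2"]}
print(duplicated_blocks(paths))
```

      Such a count only shows that a block is multi-copy in a genome; it says nothing about where in the genome each copy sits, which is the position-specific (orthology) question raised by the reviewer.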

      We agree that indeed the way paralogs are handled could still be optimised, and that gene duplicates of some genes could have biological importance. The reviewer is suggesting that a synteny analysis between genomes would be best for finding specific regions that are duplicated/deleted within a genome, and if those sections are duplicated/deleted in the same regions of the genome. Since Pangraph does not give such information readily, a larger amount of analysis would be required to confirm such genome position-specific duplications. While this is indeed important, we deem this to be out of scope for the current publication, but will note this as a limitation in the discussion. However, this does not fundamentally change the main conclusions of our analysis.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Behruznia and colleagues use long-read sequencing data for 335 strains of the Mycobacterium tuberculosis complex to study genome evolution in this clonal bacterial pathogen. They use both a "classical" pangenome approach that looks at the presence and absence of genes, and a more general pangenome graph approach to investigate structural variants also in non-coding regions. The two main results of the study are that (1) the MTBC has a small pangenome with few accessory genes, and that (2) pangenome evolution is driven by deletions in sublineage-specific regions of difference. Combining the gene-based approach with a pangenome graph is innovative, and the former analysis is largely sound apart from a lack of information about the data set used. The graph part, however, requires more work and currently fails to support the second main result. Problems include the omission of important information and the confusing analysis of structural variants in terms of "regions of difference", which unnecessarily introduces reference bias. Overall, I very much like the direction taken in this article, but think that it needs more work: on the one hand by simply telling the reader what exactly was done, on the other by taking advantage of the information contained in the pangenome graph.

      Strengths:

      The authors put together a large data set of long-read assemblies representing most lineages of the Mycobacterium tuberculosis complex, covering a large geographic area. State-of-the-art methods are used to analyze gene presence-absence polymorphisms (Panaroo) and to construct a pangenome graph (PanGraph). Additional analysis steps are performed to address known problems with misannotated or misassembled genes in pangenome analysis.

      Weaknesses:

      The study does not quite live up to the expectations raised in the introduction. Firstly, while the importance of using a curated data set is emphasized, little information is given about the data set apart from the geographic origin of the samples (Figure 1). A BUSCO analysis is conducted to filter for assembly quality, but no results are reported. It is also not clear whether the authors assembled genomes themselves in the cases where, according to Supplementary Table 1, only the reads were published but not the assemblies. In the end, we simply have to trust that single-contig assemblies based on long-reads are reliable.

      We have now added a robust overview of the dataset to supplementary file 1. This is split into 3 sections: public genomes, which were assembled by others; sequenced genomes, which were created and assembled by us; the BUSCO information for all the genomes together. We did not assemble any public data ourselves but retrieved these from elsewhere. We have modified the text to be more specific on this (Line 114 onwards) and the supplementary file is updated to better outline the data.

      One issue with long read assemblies could be that high rates of sequencing errors result in artificial indels when coverage is low, which in turn could affect gene annotation and pangenome inference (e.g. Watson & Warr 2019, https://doi.org/10.1038/s41587-018-0004-z). Some of the older long-read data used by the authors could well be problematic (PacBio RSII), but also their own Nanopore assemblies, six of which have a mean coverage below 50 (Wick et al. 2023 recommend 200x for ONT, https://doi.org/10.1371/journal.pcbi.1010905). Could the results be affected by such assembly errors? Are there lineages, for example, for which there is an increased proportion of RSII data? Given the large heterogeneity in data quality on the NCBI, I think more information about the reads and the assemblies should be provided.

      We have now included an analysis where we looked to see if the sequencing platform influenced the resulting accessory genome size and the pseudogene count. The details of this are included in lines 207-219, and the results are outlined in lines 251-258. Essentially, we found no correlation between sequencing platform and genome characteristics, although less stringent cut-offs did suggest that PacBio SMRT-only assembled genomes may have larger accessory genomes. We do not believe this is enough to influence our larger inferences from this data. It should be noted that complete genomes, in general, give a better indication of pangenome size compared to draft genomes, as has been shown previously (e.g. Marin et al., 2024). Even with some small potential bias, this makes our analysis more robust than any previously published.

      In relation to the sequencing depth of our own data, all genomes had coverage above 30x, which Sanderson et al. (2024) have shown to be sufficient for highly accurate sequence recovery. We fixed an issue with the L9 isolate from the previous submission, which resulted in a better BUSCO score and overall quality of that isolate and the overall dataset.

      The part of the paper I struggled most with is the pangenome graph analysis and the interpretation of structural variants in terms of "regions of difference". To start with, the method section states that "multiple whole genomes were aligned into a graph using PanGraph" (l.159/160), without stating which genomes were for what reason. From Figure 5 I understand that you included all genomes, and that Figure 6 summarizes the information at the sublineage level. This should be stated clearly, at present the reader has to figure out what was done. It was also not clear to me why the authors focus on the sublineage level: a minority of accessory genes (107 of 506) are "specific to certain lineages or sublineages" (l. 240), so why conclude that the pangenome is "driven by sublineage-specific regions of difference", as the title states? What does "driven by" mean? Instead of cutting the phylogeny arbitrarily at the sublineage level, polymorphisms could be described more generally by their frequencies.

      We apologise for the ambiguity in the methodology. All the isolates were inputted to Pangraph to create the pangenome using this method. This is now made clearer in lines 175-177. Standard pangenome statistics (size, genome fluidity, etc.) derived from this Pangraph output are now present in the results section as well (lines 301-320).

      We then only looked at regions of difference at the sub-lineage level, meaning we grouped genomes by sub-lineage within the resulting graph and looked for blocks common between isolates of the same sub-lineage but absent from one or more other sub-lineages. We did this from both the Panaroo output and the Pangraph output and then retained only blocks found by both. The results of this are now outlined in lines 351-383.
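
      The selection logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code: it assumes the Panaroo and Pangraph calls have already been mapped to shared region identifiers, and the function name `sublineage_specific`, the data layout and all toy labels are hypothetical.

```python
from collections import defaultdict


def sublineage_specific(presence, sublineage_of):
    """Return regions carried by every isolate of at least one sub-lineage
    and by no isolate of at least one other sub-lineage.

    presence: dict mapping region id -> set of isolate ids carrying the region
    sublineage_of: dict mapping isolate id -> sub-lineage label
    """
    members = defaultdict(set)
    for isolate, sub in sublineage_of.items():
        members[sub].add(isolate)

    selected = set()
    for region, carriers in presence.items():
        fixed_in = [s for s, iso in members.items() if iso <= carriers]
        absent_from = [s for s, iso in members.items() if not (iso & carriers)]
        if fixed_in and absent_from:
            selected.add(region)
    return selected


# Toy data: four isolates in two sub-lineages (all names are made up).
sublineage_of = {"a1": "subA", "a2": "subA", "b1": "subB", "b2": "subB"}
panaroo_calls = {
    "regionA": {"a1", "a2"},              # fixed in subA, absent from subB
    "regionB": {"a1", "a2", "b1", "b2"},  # present everywhere, not differential
    "regionC": {"a1"},                    # single-isolate change, ignored
}
pangraph_calls = {
    "regionA": {"a1", "a2"},
    "regionD": {"b1", "b2"},              # reported by one approach only
}

# Retain only regions that both approaches report as sub-lineage-specific.
concordant = (sublineage_specific(panaroo_calls, sublineage_of)
              & sublineage_specific(pangraph_calls, sublineage_of))
print(sorted(concordant))
```

      Requiring a region to be shared by all isolates of a sub-lineage is what keeps a single-genome change such as `regionC` out of the result.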

      We focussed on these sub-lineage-specific regions to capture long-term evolution patterns and not be influenced by single-genome short-term changes. We do not have enough genomes of closely related isolates to truly look at very recent evolution, although the small accessory genome indicates this is not substantial in terms of gene presence/absence. We also did not want potential mis-annotations in a single genome to heavily influence our findings due to the potential issues pointed out by the reviewer above. We state this more clearly in the introduction (lines 106-108), methods (lines 184-186) and results (lines 345-347), and we indicate the limitations in the Discussion, lines 452-457 and 471-473. We also changed the title to ‘shaped’ instead of ‘driven by’.

      I fully agree that pangenome graphs are the way to go and that the non-coding part of the genome deserves as much attention as the coding part, as stated in the introduction. Here, however, the analysis of the pangenome graph consists of extracting variants from the graph and blasting them against the reference genome H37Rv in order to identify genes and "regions of difference" (RDs) that are variable. It is not clear what the authors do with structural variants that yield no blast hit against H37Rv. Are they ignored? Are they included as new "regions of difference"? How many of them are there? etc. The key advantage of pangenome graphs is that they allow a reference-free, full representation of genetic variation in a sample. Here reference bias is reintroduced in the first analysis step.

      We apologise for the confusion here as indeed the RDs terminology is very MTBC-specific. Current RDs are always relevant to H37Rv, as that is how original discovery of these regions was done and that is how RDScan works. We clarify this in the introduction (lines 67-68). If we found a large sequence polymorphism (e.g. by Pangraph) and searched for known RDs using RDScan, we then assigned a current RD name to this LSP. This uses H37Rv as a reference. If we did not find a known RD, we then classified the LSP as a new RD if it is present in H37Rv, or left the designation as an LSP if not in H37Rv, thus expanding the analysis beyond the H37Rv-centric approaches used by others previously. This is hopefully now made clearer in the methods, lines 187-194.
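
      The naming decision described here reduces to a three-way branch, sketched below. This is only an illustration: the function `name_region`, its inputs and the provisional naming scheme are hypothetical, and the lookup of known RDs and the H37Rv search are assumed to have been done beforehand.

```python
def name_region(known_rd_name, present_in_h37rv, lineage_label):
    """Return (category, name) for one large sequence polymorphism (LSP).

    known_rd_name: name from a prior lookup of known RDs, or None if no match
    present_in_h37rv: True if the sequence has a match in the H37Rv reference
    lineage_label: label used to build a provisional name (hypothetical scheme)
    """
    if known_rd_name is not None:
        # A previously described RD keeps its existing name.
        return ("known RD", known_rd_name)
    if present_in_h37rv:
        # Not previously described, but defined relative to H37Rv: new RD.
        return ("new RD", f"RD_{lineage_label}")
    # No match in H37Rv: keep the reference-free LSP designation.
    return ("LSP", f"LSP_{lineage_label}")


print(name_region("RD_example", True, "subA"))
print(name_region(None, True, "subA"))
print(name_region(None, False, "subA"))
```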

      Along similar lines, I find the interpretation of structural variants in terms of "regions of difference" confusing, and probably many people outside the TB field will do so. For one thing, it is not clear where these RDs and their names come from. Did the authors use an annotation of RDs in the reference genome H37Rv from previously published work (e.g. Bespiatykh et al. 2021)? This is important basic information, its lack makes it difficult to judge the validity of the results. The Bespiatykh et al. study uses a large short-read data set (721 strains) to characterize diversity in RDs and specifically focuses on the sublineage-specific variants. While the authors cite the paper, it would be relevant to compare the results of the two studies in more detail.

      We have amended the introduction to explain this terminology better (lines 67-68). Naming of the RDs here came from using RDScan to assign current names to any accessory regions we found and if such a region was not a known RD, we gave it a lineage-related name, allowing for proper RD naming later (lines 187-194). Because the Bespiatykh paper is the basis for RDScan, our work implicitly compares to this throughout, as any RDs we find which were not picked up by RDScan are thus novel compared to that paper.

      As far as I understand, "regions of difference" have been used in the tuberculosis field to describe structural variants relative to the reference genome H37Rv. Colloquially, regions present in H37Rv but absent in another strain have been called "deletions". Whether these polymorphisms have indeed originated through deletion or through insertion in H37Rv or its ancestors requires a comparison with additional strains. While the pangenome graph does contain this information, the authors do not attempt to categorize structural variants into insertions and deletions but simply seem to assume that "regions of difference" are deletions. This, as well as the neglect of paralogs in the "classical" pangenome analysis, puts a question mark behind their conclusion that deletion drives pangenome evolution in the MTBC.

      We have now amended the analysis to specifically designate a structural variant as a deletion if present in the majority of strains and absent in a minority, or an insertion/duplication if present in a minority and absent in a majority (lines 191-192). We also ran Panaroo without merging paralogs to examine duplication in this output; Pangraph implicitly includes paralogs already.
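
      A minimal sketch of this majority rule is given below. It is illustrative rather than the code used in the study: the function name `classify_region` is hypothetical, and returning "unclassified" for a region present in exactly half of the sub-lineages is an assumption, since the rule as stated does not cover that case.

```python
def classify_region(n_present, n_total):
    """Classify an accessory region by how many sub-lineages carry it.

    n_present: number of sub-lineages in which the region is present
    n_total: total number of sub-lineages compared
    """
    if not 0 < n_present < n_total:
        raise ValueError("an accessory region is present in some but not all sub-lineages")
    if 2 * n_present > n_total:
        # Present in the majority: the minority is assumed to have lost it.
        return "deletion"
    if 2 * n_present < n_total:
        # Present in the minority: assumed to have been gained there.
        return "insertion/duplication"
    return "unclassified"  # exact tie; handling here is an assumption


print(classify_region(150, 192))
print(classify_region(3, 192))
```

      The rule uses frequency as a proxy for ancestral state, so it cannot by itself distinguish an old insertion that has since become common from a region that was always present.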

      From all these analyses we did not find any structural variants classed as insertions/duplications and did not find paralogs to be a major feature at the sub-lineage level (lines 377-383). While these features could be important on shorter timescales, we do not have enough closed genomes to confidently state this (limitation outlined in lines 452-457). Therefore, our assertion that deletions are a primary force shaping the long-term evolution in this group still holds.

      Reviewer #2 (Public Review):

      Summary:

      The authors attempted to investigate the pangenome of MTBC by using a selection of state-of-the-art bioinformatic tools to analyse 324 complete and 11 new genomes representing all known lineages and sublineages. The aim of their work was to describe the total diversity of the MTBC and to investigate the driving evolutionary force. By using long read and hybrid approaches for genome assembly, an important attempt was made to understand why the MTBC pangenome size was reported to vary in size by previous reports.

      Strengths:

      A stand-out feature of this work is the inclusion of non-coding regions as opposed to only coding regions which was a focus of previous papers and analyses which investigated the MTBC pangenome. A unique feature of this work is that it highlights sublineage-specific regions of difference (RDs) that were previously unknown. Another major strength is the utilisation of long-read whole genomes sequences, in combination with short-read sequences when available. It is known that using only short reads for genome assembly has several pitfalls. The parallel approach of utilizing both Panaroo and Pangraph for pangenomic reconstruction illuminated the limitations of both tools while highlighting genomic features identified by both. This is important for any future work and perhaps alludes to the need for more MTBC-specific tools to be developed.

      Weaknesses:

      The only major weakness was the limited number of isolates from certain lineages and the over-representation others, which was also acknowledged by the authors. However, since the case is made that the MTBC has a closed pangenome, the inclusion of additional genomes would not result in the identification of any new genes. This is a strong statement without an illustration/statistical analysis to support this.

      We have included a Heaps' law and genome fluidity calculation for each pangenome estimation to demonstrate that the pangenome is closed. This is detailed in lines 225-228 with results shown in lines 274-278 and 316-320 and Supplementary Figure 2. We agree that more closely related genomes would benefit a future version of this analysis and we indicate the limitations in the Discussion, lines 452-457 and 471-473.
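
      For readers unfamiliar with genome fluidity, the sketch below computes it from gene-family sets using the pairwise definition commonly attributed to Kislyuk et al. (2011): the average, over all genome pairs, of the number of families unique to either genome divided by the total number of families in both. Whether this exact formulation matches the calculation in the manuscript is an assumption; the function name and toy data are hypothetical.

```python
from itertools import combinations


def genome_fluidity(genomes):
    """Mean pairwise ratio of unique gene families to total gene families.

    genomes: dict mapping a genome id to the non-empty set of gene families it carries
    """
    pairs = list(combinations(genomes.values(), 2))
    if not pairs:
        raise ValueError("at least two genomes are required")
    total = 0.0
    for a, b in pairs:
        if not a and not b:
            raise ValueError("genomes must contain at least one gene family")
        unique = len(a ^ b)             # families found in exactly one genome of the pair
        total += unique / (len(a) + len(b))
    return total / len(pairs)


# Toy example: two identical genomes and one that differs by a single family.
genomes = {
    "g1": {"famA", "famB", "famC"},
    "g2": {"famA", "famB", "famC"},
    "g3": {"famA", "famB", "famD"},
}
print(genome_fluidity(genomes))
```

      Values close to zero indicate that randomly chosen genome pairs share almost all of their gene families, which is the pattern expected for a closed pangenome.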

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Abstract

      l. 24, "with distinct genomic features". I'm not sure what you are referring to here.

      We refer to the differences in accessory genome and related functional profiles but did not want to bloat the abstract with such additional details.

      Introduction

      l. 40, "L1 to L9". A lineage 10 has been described recently: https://doi.org/10.3201/eid3003.231466.

      We have updated the text and the reference. Unfortunately, no closed genome for this lineage exists so we have not included it in the analyses. We note this in the results, line 232.

      l.62/3, "caused by the absence of horizontal gene transfer, plasmids, and recombination". Recombination is not absent in the MTBC, only horizontal gene transfer seems to be, which is what the cited studies show. Indeed a few sentences later homologous recombination is mentioned as a cause of deletions.

      This has now been removed from the introduction.

      l. 67, "within lineage diversity is thought to be mostly driven by SNPs". Again I'm not sure what is meant here with "driven by". Point mutations are probably the most common mutational events, but duplications, insertions, deletions, and gene conversion also occur and can affect large regions and possibly important genes, as shown in a recent preprint (https://doi.org/10.1101/2024.03.08.584093).

      We have changed the text to say ‘mostly composed of’. While indeed other SNVs may be contributing, the prevailing thought at lineage level is that SNPs are the primary source of diversity. The linked pre-print is looking at within transmission clusters and this has not been described at the lineage level, which could be done in a future work.

      l. 100/1. "that can account for variations in virulence, metabolism, and antibiotic resistance". I would phrase this conservatively since the functional inferences in this study are speculative.

      This has now been tempered to be less specific.

      Methods

      l. 108. That an assembly has a single contig does not mean that it is "closed". Many single contig assemblies on NCBI are reference-guided short-read assemblies, that is, fragments patched together rather than closed assemblies. The same could be true for long-read assemblies.

      We specifically chose those listed as closed on NCBI and so rely on their checks to ensure this is true. We have stated this more clearly in the paper, line 117.

      l. 111. From Supplementary Table 1 I understand that for many genomes only the reads were available (no ASM number). Did you assemble these genomes? If yes, how? The assembly method is not indicated in the supplement, contrary to what is written here.

      All public genomes were downloaded in their assembled forms from the various sources. This is specified better in the text (line 118) and the supplementary table 1 now lists the accessions for all the assemblies.

      l. 113. How many assemblies passed this threshold? And is BUSCO actually useful to assess assembly quality in the MTBC? I assume the dynamic, repetitive gene families that cause problems for assembly and mapping in TB (PE, PPE, ESX) do not figure in the BUSCO list of single-copy orthologs.

      All assemblies passed the BUSCO thresholds for high-quality genomes as laid out in Supplementary Table 1. While indeed this does not include multi-copy genes such as PE/PPE we focussed on regions of difference at the sub-lineage level where two or more genomes represent that sub-lineage. This means any assembly issues in a single genome would need to be exactly the same in another of the same sub-lineage to be included in our results. Through this, we aimed to buffer out issues in individual assemblies.
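
      The buffering logic described above can be sketched as follows. This is a hypothetical illustration rather than the authors' pipeline: it assumes a presence/absence table mapping each block to the set of genomes that contain it, and it reports a block as absent from a sub-lineage only when that sub-lineage has at least two genomes and every one of them lacks the block. All names and toy data are invented for the example.

```python
def sublineage_wide_absences(presence, sublineage_of, min_members=2):
    """Return {block: [sub-lineages in which every genome lacks the block]}.

    presence: dict mapping block name -> set of genome ids containing the block.
    sublineage_of: dict mapping genome id -> sub-lineage label.
    Sub-lineages with fewer than `min_members` genomes are ignored, so an
    artefact in a single assembly cannot produce a call on its own.
    """
    members = {}
    for genome, sublineage in sublineage_of.items():
        members.setdefault(sublineage, set()).add(genome)

    calls = {}
    for block, present_in in presence.items():
        absent_from = sorted(
            sublineage
            for sublineage, genomes in members.items()
            if len(genomes) >= min_members and genomes.isdisjoint(present_in)
        )
        if absent_from:
            calls[block] = absent_from
    return calls


# Toy data only: five genomes in three sub-lineages, three blocks.
toy_sublineages = {"g1": "L2.2", "g2": "L2.2", "g3": "L4.1", "g4": "L4.1", "g5": "L1.1"}
toy_presence = {
    "blockA": {"g3", "g4", "g5"},              # missing from both L2.2 genomes
    "blockB": {"g1", "g2", "g3", "g4"},        # missing only from the lone L1.1 genome
    "blockC": {"g1", "g2", "g3", "g4", "g5"},  # present everywhere
}
toy_calls = sublineage_wide_absences(toy_presence, toy_sublineages)
```

      With the toy input, only blockA is reported, because blockB is missing from a sub-lineage represented by a single genome.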

      l. 147: Why is Panaroo used with -merge-paralogs? I understand that near-identical genes may not be too interesting from a functional perspective, but if the aim of the analysis is to make broad claims about processes driving genome evolution, paralogs should be considered.

      We chose to do so with merged paralogs to look for larger patterns of diversity beyond within-genome paralogs. Additionally, this was required to build the core phylogenetic tree. However, as the reviewer points out, this may bias our findings towards deletions and away from duplications as a primary evolutionary force.

      We repeated this without the merged paralogs option and indeed found a larger pangenome, as outlined in Table 1. However, at the sub-lineage level, this did not result in any new presence/absence patterns (lines 381-383). This means the paralogs tended to be in single genomes only. This still indicates that deletions are the primary force in the longer-term evolution of the complex but indeed on shorter spans this may be different.

      l. 153: remove the comment in brackets.

      This has been fixed and the proper URL inserted instead.

      l. 159: which genomes, and why those?

      This is now clarified to state all genomes were used for this analysis.

      l. 161, "gene blocks": since this analysis is introduced as capturing the non-coding part of the genome, maybe just call them "blocks"?

      All references to gene blocks are now changed to genomic blocks to be more specific.

      l. 162: what happens with blocks that yield no hits against RvD1, TbD1, and H37Rv?

      We named these with lineage-specific names (supplementary table 4) but did not assign RD names specifically.

      l. 164: where does the information about the regions of difference come from? How exactly were these regions determined?

      We have expanded this section to be more specific on the use of RDScan and the new naming, along with how we determine whether something is an RD/LSP.

      Results

      l. 185ff: This paragraph gives many details about the geographic origin of the samples, but what I'd expect here is a short description of assembly qualities, for example, the results of the BUSCO analysis, a description of your own Nanopore assemblies, or a small analysis of the number of indels/pseudogenes relative to sequencing technology or coverage (see comment in the public review).

      This section (lines 231-258) has been expanded considerably to give a better overview of the dataset and any potential biases. Supplementary table 1 has also been expanded to include more information on each strain.

      l. 187, "324 genomes published previously": 322 according to the methods section.

      The number has been fixed throughout to the proper total of public genomes (329).

      l. 201: define the soft core, shell, and cloud genes.

      This is now defined on line 262.

      l. 228, "defined primarily by RD105 and RD207 deletions": this claim seems to come from the analysis of variable importance (Factoextra), which should be made clear here.

      This has been clarified on line 333.

      l. 237, "L8, serving as the ancestor of the MTBC": this is incorrect, equivalent to saying that the Chimpanzee is the ancestor of Homo sapiens.

      We have changed this to basal to align with how it is described in the original paper.

      l. 239, "The accessory genome of the MTBC". It is a bit confusing that the same term, 'accessory genome', is used here for the graph-based analysis, which is presented as a way to look at the non-coding part of the genome.

      We have clarified the terminology on line 347 and improved consistency throughout.

      l. 240/1, "specific to certain lineages and sublineages". What exactly do you mean by "specific" to? Present only in members of a certain lineage/sublineage? In all members of a certain lineage/sublineage? Maybe an additional panel in Figure 5, showing examples of lineage- and sublineage-specific variants, would help the reader grasp this key concept.

      We have clarified this on line 349 and the legend of what is now figure 4.

      l. 241/2, "82 lineage and sublineage-specific genomic regions ranging from 270 bp to 9.8 kb". Were "gene blocks" filtered for a minimum size, or why are there no variants smaller than 270 bp? A short description of all the blocks identified in the graph could be informative (their sizes, frequencies ...).

      Yes, a minimum of 250 bp was set for the blocks so that only larger polymorphisms were examined. This is clarified on lines 177 and 304.

      A second point: It is not entirely clear to me what Figure 6 is showing. Are you showing here a single representative strain per sublineage? Or have you somehow summarized the regions of difference shown in Figure 5 at the sublineage level? What is the tree on the left? This should be made clear in the legend and maybe also in the methods/results.

      In figure 4 (which was figure 6), because each RD is common to all members of the same sub-lineage, we have placed a single branch for each sub-lineage. This has been clarified in the legend.

      l. 254, "this gene was classified as being in the core genome": why should a partially deleted gene not be in the core genome?

      You are correct, we have removed that statement.

      l. 258/259, "The Pangraph alignment approach identified partial gene deletion and non-coding regions of the DNA that were impacted by genomic deletion". I do not understand how you classify a structural variant identified in the pangenome graph as a deletion or an insertion.

      This has been clarified as relative to H37Rv, as this is standard practice for RDs and general evolutionary analyses in MTBC, as outlined above.

      l. 262/263 , "the accessory genome of the MTBC is small and is acquired vertically from a common ancestor within the lineage". If deletion is the main process involved here, "acquired" seems a bit strange.

      We agree and have changed the header to better reflect the discussion on mis-annotation issues.

      Figure 1: Good to know, but not directly relevant for the rest of the paper. Maybe move it to the supplement?

      This has been moved to Supplementary figure 1.

      Figure 2: the y-axis is labeled 'Variable genome size', but from the text and the legend I figure it should be 'Number of accessory genes'?

      This has been changed to ‘accessory genes’ in Figure 1 (which was figure 2 in previous version).

      Figure 4: too small.

      We will endeavour to ensure this is as large as possible in the final version.

      Discussion

      l. 271, "MTBC accessory genome is ... acquired vertically". See above.

      Changed, as outlined above.

      l. 292, "appeared to be fragmented genes caused by misassemblies". Is there a way to distinguish "true" pseudogenes from misassemblies? This could be a relevant issue for low-coverage long-read assemblies (see public review).

      Not that we are currently aware of, but we do know other groups which are working on this issue.

      l. 300/1, "the whole-genome approach could capture higher genetic variations". Do you mean the graph approach? I'm not sure that comparing the two approaches here makes sense, as they serve different purposes. A pangenome graph is a summary of all genetic variation, while the purpose of Panaroo is to study gene absence/presence. So by definition, the graph should capture more genetic variation.

      This statement was specifically to state that much genetic variation in MTBC is outside the coding genes, and so traditional ‘pangenome’ analyses are actually not looking at the full genomic variation.

      l. 302/3, "this method identified non-coding regions of the genome that were affected by genomic deletions". See the comments above regarding deletions versus insertions. I'd say this method identifies coding and non-coding regions that were affected by genomic deletions and insertions.

      We have undertaken additional analyses to be sure these are likely deletions, as outlined above.

      l. 305: what are "lineage-independent deletions"?

      We labelled these as convergent evolution, now clarified on line 443.

      l. 329: How is RD105 "caused" by the insertion of IS6110? I did not find RD105 mentioned in the Alonso et al. paper. Similarly below, l. 331, how is RD207 "linked" to IS6110?

      The RD105 connection was misattributed, as the IS6110 insertion is related to RD152, not RD105. This has now been removed.

      RD207 is linked to IS6110 as its deletion is due to recombination between two such elements. This is now clarified on line 486.

      l. 345, "the growth advantage gene group": not quite sure what this is.

      We have fixed this on line 499 to state they are genes which confer growth advantages.

      l. 373ff: The role of genetic drift in the evolution of the MTBC is an open question, other studies have come to different conclusions than Hershberg et al. (this has been recently reviewed: https://doi.org/10.24072/pcjournal.322).

      We have outlined this debate better in lines 527-531.

      l. 375/6, "Gene loss, driven by genetic drift, is likely to be a key contributor to the observed genetic diversity within the MTBC." This sentence would need some elaboration to be intelligible. How does genetic drift drive gene loss?

      We have removed this.

      l. 395/6, "... predominantly driven by genome reduction. This observation underlines the importance of genomic deletions in the evolution of the MTBC." See comments above regarding deletions. I'm not convinced that your study really shows this, as it completely ignores paralogs and the processes counteracting reductive genome evolution: duplication and gene amplification.

      As outlined above, we have undertaken additional analyses to more strongly support this statement.

      l. 399, "the accessory genome of MTBC is a product of gene deletions, which can be classified into lineage-specific and independent deletions". Again, I'm not sure what is meant by lineage-independent deletions.

      We have better defined this in the text, line 443, to be related to convergent evolution.

      Reviewer #2 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data, or analyses.

      In lines 120-121, it is mentioned that TB-profiler v4.4.2 was used for lineage classification, but this version was released in February 2023. As I understand, there have been some changes (inclusion/exclusion) of certain lineage markers. Would it not be appropriate to repeat lineage classification with a more recent version? This would of course require extensive re-analysis, so could the lineage marker database perhaps also be cited?

      We have rerun all the genomes through TB-Profiler v6.5 and updated the text to state this; the exact database used is also now stated.

      Could the authors perhaps include the sequencing summary or quality of the nanopore sequences? The L9 (Mtb8) sample had a relatively lower depth and resulted in two contigs. Yet one contig was the initial inclusion criteria. It is unclear whether these samples were excluded from some of the analyses. Mtb6 also has relatively low coverage. Was the sequencing quality adequate to accurately identify all the lineage markers, in particular those with a lower depth of coverage? Could a hybrid approach be an inexpensive way to polish these assemblies?

      We reanalysed the L9 sample and, with some better cleaning, got it to a single contig with better depth and overall score. This is outlined in the Supplementary table 1 sheets. While depth is average, it is still above the recommended 30x, which is needed for good sequence recovery (Sanderson et al., 2024). We did indeed recover all lineage markers from these assemblies.

      Recommendations for improving the writing and presentation.

      The introduction is well-written and recent MTBC pangenomic studies have been incorporated, but I am curious as to why this paper was not referred to: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6922483/ I believe this was the first attempt to study the pangenome, albeit with a different research question. Nearly all previous analyses largely focused on utilizing the pangenome to investigate transmission.

      Indeed this study did look at a pangenome of sorts, but specifically SNPs and not genes or regions. Since the latter is the main basis for pangenome work these days, we chose not to include this paper.

      Minor corrections to the text and figures.

      In line 129, it is explained that DNA was extracted to be suitable for PacBio sequencing, but ONT sequencing was used for the 11 new sequences. Is this a minor oversight or do the authors feel that DNA extracted for PacBio would be suitable for ONT sequencing? It is a fair assumption.

      We apologise, this is a long-read extraction approach and not specific to PacBio. We have amended the text to state this.

      In line 153, this should be removed: (Conor, could you please add the script to your GitHub page?).

      This has been fixed now.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study by Seo et al highlights knowledge gaps regarding the role of cerebellar complex spike (CS) activity during different phases of learning related to optokinetic reflex (OKR) in mice. The novelty of the approach is twofold: first, specifically perturbing the activity of climbing fibers (CFs) in the flocculus (as opposed to disrupting communication between the inferior olive (IO) and its cerebellar targets globally); and second, examining whether disruption of the CS activity during the putative "consolidation phase" following training affects OKR performance.

      The first part of the results provides adequate evidence supporting the notion that optogenetic disruption of normal CF-Purkinje neuron (PN) signaling results in the degradation of OKR performance. As no effects are seen in OKR performance in animals subjected to optogenetic irradiation during the memory consolidation or retrieval phases, the authors conclude that CF function is not essential beyond memory acquisition. However, the manuscript does not provide a sufficiently solid demonstration that their long-term activity manipulation of CF activity is effective, thus undermining the confidence of the conclusions.

      Strengths:

      The main strength of the work is the aim to examine the specific involvement of the CF activity in the flocculus during distinct phases of learning. This is a challenging goal, due to the technical challenges related to the anatomical location of the flocculus as well as the IO. These obstacles are counterbalanced by the use of a well-established and easy-to-analyse behavioral model (OKR), that can lead to fundamental insights regarding the long-term cerebellar learning process.

      Weaknesses:

      The impact of the work is diminished by several methodological shortcomings.

      Most importantly, the key finding regarding prolonged optogenetic inhibition of CFs (for 30 min to 6 hours after the training period) must be complemented by a demonstration that the manipulation maintains its efficacy. In its current form, the manuscript only shows inhibition by short-term optogenetic irradiation in the context of electrical-stimulation-evoked CSs in an ex vivo preparation. As the inhibitory effect of even the eNpHR3.0 is greatly diminished during seconds-long stimulations (especially when using the yellow laser, as is done in this work; see Zhang, Chuanqiang, et al. "Optimized photo-stimulation of halorhodopsin for long-term neuronal inhibition." BMC biology 17.1 (2019): 1-17), we remain skeptical of the extent of inhibition during the long manipulations. In short, without a demonstration of effective inhibition throughout the putative consolidation phase (for example by showing a significant decrease in CS frequency throughout the irradiation period), the main claim of the manuscript of phase-specific involvement of CF activity in OKR learning cannot be considered to be based on evidence.

      Second, the choice of viral targeting strategy leaves gaps in the argument for CF-specific mechanisms. CaMKII promoters are not selective for the IO neurons, and even the most precise viral injections always lead to the transfection of neurons in the surrounding brainstem, many of which project to the cerebellar cortex in the form of mossy fibers (MF). Figure 1Bii shows sparsely-labelled CFs in the flocculus, but possibly also MFs. While obtaining homogenous and strong labeling in all floccular CFs might be impossible, at the very least the authors should demonstrate that their optogenetic manipulation does not affect simple spiking in PNs.

      Finally, while the paper explicitly focuses on the effects of CF-evoked complex spikes in the PNs and not, for example, on those mediated by molecular layer interneurons or via direct interaction of the CF with vestibular nuclear neurons, it would be best if these other dimensions of CF involvement in cerebellar learning were candidly discussed.

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to explore the role of climbing fibers (CFs) in cerebellar learning, with a focus on optokinetic reflex (OKR) adaptation. Their goal was to understand how CF activity influences memory acquisition, memory consolidation, and memory retrieval by optogenetically suppressing CF inputs at various stages of the learning process.

      Strengths:

      The study addresses a significant question in the cerebellar field by focusing on the specific role of CFs in adaptive learning. The authors use optogenetic tools to manipulate CF activity. This provides a direct method to test the causal relationship between CF activity and learning outcomes.

      Weaknesses:

      Despite shedding light on the potential role of CFs in cerebellar learning, the study is hampered by significant methodological issues that question the validity of its conclusions. The absence of detailed evidence on the effectiveness of CF suppression and concerns over tissue damage from optogenetic stimulation weakens the argument that CFs are not essential for memory consolidation. These challenges make it difficult to confirm whether the study's objectives were fully met or if the findings conclusively support the authors' claims. The research commendably attempts to unravel the temporal involvement of CFs in learning but also underscores the difficulties in pinpointing specific neural mechanisms that underlie the phases of learning. Addressing these methodological issues, investigating other signals that might instruct consolidation, and understanding CFs' broader impact on various learning behaviors are crucial steps for future studies.

      We thank the editors and reviewers for their constructive feedback and careful consideration of our manuscript. They acknowledged the potential of our study to yield valuable insights into the role of CF activity in cerebellar learning and its phase-specific involvement, and we have addressed all the methodological concerns raised by providing additional clarifications and explanations in this letter.

      In response to concerns regarding the efficacy of long-term optogenetic inhibition, we conducted additional in vivo monitoring of CF activity during the irradiation period, confirming sustained inhibition of complex spikes throughout the consolidation phase (Figure 2, lines 112-139). Although stable single-unit recording beyond 40 minutes was not feasible due to technical challenges, the robust suppression of CF-evoked complex spikes we observed during this period provides strong evidence that halorhodopsin-mediated inhibition persists over the longer irradiation intervals employed in our behavioral assays.

      Moreover, to address the concern that the CaMKII promoter may also induce expression in neighboring mossy fibers, potentially affecting simple spike activity, we have presented data in Figure 2C, which illustrates that PC simple spike firing rates remain unchanged during prolonged illumination. This finding confirms that our optogenetic manipulation selectively disrupts CF-mediated complex spikes without influencing mossy fiber-to-PC transmission. We have elucidated these results further in lines 128 to 136.

      Lastly, we have broadened our Discussion to consider alternative mechanisms of CF involvement in cerebellar learning, including the modulation of molecular layer interneurons (Rowan et al., 2018) and direct CF interactions with vestibular nuclear neurons (Balaban et al., 1981), thereby offering a more comprehensive perspective on the multifaceted role of CF signaling. Specific clarifications regarding these points are articulated from lines 222 to 242 and 243 to 254 in the manuscript. We are confident that these revisions adequately address the reviewers' concerns and further substantiate the specificity and significance of our study findings.

      (1) Rowan, Matthew JM, et al. "Graded control of climbing-fiber-mediated plasticity and learning by inhibition in the cerebellum." Neuron 99.5 (2018): 999-1015.

      (2) Balaban, Carey D., Yasuo Kawaguchi, and Eiju Watanabe. "Evidence of a collateralized climbing fiber projection from the inferior olive to the flocculus and vestibular nuclei in rabbits." Neuroscience letters 22.1 (1981): 23-29.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      The authors use a synthetic approach to introduce synaptic ribbon proteins into HEK cells and analyze the ability of the resulting assemblies to cluster calcium channels at the active zone. The use of this ground-up approach is valuable as it establishes a system to study molecular interactions at the active zone. The work relies on a solid combination of super-resolution microscopy and electrophysiology, but would benefit from: (i) additional ultrastructural analysis to establish ribbon formation (in the absence of which the claim of these being synthetic ribbons might not be supported); (ii) data quantification (to confirm colocalization of different proteins); (iii) stronger validation of impact on Ca2+ function; (iv) in-depth discussion of problems derived from the use of an over-expression approach.

      We thank the editors and the reviewers for the constructive comments and appreciation of our work. Please find a detailed point-by-point response below. In response to the critique received, we have now (i) included an ultrastructural analysis of the SyRibbons using correlative light microscopy and cryo-electron tomography, (ii) performed quantifications to confirm the colocalisation of the various proteins, (iii) discussed and carefully rephrased our interpretation of the role of the ribbon in modulating Ca<sup>2+</sup> channel function and (iv) discussed concerns regarding the use of an overexpression system.

      Public Reviews:

      Reviewer #1 (Public Review):

      We would like to thank the reviewer for the comments and advice to further improve our manuscript. We have completely overhauled the manuscript taking the suggestions of the reviewer into account.

      (1) Are these truly "synthetic ribbons". The ribbon synapse is traditionally defined by its morphology at the EM level. To what extent these structures recapitulate ribbons is not shown. It has been previously shown that Ribeye forms aggregates on its own. Do these structures look any more ribbonlike than ribeye aggregates in the absence of its binding partners?

      We thank reviewer 1 for their constructive feedback and critique of the work. 

      We agree that traditionally, ribbon synapses have always been defined by the distinct morphology observed at the EM level. However, since the discovery of the core components of ribbons (RIBEYE and Piccolino), confocal and super-resolution imaging of immunofluorescently labelled ribbons has gained importance for analysing ribbon synapses. A correspondence of RIBEYE immunofluorescent structures at the active zone to electron microscopy observations of ribbons has been established in numerous studies (Wong et al, 2014; Michanski et al, 2019, 2023; Maxeiner et al, 2016; Jean et al, 2018), even though, to our knowledge, direct correlative approaches have yet to be performed. We have now analysed SyRibbons using cryo-correlative electron-light microscopy. We observe that GFP-positive RIBEYE spots corresponded well with electron-dense structures, as is characteristic for synaptic ribbons (Robertis & Franchi, 1956; Smith & Sjöstrand, 1961; Matthews & Fuchs, 2010). We could also observe SyRibbons within 100 nm of the plasma membrane (see Fig. 3). We have now added this qualitative ultrastructural analysis of SyRibbons in the main manuscript (lines 272-294, Fig. 3 and Supplementary Fig. 3).

      (2) No new biology is discovered here. The clustering of channels is accomplished by taking advantage of previously described interactions between RBP2, Ca channels and bassoon. The localization of Ribeye to bassoon takes advantage of a previously described interaction between the two. Even the membrane localization of the complexes required the introduction of a membrane-anchoring motif.

      We respectfully disagree with the overall assessment. Our study emphasizes the synthetic establishment of protein assemblies that mimic key aspects of ribbon-type active zones, defining minimum molecular requirements. Numerous previous studies have described the role of the synaptic ribbon in organising the spatial arrangement of Ca<sup>2+</sup> channels, regulating their abundance and possibly also modulating their physiological properties (Maxeiner et al, 2016; Frank et al, 2010; Jean et al, 2018; Wong et al, 2014; Grabner & Moser, 2021; Lv et al, 2016). We would like to highlight that there remain major gaps between existing in vitro and in vivo data; for instance, no evidence for direct or indirect interactions between Ca<sup>2+</sup> channels and RIBEYE has been demonstrated so far. While we do indeed take advantage of previously known interactions between RIBEYE and Bassoon (tom Dieck et al, 2005); between Bassoon, RBP2 and P/Q-type Ca<sup>2+</sup> channels (Davydova et al, 2014); and between RBP2 and L-type Ca<sup>2+</sup> channels (Hibino et al, 2002), our study tries to bridge these gaps by using a bottom-up approach to establish the indirect link between the synaptic ribbon (RIBEYE) and L-type CaV1.3 Ca<sup>2+</sup> channels, which has previously been only speculative. Our data show how, even in a synapse-naive heterologous expression system, ribbon synapse components assemble Ca<sup>2+</sup> channel clusters and even show a partial localisation of Ca<sup>2+</sup> signal. Moreover, we argue that the established reconstitution approach provides other interesting insights, such as laying ground-up evidence supporting the anchoring of the synaptic ribbon by Bassoon.
      Finally, we expect that the established system will serve future studies aimed at deciphering the role of putative CaV1.3 or CaV1.4 interacting proteins in regulating Ca<sup>2+</sup> channels of ribbon synapses by providing a more realistic Ca<sup>2+</sup> channel assembly than has been available in the heterologous expression systems used so far. In response to the reviewer's comment, we have augmented the discussion accordingly.

      (3) The only thing ribbon-specific about these "syn-ribbons" is the expression of ribeye and ribeye does not seem to participate in the localization of other proteins in these complexes. Bsn, Cav1.3 and RBP2 can be found in other neurons.

      The synaptic ribbon made of RIBEYE is the key molecular difference in the AZ ultrastructure of ribbon synapses in the eye and the ear. We hypothesize that the ribbon acts as a superscaffold that enables AZs with large Ca<sup>2+</sup> channel assemblies and readily releasable pools. In further support of this hypothesis, the present study on synthetic ribbons shows that CaV1.3 Ca<sup>2+</sup> channel clusters are larger in the presence of SyRibbons compared to SyRibbon-less CaV1.3 Ca<sup>2+</sup> channel clusters in tetra-transfected HEK cells (Ca<sup>2+</sup> channels, RBP, membrane-anchored Bassoon, and RIBEYE, Fig. 6). In response to the reviewer's comment, we have now added an analysis of triple-transfected HEK cells (Ca<sup>2+</sup> channels, RBP, membrane-anchored Bassoon), in which CaV1.3 Ca<sup>2+</sup> channel clusters again are significantly smaller than at the SyRibbons and indistinguishable from SyRibbon-less CaV1.3 Ca<sup>2+</sup> channel clusters (Fig. 6E, F).

      (4) As the authors point out, RBP2 is not necessary for some Ca channel clustering in hair cells, yet seems to be essential for clustering to bassoon here.

      Here we would like to clarify that RBP2 is indeed important in inner hair cells for promoting a larger complement of CaV1.3 and RBP2 KO mice show smaller CaV1.3 channel clusters and reduced whole cell and single-AZ Ca<sup>2+</sup> influx amplitudes (Krinner et al, 2017). However, a key point of difference we emphasize on is that even though CaV1.3 clusters appeared smaller, they did not appear broken or fragmented as they do upon genetic perturbation of Bassoon (Frank et al, 2010), RIBEYE (Jean et al, 2018) or Piccolino (Michanski et al, 2023). This highlights how there may be a hierarchy in the spatial assembly of CaV1.3 channels at the inner hair cell ribbon synapse (also described in the discussion section “insights into presynaptic Ca<sup>2+</sup> channel clustering and function”) with proteins like RBP2 regulating abundance of CaV1.3 channels at the synapse and organising them into smaller clusters – what we have termed as “nanoclustering”; while Bassoon and RIBEYE may serve as super-scaffolds further organizing these CaV1.3 nanoclusters into “microclusters”. Observations of fragmented Ca<sup>2+</sup> channel clusters and broader spread of Ca<sup>2+</sup> signal seen upon Ca<sup>2+</sup> imaging in RIBEYE and Bassoon mutants (Jean et al, 2018; Frank et al, 2010; Neef et al, 2018), and the absence of such a phenotype in RBP2 mutants (Krinner et al, 2017) may be explained by such a differential role of these proteins in organising Ca<sup>2+</sup> channel spatial assembly. The data of the present study on reconstituted ribbon containing AZs are in line with these observations in inner hair cells: RBP2 appears important to tether Ca<sup>2+</sup> channels to Bassoon and these AZ-like assemblies are organised to their full extent by the presence of RIBEYE. 
As mentioned in the response to point 3 of the reviewer, we have now further strengthened this point by adding the analysis of SyRibbon-less CaV1.3 Ca<sup>2+</sup> channel clusters in triple-transfected HEK cells (Ca<sup>2+</sup> channels, RBP, membrane-anchored Bassoon, Fig. 6E, F). Moreover, we have revised the discussion accordingly. 

      (5) The difference in Ca imaging between SyRibbons and other locations is extremely subtle.

      We agree with the reviewer on the modest increase in Ca<sup>2+</sup> signal amplitude seen in the presence of SyRibbons and provide the following reasoning for this observation: 

      (i) It is plausible that, due to the overexpression approach, Ca<sup>2+</sup> channels (along with RBP2 and Palm-Bassoon) still show considerable expression throughout the membrane, even in regions where SyRibbons are not localised. Indeed, this is evident in the images shown in the lower panel in Fig. 6B, where Ca<sup>2+</sup> channel immunofluorescence is distributed across the plasma membrane with larger clusters formed underneath SyRibbons (for an opposing scenario, please see the cell in Fig. 6B upper panel with very localised CaV1.3 distribution underneath SyRibbons). This would of course diminish the difference in the Ca<sup>2+</sup> signals between membrane regions with and without SyRibbons. We note that while the contrast is greater for native synapses, extrasynaptic Ca<sup>2+</sup> channels have been described in numerous studies for hair cells alone (Roberts et al, 1990; Brandt, 2005; Zampini et al, 2010; Wong et al, 2014).

      (ii) Nevertheless, we do not expect a remarkably big difference in Ca<sup>2+</sup> influx due to the presence of SyRibbons in the first place. Ribbon-less AZs in inner hair cells of RIBEYE KO mice showed normal Ca<sup>2+</sup> current amplitudes at the whole-cell and the single-AZ level (Jean et al, 2018). However, it was the spatial spread of the Ca<sup>2+</sup> signal at the single-AZ level which appeared to be broader and more diffuse in these mutants in the absence of the ribbon, in contrast to the more confined Ca<sup>2+</sup> hotspots seen in the wild-type controls. 

      So, in agreement with these published observations, the presence of SyRibbons appears to help spatially confine the Ca<sup>2+</sup> signal by superscaffolding nanoclusters into microclusters (see also our response to points 3 and 4 of the reviewer): this is evident from the partial spatial confinement of Ca<sup>2+</sup> signals near SyRibbons on top of the diffuse Ca<sup>2+</sup> signal across the rest of the membrane, the latter being a result of overexpression in HEK cells. 

      We have now carefully rephrased our interpretation throughout the manuscript and added further explanation in the discussion section.   

      (6) The effect of the expression of palm-Bsn, RBP2 and the combination of the two on Ca-current is ambiguous. It appears that while the combination is larger than the control, it probably isn't significantly different from either of the other two alone (Fig 5). Moreover, expression of Ribeye + the other two showed no effect on Ca current (Figure 7). Also, why is the IV curve right shifted in Figure 7 vs Figure 5?

      We agree with the reviewer that co-expression of Palm-Bassoon and RBP2 seems to augment Ca<sup>2+</sup> currents, while the additional expression of RIBEYE results in no change when compared to wild-type controls. We currently do not have an explanation for this observation and would refrain from making any claims without concrete evidence. As the reviewer also correctly pointed out, while the expression of the combination of Palm-Bassoon and RBP2 raises Ca<sup>2+</sup> currents, current amplitudes are not significantly different when compared to the individual expression of the two proteins (P > 0.05, Kruskal-Wallis test). In light of this, we have now carefully rephrased our manuscript. Moreover, we would like to thank reviewer 1 for pointing out the right shift in the IV curve, which was due to an error in the values plotted on the x-axis. This has been corrected in the updated version of the manuscript. 

      (7) While some of the IHC is quantified, some of it is simply shown as single images. EV2, EV3 and Figure 4a in particular (4b looks convincing enough on its own, but could also benefit from a larger sample size and quantification)

      We have now added quantifications for the colocalisations of the various transfection combinations depicted in the above-mentioned figures collectively in Supplementary Figure 7 and added the corresponding results and methods accordingly. 

      Reviewer #2 (Public Review):

      We would like to thank the reviewer for the comments and advice to further improve our manuscript.

      (1) Relies on over-expression, which almost certainly diminishes the experimentally-measured parameters (e.g. pre-synapse clustering, localization of Ca2+ entry).

      We acknowledge this limitation highlighted by the reviewer arising from the use of an overexpression system and have carefully rephrased our interpretation and discussed possible caveats in the discussion section. 

      (2) Are HEK cells the best model? HEK cells secrete substances and have a studied-endocytitic pathway, but they do not create neurosecretory vesicles. Why didn't the authors try to reconstitute a ribbon synapse in a cell that makes neurosecretory vesicles like a PC12 cell?

      This is a valid point, which we also discussed extensively. We did indeed consider pheochromocytoma cells (PC12 cells) for reconstitution of ribbon-type AZs and performed experiments with them in the initial stages of the project. PC12 cells offer the advantage of providing synaptic-like microvesicles and also endogenously express several components of the presynaptic machinery such as Bassoon, RIM2, ELKS etc (Inoue et al, 2006), such that overexpression of exogenous AZ proteins would have to be limited to RIBEYE only. 

      However, a major drawback of PC12 cells as a model is the complex molecular background of these cells. We have also briefly described this in the discussion section (lines 615 – 619). Naïve, undifferentiated PC12 cells show highly heterogeneous expression of various CaV channel types (Janigro et al, 1989); however, CaV1.3, the predominant type in ribbon synapses of the ear, does not seem to be expressed in these cells (Liu et al, 1996). Furthermore, our attempts at performing immunostainings against CaV1.3 and at overexpressing CaV1.3 in PC12 cells were not successful, and we decided not to pursue this further (data not shown). 

      In contrast, HEK293 cells, being “synapse-naïve”, provide the advantage of serving as a “blank canvas” for performing such reconstitutions, e.g. they lack voltage-gated Ca<sup>2+</sup> channels and multidomain proteins of the active zone. Moreover, an important practical aspect for our choice was the availability of the HEK293 cell line with stable (and inducible) expression of the CaV1.3 Ca<sup>2+</sup> channel complex. Finally, as described in lines 613 – 614 of the discussion section, even though HEK293 cells lack SVs and the molecular machinery required for their release, our work paves the way for future studies which could employ delivery of SV machinery via co-expression (Park et al, 2021), which could then be analyzed by the correlative light and electron microscopy workflow we worked out and added during revision. 

      (3) Related to 1 and 2: the Ca channel localization observed is significant but not so striking given the presence of Cav protein and measurements of Ca2+ influx distributed across the membrane. Presumably, this is the result of overexpression and an absence of pathways for pre-synaptic targeting of Ca channels. But, still, it was surprising that Ca channel localization was so diffuse. I suppose that the authors tried to reduce the effect of over-expression by using an inducible Cav1.3? Even so, the accessory subunits were constitutively over-expressed.

      We agree with the reviewer on the modest increase in Ca<sup>2+</sup> signal amplitude seen in the presence of SyRibbons. Yes, we employed inducible expression of the CaV1.3a subunit and tried to reduce the effect of overexpression by testing different induction times. However, we did not observe any major differences in expression and observed large variability in CaV1.3 expression across cells irrespective of induction duration. At all time points, there were cells with diffuse CaV1.3 localisation also in regions without SyRibbons which likely reduced the contrast of the Ca<sup>2+</sup> signal we observe. We provide the following reasoning for this observation: 

      (i) It is plausible that, due to the overexpression approach, Ca<sup>2+</sup> channels (along with RBP2 and Palm-Bassoon) still show considerable expression along the membrane, even in regions where SyRibbons are not localised. Indeed, this is evident in the images shown in the lower panel in Fig. 6B, where Ca<sup>2+</sup> channel immunofluorescence is distributed across the plasma membrane with larger clusters formed underneath SyRibbons. This would of course diminish the difference in the Ca<sup>2+</sup> signals between membrane regions with and without SyRibbons. We note that while the contrast is greater for native synapses, extrasynaptic Ca<sup>2+</sup> channels have been described in numerous studies for hair cells alone (Roberts et al, 1990; Brandt, 2005; Zampini et al, 2010; Wong et al, 2014).

      (ii) Nevertheless, we do not expect a striking difference in Ca<sup>2+</sup> influx amplitude due to the presence of SyRibbons in the first place. Ribbon-less AZs in inner hair cells of RIBEYE KO mice showed normal Ca<sup>2+</sup> current amplitudes at the whole-cell and the single-AZ level (Jean et al, 2018). Instead, it was the spatial spread of the Ca<sup>2+</sup> signal at the single-AZ level which appeared to be broader and more diffuse in these mutants in the absence of the ribbon, in contrast to the more confined Ca<sup>2+</sup> hotspots seen in the wildtype controls. 

      So, in agreement with these published observations, the presence of SyRibbons appears to help spatially confine the Ca<sup>2+</sup> signal by superscaffolding nanoclusters into microclusters: this is evident from the partial spatial confinement of Ca<sup>2+</sup> signals near SyRibbons on top of the diffuse Ca<sup>2+</sup> signal across the rest of the membrane, the latter being a result of overexpression in HEK cells. 

      We have now carefully rephrased our interpretation throughout the manuscript and added further explanation in the discussion section.   

      Reviewer #3 (Public Review):

      We would like to thank the reviewer for the comments and advice to further improve our manuscript.

      (1) The results obtained in a heterologous system (HEK293 cells) need to be interpreted with caution. They will importantly speed the generation of models and hypothesis that will, however, require in vivo validation.

      We acknowledge this limitation highlighted by Reviewer 3 arising from the use of an overexpression system and have carefully rephrased our interpretation and discussed possible caveats in the discussion section. We employed inducible expression of the CaV1.3a subunit and tried to reduce the effect of overexpression by testing different induction times. However, we did not observe any major differences in expression and observed large variability in CaV1.3 expression across cells irrespective of induction duration. At all time points, there were cells with diffuse CaV1.3 localisation, even in regions without SyRibbons and this could reduce the contrast of the Ca<sup>2+</sup> signal we observe. We provide the following reasoning for this observation: 

      (i) It is plausible that, due to the overexpression approach, Ca<sup>2+</sup> channels (along with RBP2 and Palm-Bassoon) still show considerable expression along the membrane, even in regions where SyRibbons are not localised. Indeed, this is evident in the images shown in the lower panel in Fig. 6B, where Ca<sup>2+</sup> channel immunofluorescence is distributed across the plasma membrane with larger clusters formed underneath SyRibbons. This would of course diminish the difference in the Ca<sup>2+</sup> signals between membrane regions with and without SyRibbons. We note that while the contrast is greater for native synapses, extrasynaptic Ca<sup>2+</sup> channels have been described in numerous studies for hair cells alone (Roberts et al, 1990; Brandt, 2005; Zampini et al, 2010; Wong et al, 2014).

      (ii) Nevertheless, we do not expect a striking difference in Ca<sup>2+</sup> influx amplitude due to the presence of SyRibbons in the first place. Ribbon-less AZs in inner hair cells of RIBEYE KO mice showed normal Ca<sup>2+</sup> current amplitudes at the whole-cell and the single-AZ level (Jean et al, 2018). Instead, it was the spatial spread of the Ca<sup>2+</sup> signal at the single-AZ level which appeared to be broader and more diffuse in these mutants in the absence of the ribbon, in contrast to the more confined Ca<sup>2+</sup> hotspots seen in the wildtype controls. 

      So, in agreement with these published observations, the presence of SyRibbons appears to help spatially confine the Ca<sup>2+</sup> signal by superscaffolding nanoclusters into microclusters: this is evident from the partial spatial confinement of Ca<sup>2+</sup> signals near SyRibbons on top of the diffuse Ca<sup>2+</sup> signal across the rest of the membrane, the latter being a result of overexpression in HEK cells. 

      (2) The authors analyzed the distribution of RIBEYE clusters in different membrane compartments and correctly conclude that RIBEYE clusters are not trapped in any of those compartments, but it is soluble instead. The authors, however, did not carry out a similar analysis for Palm-Bassoon. It is therefore unknown if Palm-Bassoon binds to other membrane compartments besides the plasma membrane. That could occur because in non-neuronal cells GAP43 has been described to be in internal membrane compartments. This should be investigated to document the existence of ectopic internal Synribbons beyond the plasma membrane because it might have implications for interpreting functional data in case Ca2+-channels become part of those internal Synribbons.

      In response to this valid concern, we have now included the suggested experiment in Supplementary Figure 1. We investigated the subcellular localisation of Palm-Bassoon and did not find Palm-Bassoon puncta to colocalise with ER, Golgi, or lysosomal markers, which argues against binding to membrane compartments inside the cell. We have added the following sentence in the results section, line 145: “Palm-Bassoon does not appear to localize in the ER, Golgi apparatus or lysosomes (Supplementary Fig 1 D, E and F).”

      (3) The co-expression of RBP2 and Palm-Bassoon induces a rather minor but significant increase in Ca2+-currents (Figure 5). Such an increase does not occur upon expression of (1) Palm-Bassoon alone, (2) RBP2 alone or (3) RIBEYE alone (Figure 5). Intriguingly, the concomitant expression of Palm-Bassoon, RBP2 and RIBEYE does not translate into an increase of Ca2+-currents either (Figure 7).

      We agree with the reviewer that co-expression of Palm-Bassoon and RBP2 seems to augment Ca<sup>2+</sup> currents, while the additional expression of RIBEYE results in no change when compared to wild-type controls. We currently do not have an explanation for this observation and would refrain from making any claims without concrete evidence. We also highlight that, while the expression of the combination of Palm-Bassoon and RBP2 raises Ca<sup>2+</sup> currents, current amplitudes are not significantly different when compared to the individual expression of the two proteins (P > 0.05, Kruskal-Wallis test). In light of this, we have now carefully rephrased our manuscript. 

      (4) The authors claim that Ca2+-imaging reveals increased Ca2+-signal intensity at synthetic ribbon-type AZs. That claim is a subject of concern because the increase is rather small and it does not correlate with an increase in Ca2+-currents.

      Thanks for the comment: please see our response to your first comment and the lines 585 – 610 in the discussion section.

      Recommendations for the authors:  

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors should have a better discussion of problems derived from over-expression.

      Done. Please see above. 

      (2) Ideally, the authors would repeat the study using a secretory cell line, but this is of course not possible. The idea could be brought forth, though.

      As described above in our response to the public review of reviewer 2, we have discussed this idea in the discussion section (refer to lines 615 – 619), emphasizing both the advantages and the limitations of using a secretory cell line (e.g. PC12 cells) instead of HEK293 cells as a model for performing such reconstitutions. 

      Reviewer #3 (Recommendations For The Authors):

      (1) There are several figures in which colocalization between different proteins is studied only displaying images but without any quantitative data. This should be corrected by providing such a quantitative analysis.

      We have now added quantifications for the colocalisations of the various transfection combinations depicted in the above-mentioned figures collectively in Supplementary Figure 7 and added the corresponding results and methods accordingly. 

      (2) The little increase in Ca2+-currents and Ca2+-influx associated to the clustering of Ca2+-channels to Synribbons is a concern. The authors should discuss if such a minor increase (found only when Palm-Bassoon and RBP2 are co-expressed) would have or not physiological consequences in an actual synapse. They might discuss the comparison of those results and compare with results obtained in genetically modified mice in which Ca2+-currents are affected upon the removal of AZs proteins. On the other hand, they should explain why Ca2+-currents do not increase when the Synribbons are formed by RIBEYE, Palm-Bassoon and RBP2.

      Done. Please see above. 

      (3) The description of the patch-clamp experiments should be enriched by including representative currents. Did the authors measure tail currents?

      We would like to thank the reviewer for the valuable suggestion and have now added representative currents to the figures (see Supplementary Figure 5B). We agree with the reviewer on the importance of further characterizing the Ca<sup>2+</sup> currents in the presence and absence of SyRibbons by analysis of tail currents for counting the number of Ca<sup>2+</sup> channels by non-stationary fluctuation analysis, but we consider this to be beyond the scope of the current study and an objective for future studies. 

      (4) The current displayed in Figure 7 E should be explained better.

      Previous studies have shown that Ca<sup>2+</sup>-binding proteins (CaBPs) compete with calmodulin to reduce Ca<sup>2+</sup>-dependent inactivation (CDI) and promote sustained Ca<sup>2+</sup> influx in inner hair cells (Cui et al, 2007; Picher et al, 2017). In the absence of CaBPs, CaV1.3-mediated Ca<sup>2+</sup> currents show more rapid CDI, as is the case here upon heterologous expression in HEK cells (Koschak et al, 2001; see also Picher et al, 2017, where co-expression of CaBP2 with CaV1.3 inhibits CDI in HEK293 cells). The inactivation kinetics of CaV1.3 are also regulated by the subunit composition (Cui et al, 2007) and modulated by interaction partners; given the reconstitution used here, we do not find the currents very surprising. 

      (5) Is the difference in Ca2+-influx still significantly higher upon the removal of the maximum value measured in positive Syribbons spots (Figure 7, panel K)?

      Yes, on removing the maximum value, the P value increases from 0.01 to 0.03 but remains statistically significant. 

      (6) In summary, although the approach pioneered by the authors is exciting and provides relevant results, there is a major concern regarding the interpretation of the modulation of Ca2+ channels.

      We have now carefully rephrased our interpretation on the modulation of Ca<sup>2+</sup> channels.  

      References

      Brandt A (2005) Few CaV1.3 Channels Regulate the Exocytosis of a Synaptic Vesicle at the Hair Cell Ribbon Synapse. Journal of Neuroscience 25: 11577–11585

      Cui G, Meyer AC, Calin-Jageman I, Neef J, Haeseleer F, Moser T & Lee A (2007) Ca2+-binding proteins tune Ca2+-feedback to Cav1.3 channels in mouse auditory hair cells. The Journal of Physiology 585: 791–803

      Davydova D, Marini C, King C, Klueva J, Bischof F, Romorini S, Montenegro-Venegas C, Heine M, Schneider R, Schröder MS, et al (2014) Bassoon specifically controls presynaptic P/Q-type Ca(2+) channels via RIM-binding protein. Neuron 82: 181–194

      tom Dieck S, Altrock WD, Kessels MM, Qualmann B, Regus H, Brauner D, Fejtová A, Bracko O, Gundelfinger ED & Brandstätter JH (2005) Molecular dissection of the photoreceptor ribbon synapse: physical interaction of Bassoon and RIBEYE is essential for the assembly of the ribbon complex. J Cell Biol 168: 825–836

      Frank T, Rutherford MA, Strenzke N, Neef A, Pangršič T, Khimich D, Fejtova A, Gundelfinger ED, Liberman MC, Harke B, et al (2010) Bassoon and the synaptic ribbon organize Ca2+ channels and vesicles to add release sites and promote refilling. Neuron 68: 724–738

      Grabner CP & Moser T (2021) The mammalian rod synaptic ribbon is essential for Cav channel facilitation and ultrafast synaptic vesicle fusion. eLife 10: e63844

      Hibino H, Pironkova R, Onwumere O, Vologodskaia M, Hudspeth AJ & Lesage F (2002) RIM-binding proteins (RBPs) couple Rab3-interacting molecules (RIMs) to voltage-gated Ca2+ channels. Neuron 34: 411–423

      Inoue E, Deguchi-Tawarada M, Takao-Rikitsu E, Inoue M, Kitajima I, Ohtsuka T & Takai Y (2006) ELKS, a protein structurally related to the active zone protein CAST, is involved in Ca2+-dependent exocytosis from PC12 cells. Genes to Cells 11: 659–672

      Janigro D, Maccaferri G & Meldolesi J (1989) Calcium channels in undifferentiated PC12 rat pheochromocytoma cells. FEBS Letters 255: 398–400

      Jean P, Morena DL de la, Michanski S, Tobón LMJ, Chakrabarti R, Picher MM, Neef J, Jung S, Gültas M, Maxeiner S, et al (2018) The synaptic ribbon is critical for sound encoding at high rates and with temporal precision. eLife 7: e29275

      Koschak A, Reimer D, Huber I, Grabner M, Glossmann H, Engel J & Striessnig J (2001) alpha 1D (Cav1.3) subunits can form l-type Ca2+ channels activating at negative voltages. J Biol Chem 276: 22100–22106

      Krinner S, Butola T, Jung S, Wichmann C & Moser T (2017) RIM-Binding Protein 2 Promotes a Large Number of CaV1.3 Ca2+-Channels and Contributes to Fast Synaptic Vesicle Replenishment at Hair Cell Active Zones. Front Cell Neurosci 11: 334

      Liu H, Felix R, Gurnett CA, De Waard M, Witcher DR & Campbell KP (1996) Expression and Subunit Interaction of Voltage-Dependent Ca2+ Channels in PC12 Cells. J Neurosci 16: 7557–7565

      Lv C, Stewart WJ, Akanyeti O, Frederick C, Zhu J, Santos-Sacchi J, Sheets L, Liao JC & Zenisek D (2016) Synaptic Ribbons Require Ribeye for Electron Density, Proper Synaptic Localization, and Recruitment of Calcium Channels. Cell Reports 15: 2784–2795

      Matthews G & Fuchs P (2010) The diverse roles of ribbon synapses in sensory neurotransmission. Nat Rev Neurosci 11: 812–822

      Maxeiner S, Luo F, Tan A, Schmitz F & Südhof TC (2016) How to make a synaptic ribbon: RIBEYE deletion abolishes ribbons in retinal synapses and disrupts neurotransmitter release. The EMBO Journal 35: 1098–1114

      Michanski S, Kapoor R, Steyer AM, Möbius W, Früholz I, Ackermann F, Gültas M, Garner CC, Hamra FK, Neef J, et al (2023) Piccolino is required for ribbon architecture at cochlear inner hair cell synapses and for hearing. EMBO Rep 24: e56702

      Michanski S, Smaluch K, Steyer AM, Chakrabarti R, Setz C, Oestreicher D, Fischer C, Möbius W, Moser T, Vogl C, et al (2019) Mapping developmental maturation of inner hair cell ribbon synapses in the apical mouse cochlea. PNAS 116: 6415–6424

      Neef J, Urban NT, Ohn T-L, Frank T, Jean P, Hell SW, Willig KI & Moser T (2018) Quantitative optical nanophysiology of Ca2+ signaling at inner hair cell active zones. Nat Commun 9: 290

      Park D, Wu Y, Lee S-E, Kim G, Jeong S, Milovanovic D, Camilli PD & Chang S (2021) Cooperative function of synaptophysin and synapsin in the generation of synaptic vesicle-like clusters in non-neuronal cells. Nat Commun 12

      Picher MM, Gehrt A, Meese S, Ivanovic A, Predoehl F, Jung S, Schrauwen I, Dragonetti AG, Colombo R, Camp GV, et al (2017) Ca2+-binding protein 2 inhibits Ca2+-channel inactivation in mouse inner hair cells. PNAS 114: E1717–E1726

      Robertis ED & Franchi CM (1956) Electron Microscope Observations on Synaptic Vesicles in Synapses of the Retinal Rods and Cones. J Biophys Biochem Cytol 2: 307–318

      Roberts WM, Jacobs RA & Hudspeth AJ (1990) Colocalization of ion channels involved in frequency selectivity and synaptic transmission at presynaptic active zones of hair cells. J Neurosci 10: 3664–3684

      Smith CA & Sjöstrand FS (1961) A synaptic structure in the hair cells of the guinea pig cochlea. Journal of Ultrastructure Research 5: 184–192

      Wong AB, Rutherford MA, Gabrielaitis M, Pangršič T, Göttfert F, Frank T, Michanski S, Hell S, Wolf F, Wichmann C, et al (2014) Developmental refinement of hair cell synapses tightens the coupling of Ca2+ influx to exocytosis. EMBO J 33: 247–264

      Zampini V, Johnson SL, Franz C, Lawrence ND, Münkner S, Engel J, Knipper M, Magistretti J, Masetto S & Marcotti W (2010) Elementary properties of CaV1.3 Ca(2+) channels expressed in mouse cochlear inner hair cells. J Physiol 588: 187–199

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors, Dalal et al., determined cryo-EM structures of open, closed, and desensitized states of the pentameric ligand-gated ion channel ELIC reconstituted in liposomes, and compared them to structures determined in varying nanodisc diameters. They argue that the liposomal reconstitution method is more representative of functional ELIC channels, as they were able to test and recapitulate channel kinetics through stopped-flow thallium flux liposomal assay. The authors and others have described channel interactions with membrane scaffold proteins (MSP), initially thought to be in a size-dependent manner. However, the authors reported that their cryo-EM ELIC structure interacts with the large nanodisc spNW25, contrary to their original hypotheses. This suggests that the channel's interactions with MSPs might alter its structure, possibly not accurately representing/reflecting functional states of the channel.

      Strengths:

      Cryo-EM structural determination from proteoliposomes is a promising methodology within the ion channel field due to their large surface area and lack of MSP or other membrane mimetics that could alter channel structure. Comparing liposomal ELIC to structures in various-sized nanodiscs gives rise to important discussions for other membrane protein structural studies when deciding the best method for individual circumstances.

      Weaknesses:

      The overarching goal of the study was to determine structural differences of ELIC in detergent nanodiscs and liposomes. Including comparisons of the results to the native bacterial lipid environment would provide a more encompassing discussion of how the determined liposome structures might or might not relate to the native receptor in its native environment. The authors stated they determined open, closed, and desensitized states of ELIC reconstituted in liposomes and suggest the desensitization gate is at the 9' region of the pore. However, no functional studies were performed to validate this statement.

      The goal of this study was to determine structures of ELIC in the same lipid environment in which its function is characterized. However, it is also worth noting that phosphatidylethanolamine and phosphatidylglycerol, two lipids used for the liposome formation, are necessary for ELIC function (PMID 36385237) and are principal lipid components of Gram-negative bacterial membranes in which ELIC is expressed.

      The desensitized structure of ELIC in liposomes shows a pore diameter at the hydrophobic L240 (9’) residue of 3.3 Å, which is anticipated to pose a large energetic barrier to the passage of ions due to the hydrophobic effect. We have included a graphical representation of pore diameters from the HOLE analysis for all liposome structures in Supplementary Figure 6B. While we have not tested the role of L240 in desensitization with functional experiments, it was shown by Gonzalez-Gutierrez and colleagues (PMID 22474383) that the L240A mutation apparently eliminates desensitization in ELIC. This finding is consistent with L240 (9’) being the desensitization gate of ELIC. We have referenced this study when discussing the desensitization gate in the Results.

      Reviewer #2 (Public review):

      Summary

      The report by Dalal and colleagues introduces a significant novelty in the field of pentameric ligand-gated ion channels (pLGICs). Within this family of receptors, numerous structures are available, but a widely recognised problem remains in assigning structures to functional states observed in biological membranes. Here, the authors obtain both structural and functional information of a pLGIC in a liposome environment. The model receptor ELIC is captured in the resting, desensitized, and open states. Structures in large nanodiscs, possibly biased by receptor-scaffold protein interactions, are also reported. Altogether, these results set the stage for the adoption of liposomes as a proxy for the biological membranes, for cryoEM studies of pLGICs and membrane proteins in general.

      Strengths

      The structural data is comprehensive, with structures in liposomes in the 3 main states (and for each, both inward-facing and outward-facing), and an agonist-bound structure in the large spNW25 nanodisc (and a retreatment of previous data obtained in a smaller disc). It adds up to a series of work from the same team that constitutes a much-needed exploration of various types of environment for the transmembrane domain of pLGICs. The structural analysis is thorough.

      The tone of the report is particularly pleasant, in the sense that the authors' claims are not inflated. For instance, a sentence such as "By performing structural and functional characterization under the same reconstitution conditions, we increase our confidence in the functional annotation of these structures." is exemplary.

      Weaknesses

      Core parts of the method are not described and/or discussed in enough detail. While I do believe that liposomes will be, in most cases, better than, say, nanodiscs, the process that leads from the protein in its membrane down to the liposome will play a big role in preserving the native structure, and should be an integral part of the report. Therefore, I strongly felt that biochemistry should be better described and discussed. The results section starts with "Optimal reconstitution of ELIC in liposomes [...] was achieved by dialysis". There is no information on why dialysis is optimal, what it was compared to, the distribution of liposome sizes using different preparation techniques, etc... Reading the title, I would have expected a couple of paragraphs and figure panels on liposome reconstitution. Similarly, potential biochemical challenges are not discussed. The methods section mentions that the sample was "dialyzed [...] over 5-7 days". In such a time window, most of the members of this protein family would aggregate, and it is therefore a protocol that can not be directly generalised. This has to be mentioned explicitly, and a discussion on why this can't be done in two days, what else the authors tested (biobeads? ... ?) would strengthen the manuscript.

      To a lesser extent, the relative lack of both technical details and of a broad discussion also pertains to the cryoEM and thallium flux results. Regarding the cryoEM part, the authors focus their analysis on reconstructions from outward-facing particles on the basis of their better resolutions, yet there was little discussion about it. Is it common for liposome-based structures? Are inward-facing reconstructions worse because of the increased background due to electrons going through two membranes? Are there often impurities inside the liposomes (we see some in the figures)? The influence of the membrane mimetics on conformation could be discussed by referring to other families of proteins where it has been explored (for instance, ABC transporters, but I'm sure there are many other examples). If there are studies in other families of channels in liposomes that were inspirational, those could be mentioned. Regarding thallium flux assays, one argument is that they give access to kinetics and set the stage for time-resolved cryoEM, but if I did not miss it, no comparison of kinetics with other techniques, such as electrophysiology, nor references to eventual pioneer time-resolved studies are provided.

      Altogether, in my view, an updated version would benefit from insisting on every aspect of the methodological development. I may well be wrong, but I see this paper more like a milestone on sample prep for cryoEM imaging than being about the details of the ELIC conformations.

      Additions have been made to the Results and Discussion sections elaborating on the following points: 1) reconstitution of ELIC in liposomes using dialysis, the advantage of this over other methods such as biobeads, and whether the dialysis protocol can be shortened for other less stable proteins; 2) the issue of separating outward- and inward-facing channels; 3) references to the effect of nanodiscs on ABC transporters, structures of membrane proteins in liposomes, and pioneering time-resolved cryo-EM studies; and 4) comparison of ELIC gating kinetics with electrophysiology measurements. With regard to the first point, it should be noted that the Methods provide all details necessary to reproduce the experiments, including the reconstitution and the stopped-flow thallium flux assay. It is also important to note that the same proteoliposome preparation was used for assessing function using the stopped-flow thallium flux assay and for determining the structure by cryo-EM. This is now stated in the Results.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major revisions:

      (1) The authors suggest that the desensitization gate is located at the 9' region within the pore. However, as stated by the authors, the 2' residues function as the desensitization gate in related channels. In a few of their HOLE analyzed structures (e.g. Figure 2B and 4B), there seems to be a constriction also at 2', but this finding is not discussed in the context of desensitization. Further functional testing of mutated 9' and/or 2' gates would bolster the argument for the location of the desensitization gate.

      As stated above, we have included HOLE plots of pore radius in Supplementary Fig. 6B and referenced the study showing that the L240A mutation (9’) in ELIC (PMID 22474383) appears to eliminate desensitization. This result along with the narrow pore diameter at 9’ in the desensitized structure suggests that 9’ is likely a desensitization gate in ELIC. In contrast, mutation of Q233 (2’) to a cysteine in a previous study produced a channel that still desensitizes (PMID 25960405). Since Q233 is a hydrophilic residue in contrast to L240, Q233 probably does not pose the same energetic barrier to ion translocation as L240 based on the structure.

      (2) In discussing functional states of ELIC and ELIC5 in different reconstitution methods, the authors reference constriction sites determined by HOLE analysis software. These constriction sites were key evidence for the authors to determine functional state, however, it is difficult to discern pore sizes based on the figures. Pore diameters and clear color designation (ie, green vs orange) with the figures would greatly aid their discussions.

      HOLE plots are displayed in Supplementary Fig. 6B and pore diameters are now provided in the text.

      (3) The authors had an intriguing finding that ELIC dimers are found in spNW25 scaffolds. Is there any functional evidence to suggest they could be functioning as dimers?

      There is no evidence that the function of ELIC or other pLGICs is altered by the formation of dimers of pentamers. Therefore, while this result is intriguing and likely facilitated by concentrating multiple ELIC pentamers within the nanodisc, it is not clear if these interactions have any functional importance. We have stated this in the Results.

      (4) Thallium flux assay to validate channel function within proteoliposomes. Proteoliposomes are known to be generally very leaky membranes, would be good to have controls without ELIC added to determine baseline changes in fluorescence.

      We have established from multiple previous studies that liposomes composed of 2:1:1 POPC:POPE:POPG (PMID 36385237 and 31724949) do not show significant thallium flux as measured by the stopped-flow assay (PMID 29058195) in the absence of ELIC activity. Furthermore, in the present study, the data in Fig. 1A of WT ELIC shows a low thallium flux rate 60 seconds after exposure to agonist when the ion channel has mostly desensitized. Therefore, this data serves also as a control indicating that the high thallium flux rates in response to agonist (at earlier delay times) are not due to leak, but rather due to ELIC channel activity.

      Minor revisions:

      (1) Abstract and introduction. 'Liganded' should be ligand

      We removed this word and changed it to “agonist-bound” for consistency throughout the manuscript.

      (2) Inconsistent formatting of FSC graphs in Supplemental Figure 4

      The difference is a consequence of the different formatting between cryoSPARC and Relion FSC graphs.

      Reviewer #2 (Recommendations for the authors):

      Minor writing remarks:

      The present report builds on previous work from the same team, and to my eye it would be a plus if this were conveyed more explicitly. I see it as a strength to explore various developments in several papers that complement each other. E.g., in the introduction when citing reference 12 (Dalal 2024), and later when introducing ref 15 (Petroff 2022), I wish I had been reminded of the main findings and how they fit with the new results.

      We have expanded on the Results and Discussion detailing key findings from these studies that are relevant to the current study.

      Suggestions for analysis:

      Data treatment. Maybe I missed it, but I wondered if C1 vs C5 treatment of the liposome data showed any interesting differences? When I think about the biological membrane, I picture it as a very crowded place with lots of neighbouring proteins. I would not be surprised if, similarly to what they do in discs, the receptor would tend to stick to, or bump into, anything present also in liposomes (a neighboring liposome, some undefined density inside the liposome).

      We attempted to perform C1 heterogeneous refinement jobs in cryoSPARC and C1 3D classification in Relion5. For the WT datasets, these did not produce 3D reconstructions that were of sufficient quality for further refinement. For ELIC5 with agonist, the C1 reconstructions did not differ from the C5 reconstructions. Furthermore, there was no evidence of dimers of pentamers from the 2D or 3D treatments, unlike what was observed in the spNW25 nanodiscs. This is likely because the density of ELIC pentamers in the liposomes was too low to capture these transient interactions. We have included this information in the Methods.

      In data treatment, we sometimes find only what we're looking for. I wondered if the authors tried to find, for instance, the open and D conformations in the resting dataset during classifications.

      This is an interesting question since some population of ELIC channels could visit a desensitized conformation in the absence of agonist and this would not be detected in our flux assay. After extensive heterogeneous refinement jobs in cryoSPARC and 3D classification jobs in Relion5, we did not detect any unexpected structures such as open/desensitized conformations in the apo dataset.

      In the analysis of the M4 motions, is there info to be gained by looking at how it interacts with the rest of the TMD? For instance, I wondered if the buried surface area between M4 and the rest was changed. Also one could imagine to look at that M4 separately in outward-facing and inward-facing conformations (because the tension due to the bilayer will not be the same in the outer layer in both orientations - intuitively, I'd expect different levels of M4 motions)

      We have expanded our analysis of the structures as recommended. We determined the buried surface area between M4 and the rest of the channel in the liganded WT and ELIC5 structures in liposomes and nanodiscs, as well as the area between the TMD interfaces for these structures. There appears to be a pattern where liposome structures show less buried surface area between M4 and the rest of the channel, and less area at the TMD interfaces. Overall, this suggests that the liposome structures of ELIC in the open-channel or desensitized conformations are more loosely packed in the TMD compared to the nanodisc structures.

      We have also further discussed the issue of separating outward- and inward-facing conformations in the Results. The problem with classifying outward- and inward-facing orientations is that top/down or tilted views of the particles cannot be easily distinguished as coming from channels in one orientation or the other, unless there are conformational differences between outward- and inward-facing channels that would allow for their separation during 3D heterogeneous refinement or 3D classification. Furthermore, since the inward-facing reconstructions are of much lower resolution than the outward-facing reconstructions, we suspect that these particles are more heterogeneous possibly containing junk, multiple conformations, or particles that are both inward- and outward-facing. On the other hand, the outward-facing structures are of good quality, and therefore we are more confident that these come from a more homogeneous set of particles that are likely outward-facing (Note that most particles are outward facing based on side views of the 2D class averages). That said, when examining the conformation of M4 in outward- and inward-facing structures, we do not see any significant differences with the caveat that the inward-facing structures are of poor quality and that inward- and outward-facing particles may not have been well-separated.

    1. Author response:

      The following is the authors’ response to the original reviews

      Summary of our revisions

      (1) We have explained why the untrained RNN with only readout (value-weight) learning could not learn the simple task well. We trained the models continuously across trials with random inter-trial intervals, rather than separately for each episodic trial. Because the activities of the untrained RNN upon cue presentation should differ from trial to trial, it was not trivial for the models to recognize that cue presentation in different trials constitutes the same single state (Line 177-185).

      (2) We have shown that dimensionality was higher in the value-RNNs than in the untrained RNN (Fig. 2K,6H).

      (3) We have shown that even when a distractor cue was introduced, the value-RNNs could learn the task (Fig. 10).

      (4) We have shown that extended value-RNNs incorporating excitatory and inhibitory units and conforming to Dale's law could still learn the tasks (Fig. 9,10-right column).

      (5) In the original manuscript, the non-negatively constrained value-RNN showed loose alignment of value-weight and random feedback from the beginning but did not show further alignment over trials. We have clarified the reason for this and found a way to make further alignment occur, namely introducing a slight decay (forgetting) (Fig. 8E,F).

      (6) We have shown that the value-RNNs could learn the tasks with longer cue-reward delay (Fig. 2M,6J) or action selection (Fig. 11), and found cases where random feedback performed worse than symmetric feedback.

      (7) We compared our value-RNNs with e-prop (Bellec et al., 2020, Nat Commun). While e-prop incorporates the effects of changes in RNN weights across distant times through an "eligibility trace", our value-RNNs do not. We consider that our models can still learn the tasks with cue-reward delay because they use TD error, and TD learning itself, even TD(0) without an eligibility trace, is a solution for temporal credit assignment. In fact, TD error-based e-prop was also examined in that study, but results were shown only with symmetric feedback, not with random feedback (their Fig. 4,5), whereas for another setup, reward-based e-prop without TD error, results with random feedback were shown (their SuppFig. 5). We have noted these points in Line 695-711 (and also partly in Line 96-99).

      (8) In the original manuscript, we emphasized only the spatial locality (random rather than symmetric feedback) of our learning rule. But we have now also emphasized the temporal locality (online learning) as it is also crucial for bio-plausibility and critically different from the original value-RNN with BPTT. We also changed the title.

      (9) We have realized that our estimation of true state values was invalid (as detailed in page 34 of this document). Effects of this error on performance comparisons were small, but we apologize for this error.

      Reviewer #1 (Public review):

      Summary:

      Can a plastic RNN serve as a basis function for learning to estimate value? In previous work this was shown to be the case, with a similar architecture to that proposed here. The learning rule in previous work was back-prop with an objective function that was the TD error function (delta) squared. Such a learning rule is non-local, as the changes in weights within the RNN, and from inputs to the RNN, depend on the weights from the RNN to the output, which estimates value. In addition, these weights themselves change over learning. The main idea in this paper is to examine if replacing the values of these non-local changing weights, used for credit assignment, with random fixed weights can still produce similar results to those obtained with complete bp. This random feedback approach is motivated by a similar approach used for deep feed-forward neural networks.
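      The contrast between the two credit-assignment schemes described in this summary can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sigmoid activation, the omission of external input, the TD-error timing convention, and all numerical values (`gamma`, `lr`, network size) are assumptions made for the example. Only the symbols w (value weights) and c (fixed random feedback) follow the response text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_forward(A, x_prev):
    """One discrete time step of a rate RNN (external input omitted; an assumption)."""
    return sigmoid(A @ x_prev)

def rnn_weight_update(delta, feedback, x_prev, x_now, lr):
    """Online, one-step update of the recurrent weights A.

    feedback = w (value weights) -> one-step semi-gradient (backprop-like) rule
    feedback = c (fixed random)  -> random-feedback rule
    """
    local_gain = x_now * (1.0 - x_now)  # sigmoid derivative at the current activity
    return lr * delta * np.outer(feedback * local_gain, x_prev)

rng = np.random.default_rng(0)
n_units = 5                                   # illustrative size
A = rng.normal(0.0, 0.5, size=(n_units, n_units))
w = rng.normal(0.0, 1.0, size=n_units)        # value weights (RNN -> value unit)
c = rng.normal(0.0, 1.0, size=n_units)        # fixed random feedback weights
x_prev = rng.uniform(0.0, 1.0, size=n_units)
x_now = rnn_forward(A, x_prev)

gamma, reward, lr = 0.8, 1.0, 0.1             # illustrative values, not from the paper
v_now = w @ x_now
v_next = w @ rnn_forward(A, x_now)
delta = reward + gamma * v_next - v_now       # TD error

dA_bp = rnn_weight_update(delta, w, x_prev, x_now, lr)  # needs w: non-local
dA_rf = rnn_weight_update(delta, c, x_prev, x_now, lr)  # uses fixed c instead
```

      In this sketch the two updates differ only in the per-unit feedback factor (w versus c), which is the substitution the summary describes.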

      This work shows that this random feedback in credit assignment performs well, but not as well as the precise gradient-based approach. When more constraints due to biological plausibility are imposed, performance degrades. These results are not surprising given previous results on random feedback. This work is incomplete because the delay times used were only a few time steps, and it is not clear how well random feedback would operate with longer delays. Additionally, the examples simulated with a single cue and a single reward are overly simplistic and the field should move beyond these exceptionally simple examples.

      Strengths:

      • The authors show that random feedback can approximate well a model trained with detailed credit assignment.

      • The authors simulate several experiments including some with probabilistic reward schedules and show results similar to those obtained with detailed credit assignments as well as in experiments.

      • The paper examines the impact of more biologically realistic learning rules and the results are still quite similar to the detailed back-prop model.

      Weaknesses:

      *please note that we numbered your public review comments and recommendations for the authors as Pub1 and Rec1 etc so that we can refer to them in our replies to other comments.

      Pub1. The authors also show that an untrained RNN does not perform as well as the trained RNN. However, they never explain what they mean by an untrained RNN. It should be clearly explained.

      These results are actually surprising. An untrained RNN with enough units and sufficiently large variance of recurrent weights can have a high-dimensionality and generate a complete or nearly complete basis, though not orthonormal (e.g: Rajan&Abbott 2006). It should be possible to use such a basis to learn this simple classical conditioning paradigm. It would be useful to measure the dimensionality of network dynamics, in both trained and untrained RNN's.

      We have added an explanation of untrained RNN in Line 144-147:

      “As a negative control, we also conducted simulations in which these connections were not updated from initial values, referring to as the case with "untrained (fixed) RNN". Notably, the value weights w (i.e., connection weights from the RNN to the striatal value unit) were still trained in the models with untrained RNN.”

      We have also analyzed the dimensionality of network dynamics by calculating the contribution ratios of each principal component of the trajectory of RNN activities. This revealed that the contribution ratios of later principal components were smaller in the cases with untrained RNN than in the cases with trained value RNN. We have added these results in Fig. 2K and Line 210-220 (for our original models without non-negative constraint):

      “In order to examine the dimensionality of RNN dynamics, we conducted principal component analysis (PCA) of the time series (for 1000 trials) of RNN activities and calculated the contribution ratios of PCs in the cases of oVRNNbp, oVRNNrf, and untrained RNN with 20 RNN units. Figure 2K shows a log of contribution ratios of 20 PCs in each case. Compared with the case of untrained RNN, in oVRNNbp and oVRNNrf, initial component(s) had smaller contributions (PC1 (t-test p = 0.00018 in oVRNNbp; p = 0.0058 in oVRNNrf) and PC2 (p = 0.080 in oVRNNbp; p = 0.0026 in oVRNNrf)) while later components had larger contributions (PC3~10,15~20 p < 0.041 in oVRNNbp; PC5~20 p < 0.0017 in oVRNNrf) on average, and this is considered to underlie their superior learning performance. We noticed that late components had larger contributions in oVRNNrf than in oVRNNbp, although these two models with 20 RNN units were comparable in terms of cue~reward state values (Fig. 2J-left).”

      and Fig. 6H and Line 412-416 (for our extended models with non-negative constraint):

      “Figure 6H shows contribution ratios of PCs of the time series of RNN activities in each model with 20 RNN units. Compared with the cases with naive/shuffled untrained RNN, in oVRNNbp-rev and oVRNNrf-bio, later components had relatively high contributions (PC5~20 p < 1.4×10⁻⁶ (t-test vs naive) or < 0.014 (vs shuffled) in oVRNNbp-rev; PC6~20 p < 2.0×10⁻⁷ (vs naive) or PC7~20 p < 5.9×10⁻¹⁴ (vs shuffled) in oVRNNrf-bio), explaining their superior value-learning performance.”
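      For readers unfamiliar with the measure, the contribution ratios of principal components referred to above can be computed roughly as follows. This is only a sketch under stated assumptions: the activity matrix here is synthetic stand-in data (three hypothetical latent signals mixed into 20 units), not output from the models, and the base of the logarithm is our choice because the quoted text does not specify it.

```python
import numpy as np

def contribution_ratios(activity):
    """Fraction of total variance explained by each principal component.

    activity: array of shape (time_steps, n_units).
    Returns an array of length min(time_steps, n_units), in descending order.
    """
    centered = activity - activity.mean(axis=0, keepdims=True)
    # Singular values of the centered data give the PC variances up to a constant.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

rng = np.random.default_rng(0)
# Hypothetical stand-in for RNN activity: 1000 time steps, 20 units,
# driven by 3 latent signals plus a small amount of noise.
latent = rng.standard_normal((1000, 3))
mixing = rng.standard_normal((3, 20))
activity = latent @ mixing + 0.01 * rng.standard_normal((1000, 20))

ratios = contribution_ratios(activity)
log_ratios = np.log10(ratios)  # log base is an assumption
```

      Low-dimensional dynamics concentrate the ratios in the first few components, whereas higher-dimensional dynamics leave larger ratios in the later components, which is the pattern the comparison above relies on.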

      Regarding the poor performance of the model with untrained RNN, we would like to add a note. An untrained RNN with sufficient dimensions should certainly be able to represent fewer than 10 different states well, and state values should be learnable through TD learning regardless of the representation used. However, a difficulty (nontriviality) arises because we modeled the tasks in a continuous way, rather than in an episodic way, so the activity of the untrained RNN upon cue presentation should generally differ from trial to trial. Therefore, it was not trivial for the RNN to know that cue presentation in different trials, even after random lengths of inter-trial interval, should constitute the same single state. We have added this note in Line 177-185:

      “This inferiority of untrained RNN may sound odd because there were only four states from cue to reward while random RNN with enough units is expected to be able to represent many different states (c.f., [49]) and the effectiveness of training of only the readout weights has been shown in reservoir computing studies [50-53]. However, there was a difficulty stemming from the continuous training across trials (rather than episodic training of separate trials): the activity of untrained RNN upon cue presentation generally differed from trial to trial, and so it is non-trivial that cue presentation in different trials should be regarded as the same single state, even if it could eventually be dealt with at the readout level if the number of units increases.”

      The original value RNN study (Hennig et al., 2023, PLoS Comput Biol) also modeled tasks in a continuous way (though using backprop-through-time (BPTT) for training) and their model with untrained RNN also showed considerably larger RPE error than the value RNN even when the number of RNN units was 100 (the maximum number plotted in their Fig. 6A).

      Pub2. The impact of the article is limited by using a network with discrete time-steps, and only a small number of time steps from stimulus to reward. What is the length of each time step? If it's on the order of the membrane time constant, then a few time steps are only tens of ms. In the classical conditioning experiments typical delays are on the order of hundreds of milliseconds to seconds. Authors should test if random feedback weights work as well for larger time spans. This can be done by simply using a much larger number of time steps.

      In the revised manuscript, we examined the cases in which the cue-reward delay (originally 3 time steps) was lengthened to 4, 5, or 6 time steps. Our online value RNN models with random feedback could still achieve better performance (smaller squared value error) than the models with untrained RNN, although the performance degraded as the cue-reward delay increased. We have added these results in Fig. 2M and Line 223-228 (for our original models without non-negative constraint):

      “We further examined the cases with longer cue-reward delays. As shown in Fig. 2M, as the delay increased, the mean squared error of state values (at 3000-th trial) increased, but the relative superiority of oVRNNbp and oVRNNrf over the model with untrained RNN remained to hold, except for cases with small number of RNN units (5) and long delay (5 or 6) (p < 0.0025 in Wilcoxon rank sum test for oVRNNbp or oVRNNrf vs untrained for each number of RNN units for each delay).”

      and Fig. 6J and Line 422-429 (for our extended models with non-negative constraint):

      “Figure 6J shows the cases with longer cue-reward delays, with default or halved learning rates. As the delay increased, the mean squared error of state values (at 3000-th trial) increased, but the relative superiority of oVRNNbp-rev and oVRNNrf-bio over the models with untrained RNN remained to hold, except for a few cases with 5 RNN units (5 delay oVRNNrf-bio vs shuffled with default learning rate, 6 delay oVRNNrf-bio vs naive or shuffled with halved learning rate) (p < 0.047 in Wilcoxon rank sum test for oVRNNbp-rev or oVRNNrf-bio vs naive or shuffled untrained for each number of RNN units for each delay).”

      Also, we have added the note about our assumption and consideration on the time-step that we described in our provisional reply in Line 136-142:

      “We assumed that a single RNN unit corresponds to a small population of neurons that intrinsically share inputs and outputs, for genetic or developmental reasons, and the activity of each unit represents the (relative) firing rate of the population. Cortical population activity is suggested to be sustained not only by fast synaptic transmission and spiking but also, even predominantly, by slower synaptic neurochemical dynamics [46] such as short-term facilitation, whose time constant can be around 500 milliseconds [47]. Therefore, we assumed that single time-step of our rate-based (rather than spike-based) model corresponds to 500 milliseconds.”

      Pub3. In the section with more biologically constrained learning rules, while the output weights are restricted to only be positive (as well as the random feedback weights), the recurrent weights and weights from input to RNN are still bi-polar and can change signs during learning. Why is the constraint imposed only on the output weights? It seems reasonable that the whole setup will fail if the recurrent weights were only positive as in such a case most neurons will have very similar dynamics, and the network dimensionality would be very low. However, it is possible that only negative weights might work. It is unclear to me how to justify that bipolar weights that change sign are appropriate for the recurrent connections and inappropriate for the output connections. On the other hand, an RNN with excitatory and inhibitory neurons in which weight signs do not change could possibly work.

      We examined extended models that incorporated inhibitory and excitatory units and followed Dale's law with certain assumptions, and found that these models could still learn the tasks. We have added these results in Fig. 9 and subsection “4.1 Models with excitatory and inhibitory units” and described the details of the extended models in Line 844-862.

      Pub4. Like most papers in the field this work assumes a world composed of a single cue. In the real world there are many more cues than rewards, some cues are not associated with any rewards, and some are associated with other rewards or even punishments. In the simplest case, it would be useful to show that this network could actually work if there are additional distractor cues that appear at random either before the CS, or between the CS and US. There are good reasons to believe such distractor cues will be fatal for an untrained RNN, but might work with a trained RNN, either using BPTT or random feedback. Although this assumption is a common flaw in most work in the field, we should no longer ignore these slightly more realistic scenarios.

      We examined the performance of the models in a task in which a distractor cue randomly appeared. As a result, our model with random feedback, as well as the model with backprop, could still learn the state values much better than the models with untrained RNN. We have added these results in Fig. 10 and subsection “4.2 Task with distractor cue”.

      Reviewer #1 (Recommendations for the authors):

      Detailed comments to authors

      Rec1. Are the untrained RNNs discussed in methods? It seems quite good in estimating value but has a strong dopamine response at time of reward. Is nothing trained in the untrained RNN or are the W values trained? Untrained RNNs are not bad at estimating value, but not as good as the two other options. It would seem reasonable that an untrained RNN (if I understand what it is) will be sufficient for such simple Pavlovian conditioning paradigms. This is provided that the RNN generates a complete, or nearly complete basis. Random RNNs, provided that the random weights are chosen properly, can indeed generate a nearly complete basis. Once there is a nearly complete temporal basis, it seems that a powerful enough learning rule will be able to learn the very simple Pavlovian conditioning. Since there are only 3 time-steps from cue to reward, an RNN dimensionality of 3 would be sufficient. A failure to get a good approximation can also arise from the failure of the learning algorithm for the output weights (W).

      As we mentioned in our reply to your public comment Pub1 (page 3-5), we have added an explanation of "untrained RNN" (in which the value weights were still learnt) (Line 144-147). We also analyzed the dimensionality of network dynamics by calculating the contribution ratios of principal components of the trajectory of RNN activities, showing that the contribution ratios of later principal components were smaller in the cases with untrained RNN than in the cases with trained value RNN (Fig. 2K/Line 210-220, Fig.6H/Line 412-416). Moreover, also as we mentioned in our reply to your public comment Pub1, we have added a note that even learning of a small number of states was not trivially easy because we considered continuous learning across trials rather than episodic learning of separate trials, and thus it was not trivial for the model to know that cue presentation in different trials after random lengths of inter-trial interval should still be regarded as the same single state (Line 177-185).

      Rec2. For all cases, it will be useful to estimate the dimensionality of the RNN. Is the dimensionality of the untrained RNN smaller than in the trained cases? If this is the case, this might depend on the choice of the initial random (I assume) recurrent connectivity matrix.

      As mentioned above, we have analyzed the dimensionality of the network dynamics, and as you said, the dimensionality of the model with untrained RNN (which was indeed the initial random matrix as you said, as we mentioned above) was on average smaller than the trained value RNN models (Fig. 2K/Line 210-220, Fig.6H/Line 412-416).

      Rec3. It is surprising that the error starts increasing for more RNN units above ~15. See discussion. This might indicate a failure to adjust the learning parameters of the network rather than a true and interesting finding.

      Thank you very much for this insightful comment. In the original manuscript, we set the learning rate to a fixed value (0.1), without normalization by the squared norm of the feature vector (as we mentioned in Line 656-7 of the original manuscript), because we thought such a normalization could not be locally (biologically) implemented. However, we have realized that the lack of normalization resulted in an excessively large learning rate when the number of RNN units was large, and that this could cause instability and an increase in error, as you suggested. Therefore, in the revised manuscript, we have implemented a normalization of the learning rate (of value weights) that does not require non-local computations, specifically, division by the number of RNN units. As a result, the error now decreased monotonically as the number of RNN units increased in the non-negatively constrained models (Fig. 6E-left), and also largely in the unconstrained model with random feedback, although still not in the unconstrained model with backprop or untrained RNN (Fig. 2J-left).
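      The effect of dividing the learning rate by the number of RNN units can be illustrated with a toy calculation. This is a sketch, not the simulation code: it assumes a single state with constant activity 0.8 in every unit, a terminal reward of 1, and value weights starting from zero, all of which are hypothetical choices. With the update w ← w + step · δ · x, the TD error shrinks by the factor (1 − step · ‖x‖²) per update, and ‖x‖² grows with the number of units.

```python
import numpy as np

def td_value_update(w, x_now, x_next, reward, gamma, lr, normalize=True):
    """One TD(0) update of the value (readout) weights w.

    If normalize is True, the learning rate is divided by the number of units,
    a quantity that is available without non-local computation.
    """
    delta = reward + gamma * (w @ x_next) - (w @ x_now)
    step = lr / x_now.size if normalize else lr
    return w + step * delta * x_now, delta

def final_abs_td_error(n_units, normalize, n_updates, lr=0.1, gamma=0.8):
    """Repeatedly update on one state followed by a terminal (all-zero) state."""
    x_now = np.full(n_units, 0.8)   # hypothetical activity pattern
    x_next = np.zeros(n_units)      # terminal state, so its value is 0
    w = np.zeros(n_units)
    for _ in range(n_updates):
        w, _ = td_value_update(w, x_now, x_next, 1.0, gamma, lr, normalize)
    return abs(1.0 - w @ x_now)     # remaining TD error for reward = 1

# With 40 units, lr * ||x||^2 = 0.1 * 25.6 = 2.56 > 2, so the raw rule overshoots
# and the error grows; the normalized rule has gain 2.56 / 40 = 0.064 and converges.
err_raw = final_abs_td_error(n_units=40, normalize=False, n_updates=20)
err_norm = final_abs_td_error(n_units=40, normalize=True, n_updates=200)
```

      Because each unit's activity is bounded, dividing by the number of units keeps the effective gain below the nominal learning rate regardless of network size in this toy setting.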

      Rec4. Not numbering equations is a problem. For example, the explanations of feedback alignment (lines 194-206) rely on equations in the methods section which are not numbered. This makes it hard to read these explanations. Indeed, it will also be better to include a detailed derivation of the explanation in these lines in a mathematical appendix. Key equations should be numbered.

      We have added numbers to key equations in the Methods, and references to the numbers of corresponding equations in the main text. Detailed derivations are included in the Methods.

      Rec5. What is shown in Figure 3C? - an equation will help.

      We have added an explanation using equations in the main text (Line 256-259).

      Rec6. The explanation of why alignment occurs is not satisfactory, but neither is it in previous work on feedforward networks. The least that should be done though

      Regarding why alignment occurs, what remained mysterious (to us) was the following. In the non-negatively constrained model, the angle between the value weight vector (w) and the random feedback vector (c) was relatively small (loosely aligned) from the beginning, but there appeared to be no further alignment over trials (as mentioned in the manuscript), even though the mechanism for feedback alignment that we derived for the model without the non-negative constraint was expected to operate under the non-negative constraint as well. We have now clarified the reason for this and found a way to make feedback alignment occur in the non-negatively constrained model: introducing a slight decay (forgetting) of value weights. We have added these points in the revised manuscript (Line 463-477):

      “As mentioned above, while the angle between w and c was on average smaller than 90° from the beginning, there was no further alignment over trials. This seemed mysterious because the mechanism for feedback alignment that we derived for the models without non-negative constraint was expected to work also for the models with non-negative constraint. As a possible reason for the non-occurrence of feedback alignment, we guessed that one or a few element(s) of w grew prominently during learning, and so w became close to an edge or boundary of the non-negative quadrant and thereby angle between w and other vector became generally large (as illustrated in Fig. 8D). Figure 8Ea shows the mean±SEM of the elements of w ordered from the largest to smallest ones after 1500 trials. As conjectured above, a few elements indeed grew prominently.

      We considered that if a slight decay (forgetting) of value weights (c.f., [59-61]) was assumed, such a prominent growth of a few elements of w may be mitigated and alignment of w to c, beyond the initial loose alignment because of the non-negative constraint, may occur. These conjectures were indeed confirmed by simulations (Fig. 8Eb,c and Fig. 8F). The mean squared value error slightly increased when the value-weight decay was assumed (Fig. 8G), however, presumably reflecting a decrease in developed values and a deterioration of learning because of the decay.”

      Rec7. I don't understand the qualitative difference between 4G and 4H. The difference seems to be smaller but there is still an apparent difference. Can this be quantified?

      We have added pointers indicating which were compared and statistical significance on Fig. 4D-H, and also Fig. 7 and Fig. 9C.

      Rec8. More biologically realistic constraints.

      Are the weights allowed to become negative? - No.

      Figure 6C - untrained RNN with non-negative x_i. Again - it was not explained what untrained RNN is. However, given my previous assumption, this is probably because the units developed in an untrained RNN is much further from representing a complete basis function. This cannot be done with only positive values. It would be useful to see network dynamics of units for untrained RNN. It might also be useful in all cases to estimate the dimensionality of the RNN. For 3 time-steps, it needs to be at least 3, and for more time steps as in Figure 4, larger.

      As we mentioned in our reply to your public comment Pub3 (page 6-8), in the revised manuscript we examined models that incorporated inhibitory and excitatory units and followed Dale's law, which could still learn the tasks (Fig. 9, Line 479-520). We have also analyzed the dimensionality of network dynamics as we mentioned in our replies to your public comment Pub1 and recommendations Rec1 and Rec2.

      Rec9. A new type of untrained RNN is introduced (Fig 6D); this is the first time an explanation of the untrained RNN is given. Indeed, the dimensionality of the second type of untrained RNN should be similar to the bioVRNNrf. The results are still not good.

      In the model with the new type of untrained RNN whose elements were shuffled from trained bioVRNNrf, contribution ratios of later principal components of the trajectory of RNN activities (Fig. 6H gray dotted line) were indeed larger than those in the model with native untrained RNN (gray solid line) but still much smaller than those in the trained value RNN models with backprop (red line) or random feedback (blue line). It is considered that in value RNN, RNN connections were trained to realize high-dimensional trajectory, and shuffling did not generally preserve such an ability.

      Rec10. The discussion is too long and verbose. This is not a review paper.

      We have made the original discussion much more compact (from 1686 words to 940 words). We have added new discussion in response to the review comments, but the total length remains shorter than before (1589 words).

      Reviewer #2 (Public review):

      Summary:

      Tsurumi et al. show that recurrent neural networks can learn state and value representations in simple reinforcement learning tasks when trained with random feedback weights. The traditional method of learning for recurrent networks in such tasks (backpropagation through time) requires feedback weights which are a transposed copy of the feed-forward weights, a biologically implausible assumption. This manuscript builds on previous work regarding "random feedback alignment" and "value-RNNs", and extends them to a reinforcement learning context. The authors also demonstrate that certain non-negative constraints can enforce a "loose alignment" of feedback weights. The authors' results suggest that random feedback may be a powerful tool of learning in biological networks, even in reinforcement learning tasks.

      Strengths:

      The authors describe well the issues regarding biologically plausible learning in recurrent networks and in reinforcement learning tasks. They take care to propose networks which might be implemented in biological systems and compare their proposed learning rules to those already existing in literature. Further, they use small networks on relatively simple tasks, which allows for easier intuition into the learning dynamics.

      Weaknesses:

      The principles discovered by the authors in these smaller networks are not applied to deeper networks or more complicated tasks, so it remains unclear to what degree these methods can scale up, or can be used more generally.

      We have examined extended models that incorporated inhibitory and excitatory units and followed Dale's law with certain assumptions, and found that these models could still learn the tasks. We have added these results in Fig. 9 and subsection “4.1 Models with excitatory and inhibitory units”.

      We have also examined the performance of the models in a task in which a distractor cue randomly appeared, finding that our models could still learn the state values much better than the models with untrained RNN. We have added these results in Fig. 10 and subsection “4.2 Task with distractor cue”.

      Regarding the depth, we continue to think about it but have not yet come up with concrete ideas.

      Reviewer #2 (Recommendations for the authors):

      (1) I think the work would greatly benefit from more proofreading. There are language errors/oddities throughout the paper, I will list just a few examples from the introduction:

      Thank you for pointing this out. We have made revisions throughout the paper.

      line 63: "simultaneously learnt in the downstream of RNN". Simultaneously learnt in networks downstream of the RNN? Simultaneously learnt in a downstream RNN? The meaning is not clear in the original sentence.

      We have revised it to "simultaneously learnt in connections downstream of the RNN" (Line 67-68).

      starting in line 65: " A major problem, among others.... value-encoding unit" is a run-on sentence and would be more readable if split into multiple sentences.

      We have extensively revised this part, which now consists of short sentences (Line 70-75).

      line 77: "in supervised learning of feed-forward network" should be either "in supervised learning of a feed-forward network" or "in supervised learning of feed-forward networks".

      We have changed "feed-forward network" to "feed-forward networks" (Line 83).

      (2) Under what conditions can you use an online learning rule which only considers the influence of the previous timestep? It's not clear to me how your networks solve the temporal credit assignment problem when the cue-reward delay in your tasks is 3-5ish time steps. How far can you stretch this delay before your networks stop learning correctly because of this one-step assumption? Further, how much does feedback alignment constrain your ability to learn long timescales, such as in Murray, J.M. (2019)?

      The reason why our models can solve the temporal credit assignment problem, at least to a certain extent, is considered to be that temporal-difference (TD) learning, which we adopted, itself has the power to resolve temporal credit assignment, as exemplified by the fact that TD(0) algorithms without eligibility traces can still learn the value of distant rewards. We have added a discussion on this in Line 702-705:

      “…our models do not have "eligibility trace" (nor memorable/gated unit, different from the original value-RNN [26]), but could still solve temporal credit assignment to a certain extent because TD learning is by itself a solution for it (notably, recent work showed that combination of TD(0) and model-based RL well explained rat's choice and DA patterns [132]).”
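
      The point that TD(0) alone can propagate value back from a delayed reward can be illustrated with a minimal tabular sketch. This is not the value-RNN model: the chain of states, the episodic (rather than continuous) training, the discount factor of 0.8, the learning rate, and the function name are all assumptions chosen for illustration.

```python
def td0_chain_values(delay=3, gamma=0.8, alpha=0.1, n_trials=1000):
    """Tabular TD(0) on a chain: state 0 is the cue, reward follows state `delay`.

    No eligibility trace is used: each update looks only one step ahead.
    """
    values = [0.0] * (delay + 1)
    for _ in range(n_trials):
        for s in range(delay + 1):
            last = s == delay
            reward = 1.0 if last else 0.0
            next_value = 0.0 if last else values[s + 1]
            td_error = reward + gamma * next_value - values[s]
            values[s] += alpha * td_error
    return values


if __name__ == "__main__":
    # The cue value approaches gamma ** delay, although the reward is
    # several steps away and each update is purely one-step.
    print(td0_chain_values())
```

      Value reaches the cue state only gradually, one step further back per trial at first, which is consistent with learning becoming harder as the cue-reward delay grows.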

      We have also examined the cases in which the cue-reward delay (originally 3 time steps) was elongated to 4, 5, or 6 time steps, and our models with random feedback could still achieve better performance than the models with untrained RNN, although the performance degraded as the cue-reward delay increased. We have added these results in Fig. 2M and Line 223-228 (for our original models without non-negative constraint):

      “We further examined the cases with longer cue-reward delays. As shown in Fig. 2M, as the delay increased, the mean squared error of state values (at 3000-th trial) increased, but the relative superiority of oVRNNbp and oVRNNrf over the model with untrained RNN remained to hold, except for cases with small number of RNN units (5) and long delay (5 or 6) (p < 0.0025 in Wilcoxon rank sum test for oVRNNbp or oVRNNrf vs untrained for each number of RNN units for each delay).”

      and Fig. 6J and Line 422-429 (for our extended models with non-negative constraint):

      “Figure 6J shows the cases with longer cue-reward delays, with default or halved learning rates. As the delay increased, the mean squared error of state values (at 3000-th trial) increased, but the relative superiority of oVRNNbp-rev and oVRNNrf-bio over the models with untrained RNN remained to hold, except for a few cases with 5 RNN units (5 delay oVRNNrf-bio vs shuffled with default learning rate, 6 delay oVRNNrf-bio vs naive or shuffled with halved learning rate) (p < 0.047 in Wilcoxon rank sum test for oVRNNbp-rev or oVRNNrf-bio vs naive or shuffled untrained for each number of RNN units for each delay).”

      As for the difficulty due to random feedback compared to backprop, there appeared to be little difference in the models without non-negative constraint (Fig. 2M), whereas in the models with non-negative constraint, when the cue-reward delay was elongated to 6 time steps, the model with random feedback performed worse than the model with backprop (Fig. 6J bottom-left panel).

      (3) Line 150: Were the RNN methods trained with continuation between trials?

      Yes, we have added

      “The oVRNN models, and the model with untrained RNN, were continuously trained across trials in each task, because we considered that it was ecologically more plausible than episodic training of separate trials.” in Line 147-150. This is considered to make learning of even the simple cue-reward association task nontrivial, as we describe in our reply to your comment 9 below.

      (4) Figure 2I, J: indicate the statistical significance of the difference between the three methods for each of these measures.

      We have added statistical information for Fig. 2J (Line 198-203):

      “As shown in the left panel of Fig. 2J, on average across simulations, oVRNNbp and oVRNNrf exhibited largely comparable performance and always outperformed the untrained RNN (p < 0.00022 in Wilcoxon rank sum test for oVRNNbp or oVRNNrf vs untrained for each number of RNN units), although oVRNNbp somewhat outperformed or underperformed oVRNNrf when the number of RNN units was small (≤10 (p < 0.049)) or large (≥25 (p < 0.045)), respectively.”

      and also Fig. 6E (for non-negative models) (Line 385-390):

      “As shown in the left panel of Fig. 6E, oVRNNbp-rev and oVRNNrf-bio exhibited largely comparable performance and always outperformed the models with untrained RNN (p < 2.5×10<sup>−12</sup> in Wilcoxon rank sum test for oVRNNbp-rev or oVRNNrf-bio vs naive or shuffled untrained for each number of RNN units), although oVRNNbp-rev somewhat outperformed or underperformed oVRNNrf-bio when the number of RNN units was small (≤10 (p < 0.00029)) or large (≥25 (p < 3.7×10<sup>−6</sup>)), respectively…”

      Fig. 2I shows distributions, whose means are plotted in Fig. 2J, and we did not add statistics to Fig. 2I itself.

      (5) Line 178: Has learning reached a steady state after 1000 trials for each of these networks? Can you show a plot of error vs. trial number?

      We have added a plot of error vs trial number for original models (Fig. 2L, Line 221-223):

      “We examined how learning proceeded across trials in the models with 20 RNN units. As shown in Fig. 2L, learning became largely converged by 1000-th trial, although slight improvement continued afterward.”

      and non-negatively constrained models (Fig. 6I, Line 417-422):

      “Figure 6I shows how learning proceeded across trials in the models with 20 RNN units. While oVRNNbp-rev and oVRNNrf-bio eventually reached a comparable level of errors, oVRNNrf-bio outperformed oVRNNbp-rev in early trials (at 200, 300, 400, or 500 trials; p < 0.049 in Wilcoxon rank sum test for each). This is presumably because the value weights did not develop well in early trials and so the backprop-type feedback, which was the same as the value weights, did not work well, while the non-negative fixed random feedback worked finely from the beginning.”

      As shown in these figures, learning became largely steady at 1000 trials, but still slightly continued, and we have added simulations with 3000 trials (Fig. 2M and Fig. 6J).

      (6) Line 191: Put these regression values in the figure caption, as well as on the plot in Figure 3B.

      We have added the regression values in Fig. 3B and its caption.

      (7) Line 199: This idea of being in the same quadrant is interesting, but I think the term "relatively close angle" is too vague. Is there another, more quantitative way to describe what you mean by this?

      We have revised this (Line 252-254) to “a vector that is in a relatively close angle with c, or more specifically, is in the same quadrant as (and thus within at maximum 90° from) c (for example, [c<sub>1</sub> c<sub>2</sub> c<sub>3</sub>]<sup>T</sup> and [0.5c<sub>1</sub> 1.2c<sub>2</sub> 0.8c<sub>3</sub>]<sup>T</sup>)”.
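
      The "same quadrant" statement can be checked numerically with a small sketch. The values of c below are hypothetical; the scaling factors 0.5, 1.2, and 0.8 are those of the example above. Because every element of the rescaled vector has the same sign as the corresponding element of c, the dot product is a sum of non-negative terms, so the angle cannot exceed 90°.

```python
import math


def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # guard against rounding
    return math.degrees(math.acos(cos))


# Hypothetical feedback vector with mixed signs.
c = [0.3, -0.7, 0.5]
# Element-wise positive rescaling keeps every sign, i.e., the same quadrant.
rescaled = [0.5 * c[0], 1.2 * c[1], 0.8 * c[2]]

if __name__ == "__main__":
    print(angle_deg(c, rescaled))
```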

      (8) Line 275: I'd like to see this measure directly in a plot, along with the statistical significance.

      We have added pointers indicating which were compared and statistical significance on Fig. 4D-H, and also Fig. 7 and Fig. 9C.

      (9) Line 280: Surely the untrained RNN should be able to solve the task if the reservoir is big enough, no? Maybe much bigger than 50 units, but still.

      We are not sure that this is the case. A difficulty lies in the fact that, because we modeled the tasks in a continuous way rather than in an episodic way (as we mentioned in our reply to your comment 3), the activity of the untrained RNN upon cue presentation should generally differ from trial to trial. Therefore, it was not trivial for the RNN to know that cue presentation in different trials, even after random lengths of inter-trial interval, should constitute the same single state. We have added this note in Line 177-185:

      “This inferiority of untrained RNN may sound odd because there were only four states from cue to reward while random RNN with enough units is expected to be able to represent many different states (c.f., [49]) and the effectiveness of training of only the readout weights has been shown in reservoir computing studies [50-53]. However, there was a difficulty stemming from the continuous training across trials (rather than episodic training of separate trials): the activity of untrained RNN upon cue presentation generally differed from trial to trial, and so it is non-trivial that cue presentation in different trials should be regarded as the same single state, even if it could eventually be dealt with at the readout level if the number of units increases.”

      The original value RNN study (Hennig et al., 2023, PLoS Comput Biol) also modeled tasks in a continuous way (though using BPTT for training) and their model with untrained RNN also showed considerably larger RPE error than the value RNN even when the number of RNN units was 100 (the maximum number plotted in their Fig. 6A).

      (10) It's a bit confusing to compare Figure 4C to Figure 4D-H because there are also many features of D-H which do not match those of C (response to cue, response to late reward in task 1). It would make sense to address this in some way. Is there another way to calculate the true values of the states (e.g., maybe you only start from the time of the cue) which better approximates what the networks are doing?

      As we mentioned in our replies to your comments 3 and 9, our models with RNN were trained continuously across trials rather than separately for each episodic trial, and whether the models could still learn the state representation is a key issue. Therefore, starting learning from the time of cue would not be an appropriate way to compare the models, and instead we have made statistical comparison regarding key features, specifically, TD-RPEs at early and late rewards, as indicated in Fig. 4D-H.

      (11) Line 309: Can you explain why this non-monotonic feature exists? Why do you believe it would be more biologically plausible to assume monotonic dependence? It doesn't seem so straightforward to me; I can imagine that competing LTP/LTD mechanisms may produce plasticity which would have a non-monotonic dependence on post-synaptic activity.

      Thank you for this insightful comment. As you suggested, non-monotonic dependence on the postsynaptic activity (BCM rule) has been proposed for unsupervised learning (cortical self-organization) (Bienenstock et al., 1982 J Neurosci), and there were suggestions that triplet-based STDP could be reduced to a BCM-like rule and additional components (Gjorgjieva et al., 2011 PNAS; Shouval, 2011 PNAS). However, the non-monotonicity that appeared in our model, derived from the backprop rule, is maximized at the middle and is thus opposite to that of the BCM rule, which is minimized at the middle (i.e., initially decreases and thereafter increases). Therefore, we consider that such an increase-then-decrease-type non-monotonicity would be less plausible than a monotonic increase, which could approximate an extreme case (with a minimum dip) of the BCM rule. We have added a note on this point in Line 355-358:

      “…the dependence on the post-synaptic activity was non-monotonic, maximized at the middle of the range of activity. It would be more biologically plausible to assume a monotonic increase (while an opposite shape of nonmonotonicity, once decrease and thereafter increase, called the BCM (Bienenstock-Cooper-Munro) rule has actually been suggested [56-58]).”
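
      The two opposite shapes of non-monotonicity can be contrasted with a small sketch. It rests on two assumptions that are not stated in this reply: that the backprop-derived factor is the derivative of a standard logistic sigmoid, y(1 − y), and that the BCM rule is represented by a commonly used quadratic form, y(y − θ), with a hypothetical threshold θ = 0.5. The function names are illustrative.

```python
def sigmoid_derivative_factor(y):
    """Post-synaptic factor from the derivative of a logistic sigmoid (assumed)."""
    return y * (1.0 - y)


def bcm_factor(y, theta=0.5):
    """A common quadratic form of the BCM rule (assumed); theta is hypothetical."""
    return y * (y - theta)


# Post-synaptic activity sampled over its range [0, 1].
activities = [i / 20 for i in range(21)]
peak_activity = max(activities, key=sigmoid_derivative_factor)
dip_activity = min(activities, key=bcm_factor)

if __name__ == "__main__":
    # The sigmoid-derivative factor rises then falls; the BCM factor dips then rises.
    print(peak_activity, dip_activity)
```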

      (12) Line 363: This is the most exciting part of the paper (for me). I want to learn way more about this! Don't hide this in a few sentences. I want to know all about loose vs. feedback alignment. Show visualizations in 3D space of the idea of loose alignment (starting in the same quadrant), and compare it to how feedback alignment develops (ending in the same quadrant). Does this "loose" alignment idea give us an idea why the random feedback seems to settle at 45 degree angle? it just needs to get the signs right (same quadrant) for each element?

      In reply to this encouraging comment, we have made further analyses of the loose alignment. By the term "loose alignment", we meant that the value weight vector w and the feedback vector c are in the same (non-negative) quadrant, as you said. But what remained mysterious (to us) was that, while the angle between w and c was relatively small (loosely aligned) from the beginning, there appeared (as mentioned in the manuscript) to be no further alignment over trials (and the angle actually settled at somewhat larger than 45°), even though the same mechanism for feedback alignment that we derived for the model without non-negative constraint was expected to operate also under the non-negative constraint. We have now clarified the reason for this and found a way, namely the introduction of a slight decay (forgetting) of value weights, by which feedback alignment came to occur in the non-negatively constrained model. We have added this in Line 463-477:

      “As mentioned above, while the angle between w and c was on average smaller than 90° from the beginning, there was no further alignment over trials. This seemed mysterious because the mechanism for feedback alignment that we derived for the models without non-negative constraint was expected to work also for the models with non-negative constraint. As a possible reason for the non-occurrence of feedback alignment, we guessed that one or a few element(s) of w grew prominently during learning, and so w became close to an edge or boundary of the non-negative quadrant and thereby angle between w and other vector became generally large (as illustrated in Fig. 8D). Figure 8Ea shows the mean±SEM of the elements of w ordered from the largest to smallest ones after 1500 trials. As conjectured above, a few elements indeed grew prominently.

      We considered that if a slight decay (forgetting) of value weights (c.f., [59-61]) was assumed, such a prominent growth of a few elements of w may be mitigated and alignment of w to c, beyond the initial loose alignment because of the non-negative constraint, may occur. These conjectures were indeed confirmed by simulations (Fig. 8Eb,c and Fig. 8F). The mean squared value error slightly increased when the value-weight decay was assumed (Fig. 8G), however, presumably reflecting a decrease in developed values and a deterioration of learning because of the decay.”
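
      The geometric intuition behind this conjecture, that a weight vector dominated by a few elements lies near an edge of the non-negative quadrant and is therefore far in angle from most other non-negative vectors, can be illustrated with a minimal sketch. It does not simulate learning or weight decay. The dimension of 12, the uniform sampling of the comparison vectors, the two example weight vectors, and the function names are assumptions chosen for illustration.

```python
import math
import random


def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # guard against rounding
    return math.degrees(math.acos(cos))


def mean_angle_to_random_nonneg(w, n_samples=1000, seed=0):
    """Mean angle between w and random vectors with elements uniform in [0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        c = [rng.random() for _ in w]
        total += angle_deg(w, c)
    return total / n_samples


n = 12
balanced_w = [1.0] * n               # all elements comparable
peaked_w = [10.0] + [0.1] * (n - 1)  # one element grew prominently

if __name__ == "__main__":
    print(mean_angle_to_random_nonneg(balanced_w))
    print(mean_angle_to_random_nonneg(peaked_w))
```

      In this sketch the peaked vector is on average much further in angle from random non-negative vectors than the balanced one, which matches the reasoning that mitigating the prominent growth of a few elements could allow closer alignment.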

      As for visualization, because the model's dimension was high (e.g., 12), we could not come up with better ways of visualization than the trial-versus-angle plot (Fig. 3A, 8A,F). Nevertheless, we expect that the abovementioned additional analyses of loose alignment (with graphs) are useful for understanding what is going on.

      (13) Line 426: how does this compare to some of the reward modulated hebbian rules proposed in other RNNs? See Hoerzer, G. M., Legenstein, R., & Maass, W. (2014). Put another way, you arrived at this from a top-down approach (gradient descent->BP->approximated by RF->non-negativity constraint->leads to DA dependent modulation of Hebbian plasticity). How might this compare to a bottom-up approach (i.e. starting from the principle of Hebbian learning, and adding in reward modulation)?

      The study of Hoerzer et al. 2014 used a stochastic perturbation, which we did not assume but which could potentially be integrated. On the other hand, Hoerzer et al. trained the readout of an untrained RNN, whereas we trained both the RNN and its readout. We have added a discussion comparing our model with Hoerzer et al. and other works that also used perturbation methods, as well as other top-down approximation methods, in Line 685-711 (reference 128 is Hoerzer et al. 2014 Cereb Cortex):

      “As an alternative to backprop in hierarchical network, aside from feedback alignment [36], Associative Reward-Penalty (A<sub>R-P</sub>) algorithm has been proposed [124-126]. In A<sub>R-P</sub>, the hidden units behave stochastically, allowing the gradient to be estimated via stochastic sampling. Recent work [127] has proposed Phaseless Alignment Learning (PAL), in which high-frequency noise-induced learning of feedback projections proceeds simultaneously with learning of forward projections using the feedback in a lower frequency. Noise-induced learning of the weights on readout neurons from untrained RNN by reward-modulated Hebbian plasticity has also been demonstrated [128]. Such noise- or perturbation-based [40] mechanisms are biologically plausible because neurons and neural networks can exhibit noisy or chaotic behavior [129-131], and might improve the performance of value-RNN if implemented.

      Regarding learning of RNN, "e-prop" [35] was proposed as a locally learnable online approximation of BPTT [27], which was used in the original value RNN [26]. In e-prop, neuron-specific learning signal is combined with weight-specific locally-updatable "eligibility trace". Reward-based e-prop was also shown to work [35], both in a setup not introducing TD-RPE with symmetric or random feedback (their Supplementary Figure 5) and in another setup introducing TD-RPE with symmetric feedback (their Figure 4 and 5). Compared to these, our models differ in multiple ways.

      First, we have shown that alignment to random feedback occurs in the models driven by TD-RPE. Second, our models do not have "eligibility trace" (nor memorable/gated unit, different from the original value-RNN [26]), but could still solve temporal credit assignment to a certain extent because TD learning is by itself a solution for it (notably, recent work showed that combination of TD(0) and model-based RL well explained rat's choice and DA patterns [132]). However, as mentioned before, single time-step in our models was assumed to correspond to hundreds of milliseconds, incorporating slow synaptic dynamics, whereas e-prop is an algorithm for spiking neuron models with a much finer time scale. From this aspect, our models could be seen as a coarse-time-scale approximation of e-prop. On top of these, our results point to a potential computational benefit of biological non-negative constraint, which could effectively limit the parameter space and promote learning.”

      Related to your latter point (and also replying to other reviewer's comment), we also examined the cases where the random feedback in our model was replaced with uniform feedback, which corresponds to a simple bottom-up reward-modulated triplet plasticity rule. As a result, the model with uniform feedback showed largely comparable, but somewhat worse, performance than the model with random feedback. We have added the results in Fig. 2J-right and Line 206-209 (for our original models without non-negative constraint):

      “The green line in Fig. 2J-right shows the performance of a special case where the random feedback in oVRNNrf was fixed to the direction of (1, 1, ..., 1)<sup>T</sup> (i.e., uniform feedback) with a random coefficient, which was largely comparable to, but somewhat worse than, that for the general oVRNNrf (blue line).”

      and Fig. 6E-right and Line 402-407 (for our extended models with non-negative constraint):

      “The green and light blue lines in the right panels of Figure 6E and Figure 6F show the results for special cases where the random feedback in oVRNNrf-bio was fixed to the direction of (1, 1, ..., 1) <sup>T</sup> (i.e., uniform feedback) with a random non-negative magnitude (green line) or a fixed magnitude of 0.5 (light blue line). The performance of these special cases, especially the former (with random magnitude) was somewhat worse than that of oVRNNrf-bio, but still better than that of the models with untrained RNN.”

      and also added a biological implication of the results in Line 644-652:

      “We have shown that oVRNNrf and oVRNNrf-bio could work even when the random feedback was uniform, i.e., fixed to the direction of (1, 1, ..., 1) <sup>T</sup>, although the performance was somewhat worse. This is reasonable because uniform feedback can still encode scalar TD-RPE that drives our models, in contrast to a previous study [45], which considered DA's encoding of vector error and thus regarded uniform feedback as a negative control. If oVRNNrf/oVRNNrf-bio-like mechanism indeed operates in the brain and the feedback is near uniform, alignment of the value weights w to near (1, 1, ..., 1) is expected to occur. This means that states are (learned to be) represented in such a way that simple summation of cortical neuronal activity approximates value, thereby potentially explaining why value is often correlated with regional activation (fMRI BOLD signal) of cortical regions [113].”

      Reviewer #3 (Public review):

      Summary:

      The paper studies learning rules in a simple sigmoidal recurrent neural network setting. The recurrent network has a single layer of 10 to 40 units. It is first confirmed that feedback alignment (FA) can learn a value function in this setting. Then so-called bio-plausible constraints are added: (1) when the value weights (readout) are non-negative, (2) when the activity is non-negative (normal sigmoid rather than downscaled between -0.5 and 0.5), (3) when the feedback weights are non-negative, (4) when the learning rule is revised to be monotonic: the weights are not downregulated. In the simple task considered, none of the four biological features appears to totally impair learning.

      Strengths:

      (1) The learning rules are implemented in a low-level fashion of the form: (pre-synaptic-activity) x (post-synaptic-activity) x feedback x RPE. Which is therefore interpretable in terms of measurable quantities in the wet-lab.

      (2) I find that non-negative FA (FA with non negative c and w) is the most valuable theoretical insight of this paper: I understand why the alignment between w and c is automatically better at initialization.

      (3) The task choice is relevant since it connects with experimental settings of reward conditioning with possible plasticity measurements.

      Weaknesses:

      (4) The task is rather easy, so it's not clear that it really captures the computational gap that exists between FA (gradient-like learning) and a simpler learning rule like a delta rule: RPE x (pre-synaptic) x (post-synaptic). To check that the task is not too trivial, I suggest adding a control where the vector c is constant, c_i=1.

      We have examined the cases where the feedback was uniform, i.e., in the direction of (1, 1, ..., 1) in both models without and with non-negative constraint. In both models, the models with uniform feedback performed somewhat worse than the original models with random feedback, but still better than the models with untrained RNN. We have added the results in Fig. 2J-right and Line 206-209 (for our original models without non-negative constraint):

      “The green line in Fig. 2J-right shows the performance of a special case where the random feedback in oVRNNrf was fixed to the direction of (1, 1, ..., 1) <sup>T</sup> (i.e., uniform feedback) with a random coefficient, which was largely comparable to, but somewhat worse than, that for the general oVRNNrf (blue line).”

      and Fig. 6E-right and Line 402-407 (for our extended models with non-negative constraint):

      “The green and light blue lines in the right panels of Figure 6E and Figure 6F show the results for special cases where the random feedback in oVRNNrf-bio was fixed to the direction of (1, 1, ..., 1) <sup>T</sup> (i.e., uniform feedback) with a random non-negative magnitude (green line) or a fixed magnitude of 0.5 (light blue line). The performance of these special cases, especially the former (with random magnitude) was somewhat worse than that of oVRNNrf-bio, but still better than that of the models with untrained RNN.”

      We have also added a discussion on the biological implication of the model with uniform feedback mentioned in our provisional reply in Line 644-652:

      “We have shown that oVRNNrf and oVRNNrf-bio could work even when the random feedback was uniform, i.e., fixed to the direction of (1, 1, ..., 1) <sup>T</sup>, although the performance was somewhat worse. This is reasonable because uniform feedback can still encode scalar TD-RPE that drives our models, in contrast to a previous study [45], which considered DA's encoding of vector error and thus regarded uniform feedback as a negative control. If oVRNNrf/oVRNNrf-bio-like mechanism indeed operates in the brain and the feedback is near uniform, alignment of the value weights w to near (1, 1, ..., 1) is expected to occur. This means that states are (learned to be) represented in such a way that simple summation of cortical neuronal activity approximates value, thereby potentially explaining why value is often correlated with regional activation (fMRI BOLD signal) of cortical regions [113].”

      In addition, while preparing the revised manuscript, we found a recent simulation study, which showed that uniform feedback coupled with positive forward weights was effective in supervised learning of one-dimensional output in feed-forward network (Konishi et al., 2023, Front Neurosci).

      We have briefly discussed this work in Line 653-655:

      “Notably, uniform feedback coupled with positive forward weights was shown to be effective also in supervised learning of one-dimensional output in feed-forward network [114], and we guess that loose alignment may underlie it.”

      (5) Related to point 3), the main strength of this paper is to draw potential connection with experimental data. It would be good to highlight more concretely the prediction of the theory for experimental findings. (Ideally, what should be observed with non-negative FA that is not expected with FA or a delta rule (constant global feedback) ?).

      We have added a discussion on the prediction of our models, mentioned in our provisional reply, in Line 627-638:

      “oVRNNrf predicts that the feedback vector c and the value-weight vector w become gradually aligned, while oVRNNrf-bio predicts that c and w are loosely aligned from the beginning. Element of c could be measured as the magnitude of pyramidal cell's response to DA stimulation. Element of w corresponding to a given pyramidal cell could be measured, if striatal neuron that receives input from that pyramidal cell can be identified (although technically demanding), as the magnitude of response of the striatal neuron to activation of the pyramidal cell. Then, the abovementioned predictions could be tested by (i) identify cortical, striatal, and VTA regions that are connected, (ii) identify pairs of cortical pyramidal cells and striatal neurons that are connected, (iii) measure the responses of identified pyramidal cells to DA stimulation, as well as the responses of identified striatal neurons to activation of the connected pyramidal cells, and (iv) test whether DA→pyramidal responses and pyramidal→striatal responses are associated across pyramidal cells, and whether such associations develop through learning.”

      Moreover, we have considered another (technically more doable) prediction of our model, and described it in Line 639-643:

      “Testing this prediction, however, would be technically quite demanding, as mentioned above. An alternative way of testing our model is to manipulate the cortical DA feedback and see if it will cause (re-)alignment of value weights (i.e., cortical striatal strengths). Specifically, our model predicts that if DA projection to a particular cortical locus is silenced, effect of the activity of that locus on the value-encoding striatal activity will become diminished.”

      (6a) Random feedback with RNN in RL have been studied in the past, so it is maybe worth giving some insights how the results and the analyzes compare to this previous line of work (for instance in this paper [1]). For instance, I am not very surprised that FA also works for value prediction with TD error. It is also expected from the literature that the RL + RNN + FA setting would scale to tasks that are more complex than the conditioning problem proposed here, so is there a more specific take-home message about non-negative FA? or benefits from this simpler toy task? [1] https://www.nature.com/articles/s41467-020-17236-y

      As for a specific feature of non-negative models, we did not describe (actually did not well recognize) an intriguing result that the non-negative random feedback model performed generally better than the models without non-negative constraint with either backprop or random feedback (Fig. 2J-left versus Fig. 6E-left (please mind the difference in the vertical scales)). This suggests that the non-negative constraint effectively limited the parameter space and thereby learning became efficient. We have added this result in Line 392-395:

      “Remarkably, oVRNNrf-bio generally achieved better performance than both oVRNNbp and oVRNNrf, which did not have the non-negative constraint (Wilcoxon rank sum test, vs oVRNNbp: p < 7.8×10<sup>−6</sup> for 5 or ≥25 RNN units; vs oVRNNrf: p < 0.021 for ≤10 or ≥20 RNN units).”

      Also, in the models with non-negative constraint, the model with random feedback learned more rapidly than the model with backprop although they eventually reached a comparable level of errors, at least in the case with 20 RNN units. This is presumably because the value weights did not develop well in early trials and so the backprop-based feedback, which was the same as the value weights, did not work well, while the non-negative fixed random feedback worked finely from the beginning. We have added this result in Fig. 6I and Line 417-422:

      “Figure 6I shows how learning proceeded across trials in the models with 20 RNN units. While oVRNNbp-rev and oVRNNrf-bio eventually reached a comparable level of errors, oVRNNrf-bio outperformed oVRNNbp-rev in early trials (at 200, 300, 400, or 500 trials; p < 0.049 in Wilcoxon rank sum test for each). This is presumably because the value weights did not develop well in early trials and so the backprop-type feedback, which was the same as the value weights, did not work well, while the non-negative fixed random feedback worked finely from the beginning.”
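
      For readers unfamiliar with the test used in these comparisons, the following is a minimal sketch of a two-sided Wilcoxon rank sum test using the normal approximation (average ranks for ties, no tie or continuity correction). The function name `rank_sum_test` and the error values are hypothetical placeholders, not results from the manuscript, and the authors' actual implementation is not described in this response.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, illustrative)."""
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values and give each the average rank.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    # Rank sum of the first sample and its null mean and standard deviation.
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p

# Hypothetical squared value errors from repeated simulations of two models.
errors_model_a = [0.010, 0.012, 0.011, 0.013, 0.009]
errors_model_b = [0.020, 0.025, 0.022, 0.027, 0.024]
z, p = rank_sum_test(errors_model_a, errors_model_b)
```

      A negative `z` here means the first sample tends to have smaller errors than the second.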

      We have also added a discussion on how our model can be positioned in relation to other models, including the study you mentioned (e-prop by Bellec, ..., Maass, 2020), in the subsection “Comparison to other algorithms” of the Discussion.

      Regarding the slightly better performance of the non-negative model with random feedback than that of the non-negative model with backprop when the number of RNN units was large (mentioned in our provisional reply), state values in the backprop model appeared less developed than those in the random feedback model. The slightly better performance of random feedback over backprop also held in our extended model incorporating excitatory and inhibitory units (Fig. 9B).

      (6b) Related to task complexity, it is not clear to me if non-negative value and feedback weights would generally scale to harder tasks. If the task is so simple that a global RPE signal is sufficient to learn (see 4 and 5), then it could be good to extend the task to find a substantial gap between: global RPE, non-negative FA, FA, BP. For a well-chosen task, I expect to see a performance gap between any pair of these four learning rules. In the context of the present paper, this would be particularly interesting to study the failure mode of non-negative FA and the cases where it does perform as well as FA.

      In the cue-reward association task with 3 time-steps delay, the non-negative model with random feedback performed largely comparably to the non-negative model with backprop, and this continued to hold in a task where a distractor cue, which was not associated with reward, appeared at random timings. We have added the results in Fig. 10 and subsection “4.2 Task with distractor cue”.

      We have also examined cases where the cue-reward delay was longer. With a longer cue-reward delay (6 time-steps), in the models without the non-negative constraint, the model with random feedback performed comparably to the model with backprop (and slightly better when the number of RNN units was large) (Fig. 2M). In contrast, in the models with the non-negative constraint, the model with random feedback underperformed the model with backprop (Fig. 6J, left-bottom). This indicates a difference between the effect of non-negative random feedback and the effect of positive+negative random feedback.

      We have further examined the performance of the models in terms of action selection, by extending the models to incorporate an actor-critic algorithm. In a task with inter-temporal choice (i.e., immediate small reward vs delayed large reward), the non-negative model with random feedback performed worse than the non-negative model with backprop when the number of RNN units was small. When the number of RNN units increased, these models performed more comparably. These results are described in Fig. 11 and subsection “4.3 Incorporation of action selection”.

      (7) I find that the writing could be improved, it mostly feels more technical and difficult than it should. Here are some recommendations:

      7a) for instance the technical description of the task (CSC) is not fully described and requires background knowledge from other paper which is not desirable.

      7b) Also the rationale for the added difficulty with the stochastic reward and new state is not well explained.

      7c) In the technical description of the results I find that the text dives into descriptive comments of the figures but high-level take home messages would be helpful to guide the reader. I got a bit lost, although I feel that there is probably a lot of depth in these paragraphs.

      As for 7a), 'CSC (complete serial compound)' was actually not the name of the task but the name of the 'punctate' state representation, in which each state (timing from cue) is represented in a punctate manner, i.e., by a one-hot vector such as (1, 0, ..., 0), (0, 1, ..., 0), ..., and (0, 0, ..., 1). As you pointed out, using the name 'CSC' would make the text appear more technical than it actually is, and so we have moved the reference to the name 'CSC' to the Methods (Line 903-907):

      “For the agents with punctate state representation, which is also referred to as the complete serial compound (CSC) representation [1, 48, 133], each timing from a cue in the tasks was represented by a 10-dimensional one-hot vector, starting from (1 0 0 ... 0)<sup>T</sup> for the cue state, with the next state (0 1 0 ... 0) <sup>T</sup> and so on.”

      and in the Results we have instead added a clearer explanation (Line 163-165):

      “First, for comparison, we examined traditional TD-RL agent with punctate state representation (without using the RNN), in which each state (time-step from a cue) was represented in a punctate manner, i.e., by a one-hot vector such as (1, 0, ..., 0), (0, 1, ..., 0), and so on.”
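
      As an illustration of the punctate representation described above, the following minimal sketch builds 10-dimensional one-hot state vectors and reads out a state value as a weighted sum. The function names and the weight values are hypothetical, and the linear read-out is an assumption made for illustration rather than a description of the authors' code.

```python
def punctate_states(n_states=10):
    """Return one one-hot vector per time-step from the cue."""
    return [[1 if i == t else 0 for i in range(n_states)]
            for t in range(n_states)]

def state_value(weights, state):
    """Linear read-out: with one-hot states this picks out a single weight."""
    return sum(w * x for w, x in zip(weights, state))

states = punctate_states()
# Hypothetical value weights, one per time-step.
weights = [0.1 * t for t in range(10)]
value_at_step_3 = state_value(weights, states[3])
```

      Because each state activates exactly one element, the value of a state equals the weight attached to that element, so no generalization occurs across time-steps.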

      As for 7b), we have added the rationale for our examination of the tasks with probabilistic structures (Line 282-294):

      “Previous work [54] examined the response of DA neurons in cue-reward association tasks in which reward timing was probabilistically determined (early in some trials but late in other trials). There were two tasks, which were largely similar but there was a key difference that reward was given in all the trials in one task whereas reward was omitted in some randomly determined trials in another task. Starkweather et al. [54] found that the DA response to later reward was smaller than the response to earlier reward in the former task, presumably reflecting the animal's belief that delayed reward will surely come, but the opposite was the case in the latter task, presumably because the animal suspected that reward was omitted in that trial. Starkweather et al.[54] then showed that such response patterns could be explained if DA encoded TD-RPE under particular state representations that incorporated the probabilistic structures of the task (called the 'belief state'). In that study, such state representations were 'handcrafted' by the authors, but the subsequent work [26] showed that the original value-RNN with backprop (BPTT) could develop similar representations and reproduce the experimentally observed DA patterns.”

      As for 7c), we have extensively revised the text of the results, adding high-level explanations while trying to reduce the lengthy low-level descriptions (e.g., Line 172-177 for Fig2E-G).

      (8) Related to the writing issue and 5), I wished that "bio-plausibility" was not the only reason to study positive feedback and value weights. Is it possible to develop a bit more specifically what and why this positivity is interesting? Is there an expected finding with non-negative FA both in the model capability? or maybe there is a simpler and crisp take-home message to communicate the experimental predictions to the community would be useful?

      There is actually an unexpected finding with the non-negative model: the non-negative random feedback model performed generally better than the models without the non-negative constraint with either backprop or random feedback (Fig. 2J-left versus Fig. 6E-left), presumably because the non-negative constraint effectively limited the parameter space and thereby learning became efficient, as we mentioned in our reply to your point 6a above (we did not well recognize this at the time of original submission).

      Another potential merit of our present work is the simplicity of the model and the task. This simplicity enabled us to derive an intuitive explanation of why feedback alignment could occur. Such an intuitive explanation was lacking in previous studies, although more precise mathematical explanations did exist. Related to the mechanism of feedback alignment, one thing remained mysterious to us at the time of original submission. Specifically, in the non-negatively constrained random feedback model, while the angle between the value weight (w) and the random feedback (c) was relatively small (loosely aligned) from the beginning, it appeared (as mentioned in the manuscript) that there was no further alignment over trials (and the angle actually settled at somewhat larger than 45°), even though the same mechanism for feedback alignment that we derived for the model without the non-negative constraint was expected to operate also under the non-negative constraint. We have now clarified the reason for this, and found a way, namely the introduction of a slight decay (forgetting) of value weights, by which feedback alignment came to occur in the non-negatively constrained model. We have added this in Line 463-477:

      “As mentioned above, while the angle between w and c was on average smaller than 90° from the beginning, there was no further alignment over trials. This seemed mysterious because the mechanism for feedback alignment that we derived for the models without non-negative constraint was expected to work also for the models with non-negative constraint. As a possible reason for the non-occurrence of feedback alignment, we guessed that one or a few element(s) of w grew prominently during learning, and so w became close to an edge or boundary of the non-negative quadrant and thereby angle between w and other vector became generally large (as illustrated in Fig. 8D). Figure 8Ea shows the mean±SEM of the elements of w ordered from the largest to smallest ones after 1500 trials. As conjectured above, a few elements indeed grew prominently.

      We considered that if a slight decay (forgetting) of value weights (c.f., [59-61]) was assumed, such a prominent growth of a few elements of w may be mitigated and alignment of w to c, beyond the initial loose alignment because of the non-negative constraint, may occur. These conjectures were indeed confirmed by simulations (Fig. 8Eb,c and Fig. 8F). The mean squared value error slightly increased when the value-weight decay was assumed (Fig. 8G), however, presumably reflecting a decrease in developed values and a deterioration of learning because of the decay.”
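
      The geometric argument in the passage above can be illustrated with a small sketch. It computes the angle between two vectors and compares a non-negative weight vector whose elements are of comparable size with one dominated by a single element. All vectors here are hypothetical placeholders (20 elements, seeded random values), not quantities from the simulations in the manuscript.

```python
import math
import random

def angle_deg(u, v):
    """Angle between two vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # guard rounding
    return math.degrees(math.acos(cos))

rng = random.Random(0)
n = 20
c = [rng.random() for _ in range(n)]         # non-negative random feedback
w_spread = [rng.random() for _ in range(n)]  # elements of comparable size
w_peaked = [0.01] * n                        # one element dominates
w_peaked[0] = 1.0

angle_spread = angle_deg(c, w_spread)
angle_peaked = angle_deg(c, w_peaked)
```

      Two non-negative vectors can never be more than 90° apart, which corresponds to the loose alignment from the beginning; a vector lying near an edge of the non-negative quadrant, like `w_peaked`, tends to form a larger angle with a generic non-negative vector than `w_spread` does.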

      Correction of an error in the original manuscript

      In addition to revising the manuscript according to your comments, we have corrected the way the true state values are estimated. Specifically, in the original manuscript, we defined states by relative time-steps from a reward and estimated their values by calculating, through simulations, the sums of discounted future rewards starting from them. However, we assumed variable inter-trial intervals (ITIs) (4, 5, 6, or 7 time-steps with equal probabilities), so until receiving cue information, the agent should not know when the next reward will come. Therefore, states for the timings up to the cue timing cannot be defined by the upcoming reward, but previously we did so (e.g., the state of "one time-step before cue") without taking the ITI variability into account.

      We have now corrected this issue by defining the states of timings with respect to the previous (rather than upcoming) reward. For example, when the ITI was 4 time-steps and the agent was in its last time-step, the agent will in fact receive a cue at the next time-step, but the agent should not know this until actually receiving the cue information and should instead assume that it was at the last time-step of the ITI (if the ITI was 4), last − 1 (if the ITI was 5), last − 2 (if the ITI was 6), or last − 3 (if the ITI was 7) with equal probabilities (in a similar fashion to what we considered when defining states for the probabilistic tasks). We estimated the true values of states defined in this way through simulations. As a result, the corrected true value at the cue timing has become slightly smaller than the value described in the original manuscript (reflecting the uncertainty about ITI length), and consequently a small positive TD-RPE has now appeared at the cue timing.
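
      The estimation procedure described above can be sketched as a Monte Carlo simulation. This is a minimal illustration under stated assumptions, not the authors' code: the discount factor (0.8), the reward size (1), the convention that the return from a state counts rewards from the next time-step onward, and the state labels `("iti", k)` and `("cue", j)` are all hypothetical choices. Only the ITI lengths (4 to 7 time-steps, equiprobable) and the 3 time-step cue-reward delay are taken from the text.

```python
import random
from collections import defaultdict

def simulate_true_values(n_trials=20000, gamma=0.8, delay=3,
                         itis=(4, 5, 6, 7), seed=0):
    """Estimate state values as average discounted future reward."""
    rng = random.Random(seed)
    states, rewards = [], []
    for _ in range(n_trials):
        iti = rng.choice(itis)
        # ITI states are indexed by time since the previous reward.
        for k in range(1, iti + 1):
            states.append(("iti", k))
            rewards.append(0.0)
        # After the cue, timing is known: index by time since the cue.
        for j in range(delay + 1):
            states.append(("cue", j))
            rewards.append(1.0 if j == delay else 0.0)
    sums, counts = defaultdict(float), defaultdict(int)
    future_return = 0.0  # discounted reward from the next time-step onward
    for t in range(len(states) - 1, -1, -1):
        sums[states[t]] += future_return
        counts[states[t]] += 1
        future_return = rewards[t] + gamma * future_return
    return {s: sums[s] / counts[s] for s in sums}

values = simulate_true_values()
```

      Under these assumptions, the value of an early ITI state such as `("iti", 4)` is lower than the discounted value of the cue state `("cue", 0)`, which is the kind of gap that yields a positive TD-RPE when the cue arrives. The truncated returns at the very end of the simulated sequence have a negligible effect when the number of trials is large.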

      Because we measured the performance of the models by squared errors in state values, this correction affected the results reporting the performance. Fortunately, the effects were relatively minor and did not largely alter the results of performance comparisons. However, we sincerely apologize for this error. In the revised manuscript, we have used the corrected true values throughout the manuscript, and we have described the ways of estimating these values in Line 919-976.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      MHC (Major Histocompatibility Complex) genes have long been mentioned as cases of trans-species polymorphism (TSP), where alleles might have their most recent common ancestor with alleles in a different species, rather than other alleles in the same species (e.g., a human MHC allele might coalesce with a chimp MHC allele, more recently than the two coalesce with other alleles in either species). This paper provides a more complete estimate of the extent and ages of TSP in primate MHC loci. The data clearly support deep TSP linking alleles in humans to (in some cases) old world monkeys, but the amount of TSP varies between loci.

      Strengths:

      The authors use publicly available datasets to build phylogenetic trees of MHC alleles and loci. From these trees they are able to estimate whether there is compelling support for Trans-species polymorphisms (TSPs) using Bayes Factor tests comparing different alternative hypotheses for tree shape. The phylogenetic methods are state-of-the-art and appropriate to the task.

      The authors supplement their analyses of TSP with estimates of selection (e.g., dN/dS ratios) on motifs within the MHC protein. They confirm what one would suspect: classical MHC genes exhibit stronger selection at amino acid residues that are part of the peptide binding region, and non-classical MHC exhibit less evidence of selection. The selected sites are associated with various diseases in GWAS studies.

      Weaknesses:

      An implication drawn from this paper (and previous literature) is that MHC has atypically high rates of TSP. However, rates of TSP are not estimated for other genes or gene families, so readers have no basis of comparison. No framework to know whether the depth and frequency of TSP is unusual for MHC family genes, relative to other random genes in the genome, or immune genes in particular. I expect (from previous work on the topic), that MHC is indeed exceptional in this regard, but some direct comparison would provide greater confidence in this conclusion.

      We agree that context is important! Although we expected to get the most interesting results from studying the classical genes, we did include the non-classical genes specifically for comparison. They are located in the same genomic region, have multiple sequences catalogued in different species (although they are less diverse), and perform critical immune functions. We think this is a more appropriate set to compare with the classical MHC genes than, say, a random set of genes. Interestingly, we did not detect TSP in these non-classical genes. This likely means that the classical MHC genes are truly exceptional, but it could also mean that not enough sequences are available for the non-classical genes to detect TSP. 

      It would be very interesting to repeat this analysis for another gene family to see whether such deep TSP also occurs in other immune or non-immune gene families. We are lucky that decades of past work and a dedicated database exists for cataloging MHC sequences. When this level of sequence collection is achieved for other highly polymorphic gene families, it will be possible to do a comparable analysis.  

      Given the companion paper's evidence of genic gain/loss, it seems like there is a real risk that the present study under-estimates TSP, if cases of TSP have been obscured by the loss of the TSP-carrying gene paralog from some lineages needed to detect the TSP. Are the present analyses simply calculating rates of TSP of observed alleles, or are you able to infer TSP rates conditional on rates of gene gain/loss?

      We were not able to infer TSP rates conditional on rates of gene gain/loss. We agree that some cases of TSP were likely lost due to the loss of a gene paralog from certain species. Furthermore, the dearth of MHC whole-region and allele sequences available for most primates makes it difficult to detect TSP, even if the gene paralog is still present. Long-read sequencing of more primate genomes should help with this. We agree that it would also be very interesting to study TSPs that were maintained for millions of years but were lost recently.

      Figure 5 (and 6) provide regression model fits (red lines in panel C) relating evolutionary rates (y axis not labeled) to site distance from the peptide binding groove, on the protein product. This is a nice result. I wonder, however, whether a linear model (as opposed to non-linear) is the most biologically reasonable choice, and whether non-linear functions have been evaluated. The authors might consider generalized additive models (GAMs) as an alternative that relaxes linearity assumptions.

      We agree that a linear model is likely not the most biologically reasonable choice, as protein interactions are complex. However, we made the choice to implement the simplest model because the evolutionary rates we inferred were relative, making parameters relatively meaningless. We were mainly concerned with positive or negative slopes and we leave the rest to the protein interaction experts.
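
      To make the "sign of the slope" reading concrete, the following minimal sketch fits an ordinary least-squares line and inspects only the slope. The function name and the distance and rate values are hypothetical placeholders, not data from the paper.

```python
def ols_slope_intercept(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical distances from the peptide and relative evolutionary rates.
distance = [2.0, 5.0, 10.0, 15.0, 20.0]
relative_rate = [1.8, 1.4, 1.0, 0.7, 0.5]
slope, intercept = ols_slope_intercept(distance, relative_rate)
```

      Because the rates are relative, rescaling them changes the magnitude of the slope and intercept but not the sign of the slope, which is why only the sign is interpreted.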

      The connection between rapidly evolving sites, and disease associations (lines 382-3) is very interesting. However, this is not being presented as a statistical test of association. The authors note that fast-evolving amino acids all have at least one association: but is this really more disease-association than a random amino acid in the MHC? Or, a randomly chosen polymorphic amino acid in MHC? A statistical test confirming an excess of disease associations would strengthen this claim.

      To strengthen this claim, we added Figure 6 - Figure Supplement 7 (NOTE: this needs to be renamed as Table 1 - Figure Supplement 1, which the eLife template does not allow). Here, we plot the number of associations for each amino acid against evolutionary rate, revealing a significant positive slope in Class I. We also added explanatory text for this figure in lines 400-404.

      Reviewer #2 (Public review):

      Summary

      In this study, the authors characterized population genetic variation in the MHC locus across primates and looked for signals of long-term balancing selection (specifically trans-species polymorphism, TSP) in this highly polymorphic region. To carry out these tasks, they used Bayesian methods for phylogenetic inference (i.e. BEAST2) and applied a new Bayesian test to quantify evidence supporting monophyly vs. transspecies polymorphism for each exon across different species pairs. Their results, although mostly confirmatory, represent the most comprehensive analyses of primate MHC evolution to date and novel findings or possible discrepancies are clearly pointed out. However, as the authors discuss, the available data are insufficient to fully capture primates' MHC evolution.

      Strengths of the paper include: using appropriate methods and statistically rigorous analyses; very clear figures and detailed description of the results and methods that make it easy to follow despite the complexity of the region and approach; a clever test for TSP that is then complemented by positive selection tests and the protein structures for a quite comprehensive study.

      That said, weaknesses include: lack of information about how many sequences are included and whether uneven sampling across taxa might result in some comparisons without evidence for TSP; frequent reference to the companion paper instead of summarizing (at least some of) the critical relevant information (e.g., how was orthology inferred?); no mention of the quality of sequences in the database and whether there are still potential effects of mismapping or copy number variation affecting the sequence comparison.

      To address these comments, we added Tables 2-4 to allow readers to more readily understand the data we included in each group. We refer to these tables in the introduction (line 95), in the “Data” section of the results (lines 128-129), and the “Data” section of the methods (lines 532-534).  We also added text (lines 216-219 and 250-252) to more explicitly point out that our method is conservative when few sequences are available.

      We also added a paragraph to the discussion which addresses data quality and mismapping issues (lines 473-499).

      We clarified the role of our companion paper (line 49-50) by changing “In our companion paper, we explored the relationships between the different classical and non-classical genes” to “In our companion paper, we built large multi-gene trees to explore the relationships between the different classical and non-classical genes.” We also changed the text in lines 97-99 from “In our companion paper, we compared genes across dozens of species and learned more about the orthologous relationships among them” to “In our companion paper, we built trees to compare genes across dozens of species. When paired with previous literature, these trees helped us infer orthology and assign sequences to genes in some cases.”

      Reviewer #3 (Public review):

      Summary

      The study uses publicly available sequences of classical and non-classical genes from a number of primate species to assess the extent and depth of TSP across the primate phylogeny. The analyses were carried out in a coherent and, in my opinion, robust inferential framework and provided evidence for ancient (even > 30 million years) TSP at several classical class I and class II genes. The authors also characterise evolutionary rates at individual codons, map these rates onto MHC protein structures, and find that the fastest evolving codons are extremely enriched for autoimmune and infectious disease associations.

      Strengths

      The study is comprehensive, relying on a large data set, state-of-the-art phylogenetic analyses and elegant tests of TSP. The results are not entirely novel, but a synthesis and re-analysis of previous findings is extremely valuable and timely.

      Weaknesses

      I've identified weaknesses in several areas (details follow in the next section):

      -  Inadequate description and presentation of the data used

      -  Large parts of the results read like extended figure captions, which breaks the flow.

      -  Older literature on the subject is duly cited, but the authors don't really discuss their findings in the context of this literature.

      -  The potential impact of mechanisms other than long-term maintenance of allelic lineages by balancing selection, such as interspecific introgression and incorrect orthology assessment, needs to be discussed.

      We address these comments in the more detailed section below.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      The abstract could benefit from being sharpened. A personal pet peeve is a common habit of saying we don't know everything about a topic (line 16 - "lack a full picture of primate MHC evolution"); We never know everything on a topic, so this is hardly a strong rationale to do more work on it. This is followed by "to start addressing this gap" - which is vague because you haven't explicitly stated any gap, you simply said we are not yet omniscient on the topic. Please clearly identify a gap in our knowledge, a question that you will be able to answer with this paper.

      That makes sense! We added another sentence to the abstract to make the specific gap clearer. Inserted “In particular, we do not know to what extent genes and alleles are retained across speciation events” in lines 16-17.

      Reviewer #2 (Recommendations for the authors):

      - Some discussion of alternative explanations when certain comparisons were not found to have TSP - is this consistent with genetic drift sometimes leading to lineage loss, or does it suggest that the proposed tradeoff between autoimmunity and pathogen recognition might differ depending on primates' life history and/or exposure to similar pathogens? Could the trade-off of pathogen to self-recognition not be as costly in some species?

      This is consistent with genetic drift, as no lineages are expected to be maintained across such distantly diverged primates under neutrality. The other explanations you raise are certainly possible, but our Bayes Factor test only reveals evidence (or lack thereof) for deviations from the species tree and cannot explain why such deviations do or do not occur.
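
      In general, a Bayes factor measures how much the data shift the odds between two hypotheses. As a generic illustration (not necessarily the exact procedure used in the paper, which is not described in this response), the sketch below computes a Bayes factor as posterior odds divided by prior odds. The function name and the probabilities passed in are hypothetical.

```python
def bayes_factor(posterior_prob, prior_prob):
    """Bayes factor for a hypothesis: posterior odds over prior odds."""
    for p in (posterior_prob, prior_prob):
        if not 0.0 < p < 1.0:
            raise ValueError("probabilities must lie strictly between 0 and 1")
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds

# Hypothetical example: a hypothesis with prior probability 0.5 that
# receives posterior probability 0.9 after seeing the data.
bf = bayes_factor(0.9, 0.5)
```

      A value well above 1 indicates that the data favor the hypothesis, a value well below 1 favors the alternative, and a value near 1 is uninformative, which is why such a test can report evidence or its absence but not the underlying cause.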

      - It would be interesting to put these results on very long-term balancing selection in the context of what has been reported at the region for shorter term balancing selection. The discussion compares findings of previous genes in the literature but not regarding the time scale.

      Indeed, there is some evidence for the idea of “divergent allele advantage”, in which MHC-heterozygous individuals have a greater repertoire of peptides that they can present, leading to greater resistance against pathogens and greater fitness. This heterozygote advantage thus leads to balancing selection (Pierini and Lenz, 2018; Chowell et al., 2019). Our discussion mentions other time scales of balancing selection across the primates at the MHC and other loci, but we choose to focus more on long-term than short-term balancing selection.

      - Lines 223-226 - how is the difference in BF across exons in MHC-A to be interpreted? The paragraph is about MHC-A, but then the explanation in the last sentence is for when similar BF are observed which is not the case for MHC-A. Is this interpreted as lack of evidence for TSP? Or something about recombination or gene conversion? Or that one exon may be under balancing selection but not the other?

      Thank you for pointing out the confusing logic in this paragraph. 

      Previous: “For MHC-A, Bayes factors vary considerably depending on exon and species pair. Many sequences had to be excluded from MHC-A comparisons because they were identified as gene-converted in the \textit{GENECONV} analysis or were previously identified as recombinants \citep{Hans2017,Gleimer2011,Adams2001}. Importantly, for MHC-A we do not see concordance in Bayes factors across the different exons, whereas we do for the other gene groups. Similar Bayes factors across all exons for a given comparison is thus evidence in favor of TSP being the primary driver of the observed deep coalescence structure (rather than recombination or gene conversion).”

      Current (lines 228-238):

      “For MHC-A, Bayes factors vary considerably depending on exon and species pair. Past work suggests that this gene has had a long history of gene conversion affecting different exons, resulting in different evolutionary histories for different parts of the gene \citep{Hans2017,Gleimer2011,Adams2001}. Indeed, we excluded many MHC-A sequences from our Bayes factor calculations because they were identified as gene-converted in our \textit{GENECONV} analysis or were previously suggested to be recombinants. As shown in \FIG{bayes_factors_classI}, the lack of concordance in Bayes factors across the different exons for MHC-A is evidence for gene conversion, rather than balancing selection, being the most important factor in this gene's evolution. In contrast, the other gene groups generally show concordance in Bayes factors across exons. We interpret this as evidence in favor of TSP being the primary driver of the observed deep coalescence structure for MHC-B and -C (rather than recombination or gene conversion).”

      - In Figures 5C and 6C, the points sometimes show a kind of smile pattern of possibly higher rates further from the peptide. Did authors explore other fits like a polynomial? Or, whether distance only matters in close proximity to the peptide? Out of curiosity, is it possible to map substitution time/branch into the distance to the peptide binding region for each substitution? Is there any pattern with distance to interacting proteins in non-peptide binding MHC proteins like MHC-DOA? Although they don't have a PBR they do interact with other proteins.

      Thank you for these ideas! We did not explore other fits, such as a polynomial, because we wanted to implement the simplest model. Our evolutionary rates are relative, making parameters relatively meaningless. We were mainly concerned with positive or negative slopes and we leave the rest to the protein interaction experts.

      There is most likely a relationship between evolutionary rate and the distance to interacting proteins in the non-peptide-binding molecules MHC-DM and -DO. However, few models are currently available, and it is difficult to determine which residues in these models are actually interacting. Researchers with more experience in protein interactions would be better placed to undertake such an analysis.
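      The slope-only comparison described in this response can be illustrated with an ordinary least-squares line fit. The following is a minimal sketch, not the authors' code: the function name `ols_slope` and the toy distance and rate values are hypothetical, and the only quantity of interest is the sign of the slope, since the rates are relative.

```python
def ols_slope(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    x: distances from each site to the peptide (hypothetical units)
    y: relative evolutionary rates at those sites
    Returns (slope, intercept).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data (made up for illustration): rate falls with distance.
distances = [2.0, 4.0, 6.0, 8.0]
relative_rates = [3.0, 2.5, 2.0, 1.5]
slope, intercept = ols_slope(distances, relative_rates)
print("negative slope" if slope < 0 else "non-negative slope")
```

      Because the rates are only defined up to a common scale, rescaling them changes the magnitude of the slope but not its sign, which is why a straight-line fit is enough for this question.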

      - How biased is the database towards human alleles? Could this affect some of the analyses, including the coincidence of rapidly evolving sites with associations? Are there more associations than expected under some null model?

      While the database is indeed biased toward human alleles, we included only a small subset of these in order to create a more balanced data set spanning the primates. This is unlikely to affect the coincidence of rapidly-evolving sites with associations; however, we note that there are no such association studies meeting our criteria in other species, meaning the associations are only coming from studies on humans.

      - To this reader, it is unnecessary and distracting to describe the figures within the text; there are frequent sentences in the text that belong in the figure legend instead (e.g., lines 139-143, 208-211, 214-215, 328-330, etc). It would be better to focus on the results from the figures and then cite the figure, where the colors and exactly what is plotted can be in the figure legend.

      We appreciate these comments on overall flow. We removed lines 139-143 and lengthened the Figure 2 caption (and associated supplementary figure captions) to contain all necessary detail. We removed lines 208-211 and 214-215 and lengthened the captions for Figure 3, Figure 4, and associated supplementary figures. We removed a sentence from lines 303-304.  

      - I'm still concerned that the poor mappability of short-read data is contributing in some ways. Were the sequences in the database mostly from long-reads? Was nucleotide diversity calculated directly from the sequences in the database or from another human dataset? Is missing data at some sites accounted for in the denominator?

      The sequences in the database are mostly from short reads and come from a wide array of labs. We have added a paragraph to the discussion to explain the limitations of this (lines 473-499). However, the nucleotide diversity calculations shown in Figure 1 do not rely on the MHC database; rather, they are calculated from the human genomes in the 1000 Genomes project. Nucleotide diversity would be calculable for other species, but we did not do so for exactly the reason you mention: too much missing data.
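      The reviewer's question about missing data in the denominator can be made concrete with a small example. The following is a minimal sketch under stated assumptions, not the authors' pipeline: it computes nucleotide diversity on toy aligned sequences, pooling differences over all pairs and counting a site in the denominator only when both sequences in the pair have a called base. The function name and the set of missing-data symbols are hypothetical choices.

```python
from itertools import combinations

# Symbols treated as missing data (an assumption for this sketch).
MISSING = {"N", "-", "?"}


def nucleotide_diversity(seqs):
    """Pairwise differences per compared site, skipping missing data.

    seqs: list of equal-length aligned sequences.
    A site contributes to the denominator for a given pair only if
    both sequences have a called base there.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    differences = 0
    compared_sites = 0
    for a, b in combinations(seqs, 2):
        for x, y in zip(a, b):
            if x in MISSING or y in MISSING:
                continue  # excluded from numerator and denominator
            compared_sites += 1
            if x != y:
                differences += 1
    if compared_sites == 0:
        return float("nan")
    return differences / compared_sites


toy = ["ACGT", "ACGA", "ANGT"]
print(nucleotide_diversity(toy))
```

      If missing sites were instead left in the denominator, diversity would be biased downward in poorly covered regions, which is the concern raised in the comment.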

      - The Figure 2 and Figure 3 supplements took me a little bit to understand - is it really worth pointing out the top 5 Bayes-factor comparisons when there is no evidence for TSP? A lot of the colored squares are not actually supporting TSP but in the grids you can't see which are and which aren't without looking at the Bayes Factor. I wonder if it would help if only those with BF > 100 were shown? Or if these were marked some other way so that it was easy to see where TSPs are supported.

      Thank you for your perspective on these figures! We initially limited them to only show >100 Bayes factors for each gene group and region, but some gene groups have no high Bayes factors. Additionally, the “summary” tree pictured in these figures is necessarily a simplification of the full space of posterior trees. We felt that showing low Bayes factor comparisons could help readers understand this relationship. For example, allele sets that look non-monophyletic on the summary tree may still have a low Bayes factor, showing that they are generally monophyletic throughout the larger (un-visualizable) space of trees.

      Reviewer #3 (Recommendations for the authors):

      Specific comments

      Abstract

      I think the abstract would benefit from some editing. For example, one might get the impression that you equate allele sharing, which would normally be understood as sharing identical sequences, with sharing ancestral allelic lineages. This distinction is important because you can have many TSPs without sharing identical allele sequences. In l. 20 you write about "deep TSP", which requires either definition or reformulation. In l. 21-23 you seem to suggest that long-term retention of allelic lineages is surprising in the light of rapid sequence evolution - it may be, depending on the evolutionary scenarios one is willing to accept, but perhaps it's not necessary to float such a suggestion in the abstract where it cannot be properly explained due to space constraints? The last sentence needs a qualifier like "in some cases".

      Thank you for catching these! For clarity, we changed several words:

      ● “alleles” to “allelic lineages” in line 13

      ● “deep” to “ancient” in line 21

      ● “Despite” to “in addition to” in line 22

      ● Added “in some cases” to line 28

      Results - Overall, parts of the results read like extended figure captions. I understand that the authors want to make the complex figures accessible to the reader. However, including so much information in the text disrupts the flow and makes it difficult to follow what the main findings and conclusions are.

      We appreciate these comments on overall flow. We removed lines 139-143 and lengthened the Figure 2 caption (and associated supplementary figure captions) to contain all necessary detail. We removed lines 208-211 and 214-215 and lengthened the captions for Figure 3, Figure 4, and associated supplementary figures. We removed a sentence from lines 303-304.  

      l. 37-39 such a short sentence on non-classical MHC is necessarily an oversimplification, I suggest it be expanded or deleted.

      There is certainly a lot to say about each of these genes! While we do not have space in this paper’s introduction to get into these genes’ myriad functions, we added a reference to our companion paper in lines 40-41:

      “See the appendices of our companion paper \citep{Fortier2024a} for more detail.”

      These appendices are extensive, and readers can find details and references for literature on each specific gene there. In addition, several genes are mentioned in analyses further on in the results, and their specific functions are discussed in more detail when they arise.

      l. 47 -49 It would be helpful to briefly outline your criteria for selecting these 17 genes, even if this is repeated later.

      Thank you! For greater clarity, we changed the text (lines 50-52) from “Here, we look within 17 specific genes to characterize trans-species polymorphism, a phenomenon characteristic of long-term balancing selection.” to “Here, we look within 17 specific genes---representing classical, non-classical, Class I, and Class II---to characterize trans-species polymorphism, a phenomenon characteristic of long-term balancing selection.”

      l.85-87 I may be completely wrong, but couldn't problems with establishing orthology in some cases lead to false inferences of TSP, even in primates? Or do you think the data are of sufficient quality to ignore such a possibility? (you touch on this in pp. 261-264)

      Yes, problems with establishing orthology can lead to false inferences of TSP, and it has happened before. For example, older studies that used only exon 2 (binding-site-encoding) of the MHC-DRB genes inferred trees that grouped NWM sequences with ape and OWM sequences. Thus, they named these NWM genes MHC-DRB3 and -DRB5 to suggest orthology with ape/OWM MHC-DRB3 and -DRB5, and they also suggested possible TSP between the groups. However, later studies that used non-binding-site-encoding exons or introns noticed that these NWM sequences did not group with ape/OWM sequences (which now shared the same name), providing evidence against orthology. This illustrates that establishing orthology is critical before assessing TSP (as is comparing across regions). This is part of the reason we published a companion paper (https://doi.org/10.7554/eLife.103545.1), which clears up questions of orthology and supports the analyses we did in this paper. In cases where orthology was ambiguous, this also helped us to be conservative in our conclusions here. The problems with ambiguous gene assignment are also discussed in lines 488-499.

      l. 88-93 is the first place (others are pp. 109-118 and 460-484) where a fuller description of the data used would be welcome. It's clear that the amount of data from different species varies enormously, not only in the number of alleles per locus, but also in the loci for which polymorphism data are available. In such a synthesis study, one would expect at least a tabulation of the data used in the appendices and perhaps a summary table in the main article.

      l. 109-118 Again, a more quantitative summary of the data used, with reference to a table, would be useful.

      Thank you! To address these comments, we added Tables 2-4 to allow readers to more readily understand the data we included in each group. We refer to these tables in the introduction (line 95), in the “Data” section of the results (lines 128-129), and the “Data” section of the methods (lines 532-534). Supplementary Files listing the exact alleles and sequences used in each group are also included in the resubmission.

      l. 123-124 here you say that the definition of the "16 gene groups" is in the methods (probably pp. 471-484), but it would be useful to present an informative summary of your rationale in the introduction or here

      Thank you! We agree that it is helpful to outline these groups earlier. We have changed the paragraph in lines 123-135 from: 

      “We considered 16 gene groups and two or three different genic regions for each group: exon 2 alone, exon 3 alone, and/or exon 4 alone. Exons 2 and 3 encode the peptide-binding region (PBR) for the Class I proteins, and exon 2 alone encodes the PBR for the Class II proteins. For the Class I genes, we also considered exon 4 alone because it is comparable in size to exons 2 and 3 and provides a good contrast to the PBR-encoding exons. See the Methods for more detail on how gene groups were defined. Because few intron sequences were available for non-human species, we did not include them in our analyses.” To: 

      “We considered 16 gene groups spanning MHC classes and functions. These include the classical Class I genes (MHC-A-related, MHC-B-related, MHC-C-related), non-classical Class I genes (MHC-E-related, MHC-F-related, MHC-G-related), classical Class IIA genes (MHC-DRA-related, MHC-DQA-related, MHC-DPA-related), classical Class IIB genes (MHC-DRB-related, MHC-DQB-related, MHC-DPB-related), non-classical Class IIA genes (MHC-DMA-related, MHC-DOA-related), and non-classical Class IIB genes (MHC-DMB-related, MHC-DOB-related). We studied two or three different genic regions for each group: exon 2 alone, exon 3 alone, and (for Class I) exon 4 alone. Exons 2 and 3 encode the peptide-binding region (PBR) for the Class I proteins, and exon 2 alone encodes the PBR for the Class II proteins. For the Class I genes, we also considered exon 4 alone because it is comparable in size to exons 2 and 3 and provides a good contrast to the PBR-encoding exons. Because few intron sequences were available for non-human species, we did not include them in our analyses.”

      l. 100 "alleles" -> "allelic lineages"

      Thank you for catching this. We have changed this language in line 104.

      l. 227-238 it's important to discuss the possible effect of the number of sequences available on the detectability of TSP - this is particularly important as the properties of MHC genealogies may differ considerably from those expected for neutral genealogies.

      This is a good point that may not be obvious to readers. We have added several sentences to clarify this:

      Line 193-194: “In a neutral genealogy, monophyly of each species' sequences is expected.”

      Line 213-219: “Note that the number of sequences available for comparison also affects the detectability of TSP. For example, if the only sequences available are from the same allelic lineage, they will coalesce more recently in the past than they would with alleles from a different lineage and would not show evidence for TSP. This means our method is well-suited to detect TSP when a diverse set of allele sequences are available, but it is conservative when there are few alleles to test. There were few available alleles for some non-classical genes, such as MHC-F, and some species, such as gibbon.”

      Line 244-246: “However, since there are fewer alleles available for the non-classical genes, we note that our method is likely to be conservative here.”

      l. 301 and 624-41 it's been difficult for me to understand the rationale behind using rates at mostly gap positions as the baseline and I'd be grateful for a more extensive explanation

      Normalizing the rates posed a difficult problem. We couldn’t include every single sequence in the same alignment because BEAST’s computational needs scale with the number of sequences. Therefore, we had to run BEAST separately on smaller alignments focused on a single group of genes at a time. We still wanted to be able to compare evolutionary rates across genes, but because of the way SubstBMA is implemented, evolutionary rates are relative, not absolute. Recall that to help us compare the trees, we included a common set of “backbone” sequences in all of the 16 alignments. This set included some highly-diverged genes. Initially, we planned to use 4-fold degenerate sites as the baseline sites for normalization, but there simply weren’t enough of them once we included the “backbone” set on top of the already highly diverse set of sequences in each alignment. This diversity presented an opportunity.  In BEAST, gaps are treated as missing and do not contribute any probability to the relevant branch or site (https://groups.google.com/g/beast-users/c/ixrGUA1p4OM/m/P4R2fCDWMUoJ?pli=1). So, we figured that sites that were “mostly gap” (a gap in all the human backbone sequences but with an insertion in some sequence) were mostly not contributing to the inference of the phylogeny or evolutionary rates. Because the “backbone” sequences are common to all alignments, making the “mostly gap” sites somewhat comparable across sets while not affecting inferred rates, we figured they would be a reasonable choice for the normalization (for lack of a better option).

      We added text to lines 680 and 691-693 to clarify this rationale.
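      The normalization rationale above can be sketched in a few lines. This is a hedged illustration only, not the authors' implementation: "mostly gap" columns are taken to be those that are a gap in every human backbone sequence but contain a residue in at least one other sequence, as described in the response. Dividing by the mean rate over those columns is an assumption of this sketch (the response does not state which summary was used), and all sequence names and rate values are invented.

```python
def mostly_gap_sites(alignment, human_backbone_ids):
    """Return column indices that are a gap in all human backbone
    sequences but hold a residue in at least one other sequence.

    alignment: dict mapping sequence name -> aligned sequence string
    human_backbone_ids: names of the human backbone sequences
    """
    backbone = set(human_backbone_ids)
    length = len(next(iter(alignment.values())))
    sites = []
    for i in range(length):
        backbone_all_gap = all(alignment[name][i] == "-" for name in backbone)
        insertion_elsewhere = any(
            seq[i] != "-" for name, seq in alignment.items() if name not in backbone
        )
        if backbone_all_gap and insertion_elsewhere:
            sites.append(i)
    return sites


def normalize_rates(rates, baseline_sites):
    """Divide every per-site relative rate by the mean rate over the
    baseline sites (using the mean is an assumption of this sketch)."""
    if not baseline_sites:
        raise ValueError("no baseline sites available for normalization")
    baseline = sum(rates[i] for i in baseline_sites) / len(baseline_sites)
    return [r / baseline for r in rates]


# Hypothetical toy alignment and per-site relative rates.
alignment = {
    "human_1": "AC-G",
    "human_2": "AC-G",
    "macaque_1": "ACTG",
    "gibbon_1": "A--G",
}
baseline_sites = mostly_gap_sites(alignment, ["human_1", "human_2"])
print(baseline_sites)
print(normalize_rates([1.0, 4.0, 0.5, 2.0], baseline_sites))
```

      Because the backbone sequences are shared across alignments, rates normalized this way are expressed relative to a roughly comparable set of columns in each gene group, which is the property the response relies on.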

      l. 380-84 this overview seems rather superficial. Would it be possible to provide a more quantitative summary?

      To make this more quantitative, we plotted the number of associations for each amino acid against evolutionary rate, shown in Figure 6 - Figure Supplement 7 (NOTE: this needs to be renamed as Table 1 - Figure Supplement 1, which the template does not allow). This reveals a significant positive slope for the Class I genes, but not for Class II. We also added explanatory text for this figure in lines 400-404.

      Discussion - your approach to detecting TSP is elegant but deserves discussion of its limitations and, in particular, a clear explanation of why detecting TSP rather than quantifying its extent is more important in the context of this work. Another important point for discussion is alternative explanations for the patterns of TSP or, more broadly, gene tree - species tree discordance. Although long-term maintenance of allelic lineages due to long-term balancing selection is probably the most convincing explanation for the observed TSP, interspecific introgression and incorrect orthology assessment may also have contributed, and it would be good to see what the authors think about the potential contribution of these two factors.

      Overall, our goal was to use modern statistical methods and data to more confidently assess how ancient the TSP is at each gene. We have added several lines of text (as noted elsewhere in this document) to more clearly illustrate the limitations of our approach. We also agree that interspecific introgression and incorrect orthology assessment can cause similar patterns to arise. We attempted to minimize the effect of incorrect orthology assessment by creating multi-gene trees and exploring reference primate genomes, as described in our companion paper (https://doi.org/10.7554/eLife.103545.1), but cannot eliminate it completely. We have added a paragraph to the discussion to address this (lines 488-499). Interspecific introgression could also cause gene tree-species tree discordance, but we are not sure about how systematic this would have to be to cause the overall patterns we observe, nor about how likely it would have been for various clades of primates across the world.

      l. 421 -424 A more nuanced discussion distinguishing between positive selection, which facilitates the establishment of a mutation, and directional selection, which leads to its fixation, would be useful here.

      We added clarification to this sentence (line 443-445), from “Indeed, within the phylogeny we find that the most rapidly-evolving codons are substituted at around 2--4-fold the baseline rate.” to “Indeed, within the phylogeny we find that the most rapidly-evolving codons are substituted at around 2--4-fold the baseline rate, generating ample mutations upon which selection may act.”

      l. 432-434 You write here about the shaping of TCR repertoires, but I couldn't find any such information in the paper, including Table 1.

      We did not include a separate column for these, so they can be hard to spot. They take the form of “TCR 𝛽 Interaction Probability >50%”, “TCR Expression (TRAV38-1)”, or “TCR 𝛼 Interaction Probability >50%” and can be found in Table 1.

      l. 436-442 Here a more detailed discussion in the context of divergent allelic advantage and even the evolution of new S-type specificities in plants would be valuable.

      We added an additional citation to a review article to this sentence (lines 438-439).  

      l. 443 The use of the word "training" here is confusing, suggesting some kind of "education" during the lifetime of the animal.

      We agree that “train” is not an entirely appropriate term, and have changed it to “evolve” (line 465).

      489-491 What data were used for these calculations?

      Apologies for missing this citation! We used the 1000 genomes project data, and the citation has been updated (line 541-542).

    1. Author response:

      Reviewer 1:

      Concern 1: Figures 1I, 1J, and the whole of Figure 2 could be placed as supplementary figures. Also, for Figure 3E, it would be preferable to show the percentage of cells expressing cytokines rather than their absolute numbers. In fact, the drop in the numbers of cytokine-producing cells is probably due solely to the drop in total cell numbers and not to a decrease in the proportion of cells expressing cytokines. If this is the case, these data should be shown in supplementary figures. Finally, Figures 4 and 5 could be merged.

      We thank you for your recommendations. As rearranging figures is not critical to convey the data, we have decided to keep the figures and supplemental figures as they are currently presented.

      Concern 2a: It would be important to show the proportion of Treg, Tconv, and CD8 expressing Layilin in healthy skin and in patients developing psoriasis, as well as in the blood of healthy subjects.

      This data is published in a previous manuscript from our group. Please see Figure 1 in “Layilin Anchors Regulatory T Cells in Skin” (PMID: 34470859)

      Concern 2b: We lack information to be convinced that there is enrichment for migration and adhesion genes in Layilin+ Tregs in the GSEA data. The authors should indicate what geneset libraries they used. Indeed, it is tempting to show only the genesets that give results in line with the message you want to get across. If these genesets come from public banks, the bank used should be indicated, and the results of all gene sets shown in an unbiased way. In addition, it should be indicated whether the analyses were performed on untransformed or pseudobulk scRNAseq data analyses. Finally, it would be preferable to confirm the GSEA data with z-score analyses, as Ingenuity does, for example. Indeed, in GSEA-type analyses, there are genes that have activating but also inhibiting effects on a pathway in a given gene set.

      Given that we have already shown that layilin plays a major role in Treg and CD8+ T cell adhesion in tissues, we used a candidate approach for our GSEA. We tested the hypothesis that adhesion and motility pathways are enriched in Layilin-expressing Tregs. There was a statistically significant enrichment for these genes in Layilin+ Tregs compared to Layilin- Tregs, which we feel adequately tests our hypothesis.

      Concern 2c: For all FACS data, the raw data should be shown as histograms or dot plots for representative samples.

      We respect this concern. We omitted these data because of space constraints.

      Concern 2d: For Figure 5B, the number of samples analyzed is insufficient to draw clear conclusions.

      We respectfully disagree. Three donors were used in a paired fashion (internally controlled), achieving statistical significance.

      Concern 3: For Figs. 4 and 5, the design of the experiment poses a problem. Indeed, the comparison between Layn+ and Layn- cells may, in part, not be directly linked to the expression or absence of expression of this protein. Indeed, Layn+ and Layn- Tregs may constitute populations with different biological properties, beyond the expression of Layn. However, in the experiment design used here, a significant fraction of the sorted Layn- Tregs will be cells belonging to the population that has never expressed this protein. It would have been preferable to sort first the Layn+ Tregs, then knock down this protein and re-sort the Layn- Tregs and Layn+ Tregs. If this experiment is too cumbersome to perform, I agree that the authors should not do it. However, it would be important to mention the point I have just made in the text.

      We agree. However, as the reviewer points out, these experiments are not logistically and practically feasible at this point. We do perform several experiments in this manuscript in which layilin is reduced via gene editing with results supporting our hypotheses.

      Reviewer 2:

      Some of the conclusions drawn by the authors must be treated with caution, as the experimental conditions were not always appropriate, leading to a risk of misinterpretation.

      We have been transparent with all our methods and data. We will leave it to the reader to determine the level of rigor and the robustness of the data.

      Reviewer 3:

      Weaknesses:

      (1) It is not clear that the assays used for functional analysis of the patient samples were optimal. (2) Several conclusions are not fully substantiated. (3) The report is lacking some experimental details.

      We have tried to be as comprehensive and thorough as possible. We feel that the data supports our conclusions. We will leave this to the reader to interpret and conclude.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Aicardi-Goutières Syndrome (AGS) is a genetic disorder that primarily affects the brain and immune system through excessive interferon production. The authors sought to investigate the role of microglia in AGS by first developing bone-marrow-derived progenitors in vitro that carry the estrogen-regulated (ER) Hoxb8 cassette, allowing them to expand indefinitely in the presence of estrogen and differentiate into macrophages when estrogen is removed. When injected into the brains of Csf1r-/- mice, which lack microglia, these cells engraft and resemble wild-type (WT) microglia in transcriptional and morphological characteristics, although they lack Sall1 expression. The authors then generated CRISPR-Cas9 Adar1 knockout (KO) ER-Hoxb8 macrophages, which exhibited increased production of inflammatory cytokines and upregulation of interferon-related genes. This phenotype could be rescued using a Jak-Stat inhibitor or by concurrently mutating Ifih1 (Mda5). However, these Adar1-KO macrophages fail to successfully engraft in the brain of both Csf1r-/- and Cx3cr1-creERT2:Csf1rfl/fl mice. To overcome this, the authors used a mouse model with a patient-specific Adar1 mutation (Adar1 D1113H) to derive ER-Hoxb8 bone marrow progenitors and macrophages. They discovered that Adar1 D1113H ER-Hoxb8 macrophages successfully engraft the brain, although at lower levels than WT-derived ER-Hoxb8 macrophages, leading to increased production of Isg15 by neighboring cells. These findings shed new light on the role of microglia in AGS pathology.

      Strengths:

      The authors convincingly demonstrate that ER-Hoxb8 differentiated macrophages are transcriptionally and morphologically similar to bone marrow-derived macrophages. They also show evidence that when engrafted in vivo, ER-Hoxb8 microglia are transcriptomically similar to WT microglia. Furthermore, ER-Hoxb8 macrophages engraft the Csf1r-/- brain with high efficiency and rapidly (2 weeks), showing a homogenous distribution. The authors also effectively use CRISPR-Cas9 to knock out TLR4 in these cells with little to no effect on their engraftment in vivo, confirming their potential as a model for genetic manipulation and in vivo microglia replacement.

      Weaknesses:

      The robust data showing the quality of this model at the transcriptomic level can be strengthened with confirmation at protein and functional levels. The authors were unable to investigate the effects of Adar1-KO using ER-Hoxb8 cells and instead had to rely on a mouse model with a patient-specific Adar1 mutation (Adar1 D1113H). Additionally, ER-Hoxb8-derived microglia do not express Sall1, a key marker of microglia, which limits their fidelity as a full microglial replacement, as has been rightfully pointed out in the discussion.

      Overall, this paper demonstrates an innovative approach to manipulating microglia using ER-Hoxb8 cells as surrogates. The authors present convincing evidence of the model's efficacy and potential for broader application in microglial research, given its ease of production and rapid brain engraftment potential in microglia-deficient mice. While Adar1-KO macrophages do not engraft well, the success of TLR4-KO line highlights the model's potential for investigating other genes. Using mouse-derived cells for transplantation reduces complications that can come with the use of human cell lines, highlighting the utility of this system for research in mouse models.

      Thank you for this thoughtful and balanced assessment. The major suggestion from Reviewer 1 was that confirmation of RNAseq data with protein or functional studies would add strength.  We provided protein staining by IHC for IBA1 in vivo, as well as protein staining by FACS for CD11B, CD45, and TMEM119 in vitro and in vivo.  For TLR4, we showed successful protein KO and blunted response to LPS (a TLR4 ligand) challenge, which we believe provides some protein and functional data to support the approach.  To bolster these data, we added staining for P2RY12 on brain-engrafted ER-Hoxb8s.

      Regarding the non-engraftment of Adar1 KO cells: because Adar1 KO is embryonically lethal in mice due to hematopoietic failure, we see the health impacts of Adar1 KO on ER-Hoxb8s as a strength of the transplantation model, which enables assessment of global ADAR1 function in macrophages and microglia-like cells without generating a transgenic mouse line. In addition, it was a surprise that the health impact occurs at the macrophage and not the progenitor stage, perhaps providing insight for future studies of ADAR1’s role in hematopoiesis. We were able to show a significant impact of complete loss of Adar1 on survival and engraftment, suggesting an important biological function of ADAR1. The macrophage-specific D1113H mutation, which affects part of the deaminase domain, shows that disrupting the RNA deamination (but not the RNA binding) function of ADAR1 produces brain-wide interferonopathy. This is very exciting to our group, and hopefully to the community, as astrocytes are thought to be a major driver of brain interferonopathy in patients with ADAR1 mutations. Our results suggest that disruption of brain macrophages is also a major contributor.

      Reviewer #2 (Public review):

      Summary:

      Microglia have been implicated in brain development, homeostasis, and diseases. "Microglia replacement" has gained traction in recent years, using primary microglia, bone marrow or blood-derived myeloid cells, or human iPSC-induced microglia. Here, the authors extended their previous work in the area and provided evidence to support: (1) Estrogen-regulated (ER) homeobox B8 (Hoxb8) conditionally immortalized macrophages from bone marrow can serve as stable, genetically manipulated cell lines. These cells are highly comparable to primary bone marrow-derived (BMD) macrophages in vitro, and, when transplanted into a microglia-free brain, engraft the parenchyma and differentiate into microglia-like cells (MLCs). Taking advantage of this model system, the authors created stable, Adar1-mutated ER-Hoxb8 lines using CRISPR-Cas9 to study the intrinsic contribution of macrophages to the Aicardi-Goutières Syndrome (AGS) disease mechanism.

      Strengths:

      The studies are carefully designed and well-conducted. The imaging data and gene expression analysis are carried out at a high level of technical competence and the studies provide strong evidence that ER-Hoxb8 immortalized macrophages from bone marrow are a reasonable source for "microglia replacement" exercise. The findings are clearly presented, and the main message will be of general interest to the neuroscience and microglia communities.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      This is an elegant study, demonstrating both the utility and limitations of ER-Hoxb8 technology as a surrogate model for microglia in vivo. The manuscript is well-designed and clearly written, but authors should consider the following suggestions:

      (1) Validation of RNA hits at the protein level: To strengthen the comparison between ER-Hoxb8 macrophages and WT bone marrow-derived macrophages, validating several RNA hits at the protein level would be beneficial. As many of these hits are surface markers, flow cytometry could be employed for confirmation (e.g., Figure 1D, Figure 3E).

      In vitro, we show protein levels by flow cytometry for CD11B (ITGAM) and CD45 (PTPRC; Figure 1C), as well as TMEM119 (Supplemental Figure 2A) and TLR4 (Supplemental Figure 3C/D). In vivo, we show TMEM119 protein levels by flow cytometry (Figure 3A), as well as their CD11B/CD45 pregates (Supplemental Figure 2C), plus immunostaining for IBA1 (AIF1; Figure 2D). We now provide additional data showing P2RY12 immunostaining in brain-engrafted cells (Supplemental Figure 2B). 

      (2) The authors should consider testing the phagocytic capacity of ER-Hoxb8-derived macrophages to further validate their functionality.

      Thank you for the suggestion. We measured ER-Hoxb8 macrophage ability to engulf phosphatidylserine-coated beads that mimic apoptotic cells, compared with phosphatidylcholine-coated beads, now as new Supplemental Figure 1C/D. This agrees with existing literature showing efficient engulfment/phagocytosis by ER-Hoxb8-derived cells (Elhag et al., 2021).

      (3) For Figure 3E, incorporating a wild-type (WT) microglia reference would be beneficial to establish a baseline for comparison (e.g. including WT microglia data in the graph or performing a ratio analysis against WT expression levels).

      We agree - we now include bars representing our sequenced primary microglia data in Figure 3E as a comparison.  

      (4) Some statistical analyses may require refinement. Specifically, for Figure 4J, where the effects of Adar1 KO and Adar1 KO with Bari are compared, it would be more appropriate to use a two-way ANOVA.

      Thank you for noting this. We have now performed the more appropriate two-way ANOVA and included the updated results in Figure 4J and the corresponding Supplemental Figure 4G. Errors in the figure legend texts have also been corrected to reflect the statistical tests used.

      (5) Cx3cr1-creERT2 pups injected with tamoxifen: The authors could clarify the depletion ratio in these experiments before the engraftment and assess whether the depletion is global or regional. In comparison to Csf1r-/-, where TLR4-KO ER-Hoxb8 engraft globally, in Cx3cr1-creERT2, the engraftment seems more regional (Figure 5A vs Supplementary Figure 5B); is this due to the differences in depletion efficiency?

      This is an excellent question and observation, and one that we are very interested in, though that finding does not change the conclusions of this particular study.  We find some region-specific differences in depletion early after tamoxifen injection, but that all brain regions are >95% depleted by P7. For instance, in a recently published manuscript (Bastos et al., 2025) we find some differences in the depletion kinetics in the genetic model. By P3, we find 90% depletion in cortex with 50-60% in thalamus and hippocampus. In other studies, we typically deliver primary monocytes, and this is the first study where we report engraftment of ER-Hoxb8 cells in the inducible model.  In this sense, it is possible that depletion kinetics may regionally affect engraftment, but future studies are required to more finely assess this point with ER-Hoxb8s, as it may change how these models are used in the future.

      Bastos et al., Monocytes can efficiently replace all brain macrophages and fetal liver monocytes can generate bonafide SALL1+ microglia, Immunity (2025), https://doi.org/10.1016/j.immuni.2025.04.006

      (6) It would be helpful for the authors to clarify whether Adar1 is predominantly expressed by microglia, especially since the study aims to show its role in dampening the interferon response.

      That’s a wonderful point. Adar1 is expressed by all brain cells, with highest transcript level in some neurons, astrocytes, and oligodendrocytes. It is an interferon-stimulated gene, and mutation itself leads to interferonopathy, we believe, due to poor RNA editing and detection of endogenous RNA as non-self by MDA5. We hope it can dampen the interferon response, but in the case of mutation, Adar1 is probably causal of interferonopathy.  It is induced in microglia upon systemic inflammatory challenge (LPS). We have edited the text to highlight its expression pattern.  See BrainRNAseq.org (Zhang*, Chen*, Sloan*, et al., 2014 and Bennett et al., 2016)

      Reviewer #2 (Recommendations for the authors):

      (1) There appears to be a morphological difference between wt and Adar1/Ifih1 double KO (dKO) cells in the engrafted brains (Figure 5). It would be good if the authors could systematically compare the morphology (e.g., soma size, number, and length of branches) of the engrafted MLCs between the wt and mutant cells.

      We agree. While cells did not differ in branch number or length, engrafted dKO cells had significantly larger somas compared with controls, which we now present in Figure S5A.

      (2) To fully appreciate the extent of how those engrafted ER-Hoxb8 immortalized macrophages resemble primary, engrafted yolk sac-myeloid cells, vs engrafted iPSC-induced microglia, it would be informative to provide a comparison of the RNAseq data derived from the engrafted ER-Hoxb8 immortalized macrophages with published transcriptomic data sets (e.g. Bennett et al. Neuron 2018; Chadarevian et al. Neuron 2024; Schafer et al. Cell 2023).

      Thank you for this suggestion. To address this, we provide our full dataset for additional experiments. To compare with a similar non-immortalized model, we compared top up- and down-regulated genes from our data to those of ICT yolk sac progenitor cells from our previous work (Bennett et al., 2018). We find overlap between brain-engrafted ER-Hoxb8-, bone marrow-, and yolk sac-derived cells (Supplemental Figure 2F, Supplemental Table 3).  

      Minor comments:

      Figure 6C: the red arrows marking the zoomed-in regions do not match. It might be beneficial to provide larger images showing each channel for C and D as a Supplemental Figure.

      We fixed this in Figure 6C to show areas of interest in the cortex for both conditions. Figure S7A shows intermediate power images to aid in interpretation.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1 (Public review):

      Weaknesses:

      While the data generally supports the authors' conclusions, a weakness of this manuscript lies in their analytical approach where EEG feature-space comparisons used the number of spontaneous or evoked seizures as their replicates as opposed to the number of IHK mice; these large data sets tend to identify relatively small effects of uncertain biological significance as being highly statistically significant. Furthermore, the clinical relevance of similarly small differences in EEG feature space measurements between seizure-naïve and epileptic mice is also uncertain.

      In this work, we used a linear mixed-effects model to address two levels of variability: between animals and within animals. The interactive linear mixed-effects model shows that most (~90%) of the variability in our data comes from within animals (residual), the random effect that the model accounts for, rather than between animals. Since variability between animals is low, the model identifies common changes in seizure propagation across animals while accounting for the variability in seizures within each animal. Therefore, the results we find reflect changes that occur across animals, not individual seizures. We made text edits to clarify the use of the linear mixed-effects model (page 6, second paragraph, and page 11, first paragraph).
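
      The within-animal versus between-animal split described above can be illustrated with a simpler calculation than the mixed-effects model itself. The following is a minimal sketch, not the authors' analysis: it assumes a balanced design (the same number of seizures per animal) and uses the method-of-moments estimator from a one-way random-effects ANOVA. The function names and the toy feature values are hypothetical.

```python
from statistics import mean


def variance_partition(groups):
    """Return (between_animal_var, within_animal_var).

    Method-of-moments estimates from a balanced one-way random-effects
    ANOVA. `groups` maps an animal ID to its per-seizure feature values.
    """
    sizes = {len(values) for values in groups.values()}
    if len(groups) < 2 or len(sizes) != 1 or min(sizes) < 2:
        raise ValueError("need >= 2 animals, each with the same number (>= 2) of seizures")
    n = sizes.pop()   # seizures per animal
    k = len(groups)   # number of animals

    animal_means = {animal: mean(values) for animal, values in groups.items()}
    grand_mean = mean(list(animal_means.values()))

    # Mean squares between and within animals.
    ms_between = n * sum((m - grand_mean) ** 2 for m in animal_means.values()) / (k - 1)
    ms_within = sum(
        (x - animal_means[animal]) ** 2
        for animal, values in groups.items()
        for x in values
    ) / (k * (n - 1))

    # The between-animal component is truncated at zero when the estimate is negative.
    between_var = max(0.0, (ms_between - ms_within) / n)
    return between_var, ms_within


def within_fraction(groups):
    """Fraction of total variance attributable to within-animal (residual) variation."""
    between_var, within_var = variance_partition(groups)
    total = between_var + within_var
    if total == 0:
        raise ValueError("all observations are identical; fraction is undefined")
    return within_var / total


if __name__ == "__main__":
    # Made-up EEG feature values (arbitrary units), four seizures per animal.
    toy = {
        "mouse_A": [4.1, 6.3, 5.0, 7.2],
        "mouse_B": [5.5, 3.9, 6.8, 5.1],
        "mouse_C": [6.0, 4.4, 7.5, 5.3],
    }
    print(f"within-animal share of variance: {within_fraction(toy):.2f}")
```

      A within-animal share close to 1 corresponds to the situation described in the response, where animals differ little from one another and most of the spread comes from seizure-to-seizure variation inside each animal.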

      Finally, the multiple surgeries and long timetable to generate these mice may limit the value compared to existing models in drug-testing paradigms.

      Thank you for the suggestion. We added a discussion in the ‘Comparison to other seizure models…’ section on pages 15 and 16. In an existing model investigating spontaneous tonic-clonic seizures (such as the intra-amygdala kainate injection model), the time investment is back-loaded, requiring two to three weeks per condition while counting spontaneous seizures, which may occur only once a day. In contrast, our model requires a front-loaded time investment. Once the animals are set up, we can test multiple drugs within a few weeks, providing significant time savings. Additionally, we did not pre-screen animals in our study. Existing models often pre-select mice with high rates of spontaneous seizures, whereas in our model, seizures can be induced even in animals with few spontaneous seizures. We believe that bypassing the need for pre-screening also is a key advantage of our induced seizure model.  

      Reviewer 1 (Recommendations for the authors):

      (1) Address why the EEG data comparisons were performed between seizures and not between animals (as explicitly described in the public review). Further, a discussion of the biological significance (or lack thereof) of the effect size differences observed is warranted. This is especially concerning when the authors make the claim that spontaneous and induced seizures are essentially the same while their analysis shows all evaluated feature space parameters were significantly different in the initial 1/3 of the EEG waveforms.

      We made text edits to clarify the use of the linear mixed-effects model (page 6, second paragraph, and page 11, first paragraph).

      (2) The authors place great emphasis on the use of clinically/etiologically relevant epilepsy models in drug discovery research. There is discussion criticizing the time points required to enact kindling and the artificial nature of acute seizure induction methods. However, the combination IHK-opto seizure induction model also requires a lengthy timeline. A more tempered discussion of this novel model's strengths may benefit readers.

      Thank you for the suggestion. We added a discussion in the ‘Comparison to other seizure models…’ section on pages 15 and 16.

      (3) The authors should further emphasize the benefit of having an inducible seizure model of focal epilepsy since other mouse models (e.g., genetic or TBI models) may have superior etiological relevance (construct and face validity) but may not be amenable to their optogenetic stimulation approach.

      Thank you for the suggestion. We revised the manuscript to better emphasize the potential significance of our approach. We added a discussion in the 'Application of Models...' section on page 15, second paragraph. The on-demand seizure model can be applied to address biologically and clinically relevant questions beyond its utility in drug screening. For example, crossing the Thy1-ChR2 mouse line with genetic epilepsy models, such as Scn1a mutants, could reveal how optogenetic stimulation differentially induces seizures in mutant versus non-mutant mice, providing insights into seizure generation and propagation in Dravet syndrome. Due to the cellular specificity of optogenetics, we also envision this approach being used to study circuit-specific mechanisms of seizure generation and propagation.

      (4) Suggestion: Provide immunolabeled imagery demonstrating ChR2 presence in Thy1 cells.

      Thank you for the suggestion. We added a fluorescence image showing ChR2 expression in Fig. 2A.

      (5) It might be prudent to mention any potential effects of laser heat on hippocampal cell damage, although the 10 Hz, ~10 mW, and 6 s stim is unlikely to cause any substantial burns. Without knowing the diameter and material of the optic fiber, this is left up to some interpretation.

      Thank you for the comments. In the Methods section, we listed the optical fiber diameter as 400 microns (page 17, EEG and Fiber Implantation section). Using 5–18 mW laser power with a relatively large fiber diameter of 400 microns, the power density falls within the range of commonly employed channelrhodopsin activation conditions in vivo. That said, we would like to investigate potential heat effects or cell damage in a follow-up study.
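
      As a rough check on the numbers quoted above, the average irradiance at the fiber tip can be estimated as optical power divided by the fiber's cross-sectional area. This is a minimal sketch under stated assumptions: the full 400 micron diameter is taken to be uniformly illuminated, and coupling losses, numerical aperture, and scattering in tissue are ignored. The function name is illustrative.

```python
import math


def tip_irradiance_mw_per_mm2(power_mw: float, fiber_diameter_um: float) -> float:
    """Average irradiance at the fiber tip (mW/mm^2), assuming uniform illumination."""
    radius_mm = fiber_diameter_um / 1000.0 / 2.0
    area_mm2 = math.pi * radius_mm ** 2
    return power_mw / area_mm2


if __name__ == "__main__":
    # Lower and upper ends of the 5-18 mW range, 400 micron fiber.
    for power_mw in (5.0, 18.0):
        irradiance = tip_irradiance_mw_per_mm2(power_mw, 400.0)
        print(f"{power_mw:.0f} mW -> {irradiance:.1f} mW/mm^2")
```

      Under these assumptions the 5 to 18 mW range corresponds to roughly 40 to 143 mW/mm^2 at the tip; irradiance in tissue falls off with distance from the fiber, which this estimate does not capture.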

      (6) There are instances in the manuscript where the authors describe experimental and analytical parameters vaguely (e.g. "Seizures were induced several times a day", "stimulation was performed every 1 - 3 hours over many days"). These descriptions can and should be more precise.

      Thank you for the comments. To enhance clarity, we added the stimulation protocol in a flowchart format in Fig. S2A, describing how we determined the threshold and proceeded to the drug test. Following this protocol, there was variability in the number of stimulations per day.

      (7) In the second to last paragraph of the discussion, the authors state "However, HPDs are not generalizable across species - they are specific to the mouse model (55)." This statement is inaccurate. The paper cited comes from Dr. Corrine Roucard's lab at Synapcell. In fact, Dr. Roucard argues the opposite (See Neurochem Res (2017) 42:1919-1925).

      Thank you for pointing out the mistake. On page 16, in the first paragraph, reference 55 (now 58 in the revised version) was intended to refer to 'quickly produce dose-response curves with high confidence.' In the revision, we cited another paper reporting that hippocampal spikes were not reproduced in the rat IHK model. R. Klee, C. Brandt, K. Töllner, W. Löscher, Various modifications of the intrahippocampal kainate model of mesial temporal lobe epilepsy in rats fail to resolve the marked rat-to-mouse differences in type and frequency of spontaneous seizures in this model. Epilepsy Behav. 68, 129–140 (2017).

      (8) In the discussion, Levetiracetam is highlighted as an ASM that would not be detected in acute induced seizure models; the authors point out its lack of effect in MES and PTZ. However, LEV is effective in the 6Hz test (also an acute-induced seizure model). This should be stated.

      Thank you for the comments. We highlighted the discussion on LEV in the 'Application of Model to Testing Multiple Classes of ASMs...' section on page 14.

      (9) The results text indicates that 9 epileptic mice were used to test LEV and DZP. However, the individual data points illustrated in Figure 5B show N=8 mice. Please correct.

      Thank you for the comments. A total of nine epileptic mice were used to assess two drugs, with the animals being re-used as indicated in the schematic. A total of eight assessments were conducted for DZP with six mice and eight assessments for LEV with five mice. Each assessment included hourly ChR2 activations without an ASM and hourly ChR2 activations after ASM injection.

      (10) Figure 4D: Naïve mice are labeled as solid blue circles in the legend while the data points are solid blue triangles. Please correct.

      Thank you. We corrected the marker in Fig.4D.

      Reviewer 2 (Public Review):

      Weaknesses:

      (1) Although the figures provide excellent examples of individual electrographic seizures and compare induced seizures in epileptic and naïve animals, it is unclear which criteria were used to identify an actual seizure induced by the optogenetic stimulus, versus a hippocampal paroxysmal discharge (HPD), an "afterdischarge", an "electrophysiological epileptiform event" (EEE, Ref #36, D'Ambrosio et al., 2010 Epilepsy Currents), or a so-called "spike-wave-discharge" (SWD). Were HPDs or these other non-seizure events ever induced using stimulation in animals with IH-KA? A critical issue is that these other electrical events are not actual seizures, and it is unclear whether they were included in the column showing data on "electrographic afterdischarges" in Figure 5 for the studies on ASDs. This seems to be a problem in other areas of the paper, also.

      Thank you for pointing out the unclear definition of the seizures analyzed. We added sentences at the beginning of the Results section (page 3) to clarify the terminology we used. We analyzed animal behavior during evoked events, and a high percentage of induced electrographic events were accompanied by behavioral seizures with a Racine scale of three or above. We added Supplemental Figure S9, which shows behavioral seizure severity scores observed before and during ASM testing. We hope these changes address the reviewer’s concern and improve the clarity of the manuscript.

      (2) The differences between the optogenetically evoked seizures in IH-KA vs naïve mice are interpreted to be due to the "epileptogenesis" that had occurred, but the lesion from the KA-induced injury would be expected to cause differences in the electrically and behaviorally recorded seizures - even if epileptogenesis had not occurred. This is not adequately addressed.

      Thank you for the comments. IHK-injected mice had spontaneous tonic-clonic seizures before the start of optical stimulation, as shown in Figure S1.

      (3) The authors offer little mention of other research using animal models of TLE to screen ASDs, of which there are many published studies - many of them with other strengths and/or weaknesses. For example, although Grabenstatter and Dudek (2019, Epilepsia) used a version of the systemic KA model to obtain dose-response data on the effects of carbamazepine on spontaneous seizures, that work required use of KA-treated rats selected to have very high rates of spontaneous seizures, which requires careful and tedious selection of animals. The ETSP has published studies with an intra-amygdala kainic acid (IA-KA) model (West et al., 2022, Exp Neurol), where the authors claim that they can use spontaneous seizures to identify ASDs for DRE; however, their lack of a drug effect of carbamazepine may have been a false negative secondary to low seizure rates. The approach described in this paper may help with confounds caused by low or variable seizure rates. These types of issues should be discussed, along with others.

      We appreciate the reviewer’s insights. We added a discussion comparing our model with other existing models in the Discussion section (pages 15 and 16, 'Comparison to Other Seizure Models Used in Pharmacologic Screening' section). In an existing model investigating spontaneous tonic-clonic seizures (such as the intra-amygdala kainate injection model), the time investment is back-loaded, requiring two to three weeks per condition while counting spontaneous seizures, which may occur only once a day. In contrast, our model requires a front-loaded time investment. Once the animals are set up, we can test multiple drugs within a few weeks, providing significant time savings. Additionally, we did not pre-screen animals in our study. Existing models often pre-select mice with high rates of spontaneous seizures, whereas in our model, seizures can be induced even in animals with few spontaneous seizures. We believe that bypassing the need for pre-screening is a key advantage of our induced seizure model.

      (4) The outcome measure for testing LEV and DZP on seizures was essentially the fraction of unsuccessful or successful activations of seizures, where high ASD efficacy is based on showing that the optogenetic stimulation causes fewer seizures when the drug is present. The final outcome measure is thus a percentage, which would still lead to a large number of tests to be assured of adequate statistical power. Thus, there is a concern about whether this proposed approach will have high enough resolution to be more useful than conventional screening methods so that one can obtain actual dose-response data on ASDs.

      Thank you for the comments. In this revision, we added Supplemental Figure S9, showing the severity of behavioral seizures observed before and during ASM testing for each animal. We observed a reduction in behavioral seizure severity for each subject. We would like to explore using behavioral severity as an outcome measure in a follow-up study.

      (5) The authors state that this approach should be used to test for and discover new ASDs for DRE, and also used for various open/closed loop protocols with deep-brain stimulation; however, the paper does not actually discuss rigorously or critically the background literature on other published studies in these areas or how this approach will improve future research for a broader audience than the ETSP and CROs. Thus, it is not clear whether the utility will apply more widely and how extensive a readership will be attracted to this work.

      We appreciate the reviewer’s insights. We revised the manuscript to better emphasize the potential significance of our approach (page 15, second paragraph). The on-demand seizure model can be applied to address biologically and clinically relevant questions beyond its utility in drug screening. For example, crossing the Thy1-ChR2 mouse line with genetic epilepsy models, such as Scn1a mutants, could reveal how optogenetic stimulation differentially induces seizures in mutant versus non-mutant mice, providing insights into seizure generation and propagation in Dravet syndrome. Due to the cellular specificity of optogenetics, we also envision this approach being used to study circuit-specific mechanisms of seizure generation and propagation. Regarding drug-resistant epilepsy (DRE) and anti-seizure drug (ASD) screening, we agree with the reviewer that probing new classes of ASDs for DRE represents a critical goal. However, we believe that a full exploration of additional ASD classes and/or modeling DRE lies outside the scope of this manuscript, and we would like to explore it in a follow-up study.

      Reviewer 2 (Recommendations for the authors):

      (1) The authors should explain why 10 Hz was chosen as the stimulation frequency.

      Thank you for the comment. A frequency of 10 Hz was determined based on previous work using anesthetized animals prepared in an acute in vivo setting. To simplify the paper and avoid confusion, we did not include a discussion on how we determined the frequency. Instead, we added a detailed description of how we optimized the power in a flowchart format in Supplemental Figure S2. We hope this improves reproducibility.

      (2) After micro-injection of KA, morphological changes were observed in the hippocampus, but no comparison of Chr2 expression was made in naïve animals vs KA-injected animals. Presumably, the Thy1-Chr2 mouse expresses GFP in cells that express Chr2. Thus, it may be useful to show the expression of Chr2 in animals with hippocampal sclerosis. This may explain the lack of dramatic difference between stimulation parameters in naïve vs epileptic animals, as shown in supplemental Figure S2.

      Thank you for the suggestion. We added a fluorescence image of ChR2 expression in CA1, ipsilateral to the KA-injected site, in Fig. 2A.

      (3) The authors state that "During epileptogenesis, neural networks in the brain undergo various changes ranging from modification of membrane receptors to the formation of new synapses" and that these changes are critical for successful "on-demand" seizure induction. However, it is not clear or well-discussed whether changes in neuronal cell densities that occur during sclerosis are important for "on-demand" seizure induction as well. Also, the authors showed that naïve animals exhibit a kindling-like effect, but it was unclear whether a similar effect was present in epileptic animals (i.e. do stimulation thresholds to seizure induction change as the animal gets more induction stimulations)? If present, would the secondary kindling affect drug-testing studies (e.g., would the drug effect be different on induced seizure #2 vs induced seizure #20)?

      Thank you for the suggestion. Since this is an important aspect of the model, we would like to address the kindling effect, the secondary kindling effect, and histopathology in a longer-term setting (several weeks) in a follow-up study.

      (4) The authors show that in their model, LEV and DZP were both efficacious. The authors do not seem to mention that, over 25 years ago, LEV was originally missed in the standard ETSP screens; and, it was only discovered outside of the ETSP with the kindling model. The kindling model is now used to screen ASDs. The authors should consider adding this point to the Discussion. It remains unclear, however, if the author's screening strategy shows advantages over kindling and other such approaches in the field.

      Thank you for the suggestion. We added a discussion on LEV in the 'Application of Model to Testing Multiple Classes of ASMs...' section on page 14.

      (5) P8 paragraph 2. The authors state values for naïve animals, but they should also provide values for epileptic animals since they state that the groups were not significantly different (p>0.05). It would be useful to show values for both and state the actual p-value from the test. This issue of stating mean/median values with SD and sample size should be addressed for all data throughout the paper. Additionally, Figure S2 should be added to the manuscript and discussed, as it has data that may be valuable for the reproducibility of the paper.

      Thank you for the suggestion. Figure S2 shows the threshold power required to induce electrographic activity for n = 10 epileptic animals (9.14 ± 4.75 mW) and n = 6 naïve animals (6.17 ± 1.58 mW) (Wilcoxon rank-sum test, p = 0.137). The threshold duration was comparable between the same epileptic animals (6.30 ± 1.64 s) and naïve animals (5.67 ± 1.03 s) (Wilcoxon rank-sum test, p = 0.7133). 

      (6) In addition to the other stated references on synaptic reorganization in the CA1 area, the authors should mention similar studies from Esclapez et al. (1999, J Comp Neurol).

      Thank you. We have included the reference in the revision.

      (7) All of the raw EEG data on the seizures should be accessible to the readers.

      Thank you for the suggestion. We will consider depositing EEG data in a publicly accessible site.

      Reviewer 3 (Public review):

      Weaknesses:

      (1) Evaluation of seizure similarity using the SVM modeling and clustering is not sufficiently explained to show if there are meaningful differences between induced and spontaneous seizures. SVM modeling did not include analysis to assess the overfitting of each classifier since mice were modeled individually for classification.

      Thank you for the comment. We made text edits to clarify the purpose of the SVM analysis. It was not intended to identify meaningful differences between induced and spontaneous seizures. Rather, it was used to classify EEG epochs as 'seizures' based on spontaneous seizures as the training set, demonstrating the gross similarity between induced and spontaneous seizures.

      (2) The difference between seizures and epileptiform discharges or trains of spikes (which are not seizures) is not made clear.

      Thank you for pointing out the unclear definition of the seizures analyzed. We added sentences at the beginning of the Results section (page 3) to clarify the terminology we used. We analyzed animal behavior during evoked events, and a high percentage of induced electrographic events were accompanied by behavioral seizures with a Racine scale of three or above. We added Supplemental Figure S9 to show the types of seizures observed before and during ASM testing. We hope these changes address the reviewer’s concern and improve the clarity of the manuscript.

      (3) The utility of increasing the number of seizures for enhancing statistical power is limited unless the sample size under evaluation is the number of seizures. However, the standard practice is for the sample size to be the number of mice.

      In this work, we used a linear mixed-effects model to address two levels of variability—between animals and within animals. The interactive linear mixed-effects model shows that most (~90%) of the variability in our data comes from within animals (residual), the random effect that the model accounts for, rather than between animals. Since variability between animals is low, the model identifies common changes in seizure propagation across animals while accounting for the variability in seizures within each animal. Therefore, the results we find reflect changes that occur across animals, not individual seizures. We made text edits to clarify the use of the linear mixed-effects model.

      (4) Seizure burden is not easily tested.

      Thank you for the comment. We added Supplemental Figure S9 to summarize the severity of behavioral seizures before and during ASM testing. This addresses the reviewer’s comment on seizure burden. In a follow-up study, we would like to explore this type of outcome measure for drug screening.

      Reviewer 3 (Recommendations for the authors):

      (1) Provide a stronger rationale to use area CA1. For example, the authors mention that CA1 is active during seizure activity, but can seizures originate from CA1? That would make the approach logical and also explain why induced and spontaneous seizures are similar.

      Thank you for the comment. We discussed it in the Discussion section (page 14, first and second paragraphs).

      (2) Explain the use of SVM classifiers so it is more convincing that induced and spontaneous seizures are similar. Or, if they are not similar, explain that this is a limitation.

      We made text edits to clarify the purpose of the SVM analysis. It was not intended to identify meaningful differences between induced and spontaneous seizures. Rather, it was used to classify EEG epochs as 'seizures' based on spontaneous seizures as the training set, demonstrating the gross similarity between induced and spontaneous seizures.

      (3) If feasible, extend the duration over which seizure induction reliability is assessed so that the long-term utility of the model can be demonstrated.

      Thank you for the suggestion. We would like to assess long-term utility in a follow-up study.

      (4) The GitHub link is not yet active. The authors will be required to supply their relevant code for peer evaluation as well as publication.

      Thank you. The GitHub repository is now active.

      (5) State and assess the impacts of sex as a biological variable.

      Thank you for pointing this out. Both female and male animals were included in this study: Epileptic cohort: 7 males, 3 females; Naïve cohort: 3 males, 4 females.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public Review):

      This work adds another mouse model for LAMA2-MD that re-iterates the phenotype of previously published models, such as dy3K/dy3K, dy/dy, and dyW/dyW mice. The phenotype is fully consistent with the data from others.

      Thank you for the valuable comments and good suggestions you have proposed, and we have added information and analysis of another mouse model for LAMA2-MD in the updated version 2 of this manuscript.

      One of the major weaknesses of the manuscript initially submitted was the overinterpretation and the overstatements. The revised version is clearly improved as the authors toned-down their interpretation and now also cite the relevant literature of previous work.

      Thank you for the comments. We carefully corrected the overinterpretation and overstatements in the previous revision.

      Unfortunately, the data on RNA-seq and scRNA-seq are still rather weak. scRNA-seq was conducted with only one mouse, resulting in only 8000 nuclei. I am not convinced that the data allow us to interpret them to the extent the authors do. Similar to the first version, the authors infer function by examining expression. Although they are a bit more cautious, they still argue that the BBB is not functional in dy<sup>H</sup>/dy<sup>H</sup> mice without showing leakiness. Such experiments can be done using dyes, such as Evans blue or cadaverine. Hence, I would suggest that they formulate the text still more carefully.

      Thank you for the valuable suggestions. We agree that related functional experiments, such as Evans blue or cadaverine assays, should be performed to confirm the impaired BBB. However, these experiments have not been done because the first author has been working in the clinic. We have instead added a "Limitations" part, which states: "Even though RNA-seq and scRNA-seq have been performed, the data of scRNA-seq are still insufficient due to the limited number of mouse brains. This study has provided potentially important information for the molecular pathogenetic mechanisms of muscular dystrophy and brain dysfunction for LAMA2-CMD, however, some related functional experiments have not been further performed".

      A similar lack of evidence is true for the suggested cobblestone-like lissencephaly of the mice. There is no strong evidence that this is indeed occurring in the mice (might also be a problem because mice die early). Hence, the conclusions need to be formulated in such a way that readers understand that these are interpretations and not facts.

      Thank you for the valuable suggestions. We agree with this comment, and have made a statement in the Limitations: "This study has provided potentially important information for the molecular pathogenetic mechanisms of muscular dystrophy and brain dysfunction for LAMA2-CMD, however, some related functional experiments have not been further performed". Also, for the cobblestone-like lissencephaly, which was shown in LAMA2-CMD patients but not found in the mouse model, we have added the following discussion: "Though the cortical malformations were not found in the dy<sup>H</sup>/dy<sup>H</sup> brains by MRI analysis probably due to the small volume in within 1 month old, Thus, the changes in transcriptomes and protein levels provided potentially useful data for the hypothesis of the impaired gliovascular basal lamina of the BBB, which might be associated with occipital pachygyria in LAMA2-CMD patients."

      Finally, I am surprised that the only improvement in the main figures is the Western blot for laminin-alpha2. The histology of skeletal muscle still looks rather poor. I do not know what the problems are but suggest that the authors try to make sections from fresh-frozen tissue. I anticipate that the mice were eventually perfused with PFA before muscles were isolated. This often results in the big gaps in the sections.

      Thank you for the valuable suggestions. We agree with this comment and that sections should be made from fresh-frozen tissue. Therefore, we have made a statement in the Limitations: "Moreover, due to making sections with PFA before muscles isolated, and not from fresh-frozen tissue, there have been big gaps in the sections which do affect the histology of skeletal muscle to some extent."

      Overall, the work is improved but still would need additional experiments to make it really an important addition to the literature in the LAMA2-MD field.

      Thank you for all your good comments and the valuable suggestions.

      Reviewer #2 (Public Review):

      This revised manuscript describes the production of a mouse model for LAMA2-Related Muscular Dystrophy. The authors investigate changes in transcripts within the brain and blood barrier. The authors also investigate changes in the transcriptome associated with the muscle cytoskeleton. Strengths: (1) The authors produced a mouse model of LAMA2-CMD using CRISPR-Cas9. (2) The authors identify cellular changes that disrupted the blood-brain barrier.

      Thank you for your good comments.

      Weaknesses:

      The authors throughout the manuscript overstate "discoveries" which have been previously described, published and not appropriately cited.

      Thank you for your great suggestion. We have toned down the interpretations and overstatements throughout the manuscript, and added words such as "potentially", "possible", "some potential clues", "was speculated to probably", and so on.

      Alterations in the blood-brain barrier and in the muscle cell cytoskeleton in LAMA2-CMD have been extensively studied and published in the literature and are not cited appropriately.

      Thank you for your great suggestion. We agree that alterations in the muscle cell cytoskeleton in LAMA2-CMD have been extensively studied and published, and the related literature has been cited in the updated version 2.0. However, alterations in the blood-brain barrier in LAMA2-CMD have not been extensively studied; only a few papers (such as PMID: 25392494, PMID: 32792907) have investigated or discussed this issue.

      The authors have increased the animal number to N=6, but this is still insufficient based on power analysis, resulting in statistical errors and conclusions that may be incorrect.

      Thank you for your great suggestion. We do agree that the animal number should be increased for Power analysis, and we have added statements in the Limitations with "Finally, due to the limited number of animal samples for the Power analysis, the statistical errors and conclusions might be affected."

      The use of "novel mouse model" in the manuscript overstates the impact of the study.

      Thank you for your great suggestion. We have changed the statement "novel mouse model" throughout the manuscript except the title.

      All studies presented are descriptive and do not add more to the field, except for producing yet another mouse model of LAMA2-CMD that is the same as all the others produced.

      Thank you for your comment. We do agree that further functional experiments have not been performed to reveal and confirm the pathogenesis. However, the analysis of phenotype was systematic and comprehensive, including survival time, motor function, serum CK, muscle MRI, muscle histopathology in different stages, and brain histopathology. Moreover, RNA-seq and scRNA-seq in LAMA2-CMD have been seldom performed before, and the data in this study could provide potentially important information for the molecular pathogenetic mechanisms of muscular dystrophy and brain dysfunction for LAMA2-CMD.

      Grip strength measurements are considered error prone and do not give an accurate measurement of muscle strength, which is better achieved using ex vivo or in vivo muscle contractility studies.

      Thank you for your great suggestion. We agree that grip strength measurements are considered error prone and do not give an accurate measurement of muscle strength, and we have added a related statement in the Limitations: "Grip strength measurements used in this study are considered error prone and do not give an accurate measurement of muscle strength, which would be better achieved using ex vivo or in vivo muscle contractility studies."

      A lack of blinded studies, as pointed out by the authors, is a concern for the scientific rigor of the study.

      Thank you for your great suggestion. Those scoring the outcome measures were not blinded to the groups. In practice, it was very easy to discriminate the dy<sup>H</sup>/dy<sup>H</sup> mice from the WT/Het mice because the dy<sup>H</sup>/dy<sup>H</sup> mice showed a much smaller body size than the other groups from as early as P7.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      There are multiple grammatical errors throughout the manuscript which should be corrected.

      Thank you for your recommendation. We have carefully corrected the grammatical errors within the manuscript.

      The authors mention no changes in intestinal muscles, but it is unclear if they are referring to skeletal or smooth muscle.

      Thank you for your good comment. The intestinal muscles with no changes in this study are smooth muscle, and we have changed the description to "intestinal smooth muscles".

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the Reviewers for their constructive comments and the Editor for the possibility to address the Reviewers’ points in this rebuttal. We 

      (1) Conducted new experiments with NP6510-Gal4 and TH-Gal4 lines to address potential behavioral differences due to targeting dopaminergic vs. both dopaminergic and serotonergic neurons

      (2) Conducted novel data analyses to emphasize the strength of sampling distributions of behavioral parameters across trials and individual flies

      (3) Provided Supplementary Movies

      (4) Calculated additional statistics

      (5) Edited and added text to address all points of the Reviewers.

      Please see our point-by-point responses below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Translating discoveries from model organisms to humans is often challenging, especially in neuropsychiatric diseases, due to the vast gaps in the circuit complexities and cognitive capabilities. Kajtor et al. propose to bridge this gap in the fly models of Parkinson's disease (PD) by developing a new behavioral assay where flies respond to a moving shadow by modifying their locomotor activities. The authors believe the flies' response to the shadow approximates their escape response to an approaching predator. To validate this argument, they tested several PD-relevant transgenic fly lines and showed that some of them indeed have altered responses in their assay.

      Strengths:

      This single-fly-based assay is easy and inexpensive to set up, scalable, and provides sensitive, quantitative estimates to probe flies' optomotor acuity. The behavioral data is detailed, and the analysis parameters are well-explained.

      We thank the Reviewer for the positive assessment of our study.

      Weaknesses:

      While the abstract promises to give us an assay to accelerate fly-to-human translation, the authors need to provide evidence to show that this is indeed the case. They have used PD lines extensively characterized by other groups, often with cheaper and easier-to-setup assays like negative geotaxis, and do not offer any new insights into them. The conceptual leap from a low-level behavioral phenotype, e.g. changes in walking speed, to recapitulating human PD progression is enormous, and the paper does not make any attempt to bridge it. It needs to be clarified how this assay provides a new understanding of the fly PD models, as the authors do not explore the cellular/circuit basis of the phenotypes. Similarly, they have assumed that the behavior they are looking at is an escape-from-predator response modulated by the central complex- is there any evidence to support these assumptions? Because of their rather superficial approach, the paper does not go beyond providing us with a collection of interesting but preliminary observations.

      We thank the Reviewer for pointing out some limitations of our study. We would like to emphasize that what we perceive as the main advantage of performing single-fly and single-trial analyses is the access to rich data distributions that provide more fine-scale information compared to bulk assays. We think that this is exactly going one step closer to ‘bridging the enormous conceptual leap from a low-level behavioral phenotype, e.g. changes in walking speed, to recapitulating human PD progression’, and we showcase this in our study by comparing the distributions over the entire repertoire of behavioral responses across fly mutants. Nevertheless, we agree with the Reviewer that many more steps in this direction are needed to improve translatability. Therefore, we toned down the corresponding statements in the Abstract and in the Introduction. Moreover, to further emphasize the strength of sampling distributions of behavioral parameters across trials and individual flies, we complemented our comparisons of central tendencies with testing for potential differences in data dispersion, demonstrated in the novel Supplementary Figure S4.

      Looming stimuli have been used to characterize flies’ escape behaviors. These studies uncovered a surprisingly rich behavioral repertoire (Zacarias et al., 2018), which was modulated by both sensory and motor context, e.g. walking speed at time of stimulus presentation (Card and Dickinson, 2008; Oram and Card, 2022; Zacarias et al., 2018). The neural basis of these behaviors was also investigated, revealing loom-sensitive neurons in the optic lobe and the giant fiber escape pathway (Ache et al., 2019; de Vries and Clandinin, 2012). Although less frequently, passing shadows were also employed as threat-inducing stimuli in flies (Gibson et al., 2015). We opted for this variant of the stimulus so that we could ensure that the shadow reached the same coordinates in all linear tracks concurrently, aiding data analysis and scalability. Similar to the cited study, we found the same behavioral repertoire as in studies with looming stimuli, with an equivalent dependence on walking speed, confirming that looming stimuli and passing shadows can both be considered as threat-inducing visual stimuli. We added a discussion on this topic to the main text.

      Reviewer #2 (Public Review):

      In this study, Kajtor et al. investigated the use of a single-animal trial-based behavioral assay for the assessment of subtle changes in the locomotor behavior of different genetic models of Parkinson's disease of Drosophila. Different genotypes used in this study were Ddc-GAL4>UAS-Parkin-275W and UAS-α-Syn-A53T. The authors measured Drosophila's response to predator-mimicking passing shadow as a threatening stimulus. Along with these, various dopamine (DA) receptor mutants, Dop1R1, Dop1R2 and DopEcR were also tested.

      The behavior was measured in a custom-designed apparatus that allows simultaneous testing of 13 individual flies in a plexiglass arena. The inter-trial intervals were randomized for 40 trials within 40 minutes duration and fly responses were defined into freezing, slowing down, and running by hierarchical clustering. Most of the mutant flies showed decreased reactivity to threatening stimuli, but the speed-response behavior was genotype invariant.

      These data nicely show that measuring responses to the predator-mimicking passing shadows could be used to assess the subtle differences in the locomotion parameters in various genetic models of Drosophila.

      The understanding of the manifestation of various neuronal disorders is a topic of active research. Many of the neuronal disorders start by presenting subtle changes in neuronal circuits and quantification and measurement of these subtle behavior responses could help one delineate the mechanisms involved. The data from the present study nicely uses the behavioral response to predator-mimicking passing shadows to measure subtle changes in behavior. However, there are a few important points that would help establish the robustness of this study.

      We thank the Reviewer for the constructive comments and the positive assessment of our study.

      (1) The visual threat stimulus for measuring response behavior in Drosophila is previously established for both single and multiple flies in an arena. A comparative analysis of data and the pros and cons of the previously established techniques (for example, Gibson et al., 2015) with the technique presented in this study would be important to establish the current assay as an important advancement.

      We thank the Reviewer for this suggestion. We included the following discussion on measuring response behavior to visual threat stimuli in the revised manuscript.

      Many earlier studies used a looming stimulus, that is, a concentrically expanding shadow, mimicking the approach of a predator from above, to study escape responses in flies (Ache et al., 2019; Card and Dickinson, 2008; de Vries and Clandinin, 2012; Oram and Card, 2022; Zacarias et al., 2018) as well as rodents (Braine and Georges, 2023; Heinemans and Moita, 2024; Lecca et al., 2017). These assays have the advantage of closely resembling naturalistic, ecologically relevant threat-inducing stimuli, and allow a relatively complete characterization of the fly escape behavior repertoire. As a flip side of their large degree of freedom, they do not lend themselves easily to provide a fully standardized, scalable behavioral assay. Therefore, Gibson et al. suggested a novel threat-inducing assay operating with moving overhead translational stimuli, that is, passing shadows, and demonstrated that they induce escape behaviors in flies akin to looming discs (Gibson et al., 2015). This assay, coined ReVSA (repetitive visual stimulus-induced arousal) by the authors, had the advantage of scalability, while constraining flies to a walking arena that somewhat restricted the remarkably rich escape types flies otherwise exhibit. Here we carried this idea one step further by using a screen to present the shadows instead of a physically moving paddle and placing individual flies in linear corridors instead of the common circular fly arena. This ensured that the shadow reached the same coordinates in all linear tracks concurrently and made it easy to accurately determine when individual flies encountered the stimulus, aiding data analysis and scalability. We found the same escape behavioral repertoire as in studies with looming stimuli and ReVSA (Gibson et al., 2015; Zacarias et al., 2018), with a similar dependence on walking speed (Oram and Card, 2022; Zacarias et al., 2018), confirming that looming stimuli and passing shadows can both be considered as threat-inducing visual stimuli.

      (2) Parkinson's disease mutants should be validated with other GAL-4 drivers along with Ddc-GAL4, such as NP6510-Gal4 (Riemensperger et al., 2013). This would be important to delineate the behavioral differences due to dopaminergic neurons and serotonergic neurons and establish the Parkinson's disease phenotype robustly.

      We thank the Reviewer for pointing out this limitation. To address this, we repeated our key experiments in Fig. 3 with both TH-Gal4 and NP6510-Gal4 lines, and their respective controls. These yielded results largely similar to those of the Ddc-Gal4 lines reported in Fig. 3, reproducing the decreased speed and decreased overall reactivity of PD-model flies. Nevertheless, TH-Gal4 and NP6510-Gal4 mutants showed an increased propensity to stop. Stop duration showed a significant increase not only in α-Syn but also in Parkin fruit flies. These novel results have been added to the text and are demonstrated in Supplementary Figure S3.

      (3) The DopEcR mutant genotype used for behavior analysis is w<sup>1118</sup>; PBac{PB}DopEcR<sup>c02142</sup>/TM6B, Tb<sup>1</sup>. Balancer chromosomes, such as TM6B, Tb<sup>1</sup>, can have undesirable and uncharacterised behavioral effects. This could be addressed by removing the balancer and testing the DopEcR mutant in homozygous (if viable) or heterozygous conditions.

      We appreciate the Reviewer's comment and acknowledge the potential for the DopEcR balancer chromosome to produce unintended behavioral effects. However, given that this mutant was not essential to our main conclusions, we opted not to repeat the experiment. Nevertheless, we now discuss the possible confounds associated with using the PBac{PB}DopEcR<sup>c02142</sup> mutant allele over the balancer chromosome. “We recognize a limitation in using PBac{PB}DopEcR<sup>c02142</sup> over the TM6B, Tb<sup>1</sup> balancer chromosome, as the balancer itself may induce behavioral deficits in flies. We consider this unlikely, as the PBac{PB}DopEcR<sup>c02142</sup> mutation demonstrates behavioral effects even in heterozygotes (Ishimoto et al., 2013). Additionally, to our knowledge, no studies have reported behavioral deficits in flies carrying the TM6B, Tb<sup>1</sup> balancer chromosome over a wild-type chromosome.”

      (4) The height of the arena is restricted to 1mm. However, for the wild-type flies (Canton-S) and many other mutants, the height is usually more than 1mm. Also, a 1 mm height could restrict the fly movement. For example, it might not allow the flies to flip upside down in the arena easily. This could introduce some unwanted behavioral changes. A simple experiment with an arena of height at least 2.5mm could be used to verify the effect of 1mm height.

      We thank the Reviewer for this comment, which prompted us to reassess the dimensions of the apparatus. The height of the arena was 1.5 mm, which we have now corrected in the text. We observed that the arena did not restrict the flies' walking and that flies could flip in the arena. We now include two Supplementary Movies to demonstrate this.

      (5) The detailed model for Monte Carlo simulation for speed-response simulation is not described. The simulation model and its hyperparameters need to be described in more depth and with proper justification.

      We thank the Reviewer for pointing out a lack of details with respect to Monte Carlo simulations. We used a nested model built from actual data distributions, without any assumptions. Accordingly, the simulation did not have the hyperparameters typical of machine learning applications; the only external parameter was the number of resamplings (3000 for each draw). We made these modeling choices clearer and expanded this part as follows.

      “The effect of movement speed on the distribution of behavioral response types was tested using a nested Monte Carlo simulation framework (Fig. S5). This simulation aimed to model how different movement speeds impact the probability distribution of response types, comparing these simulated outcomes to empirical data. This approach allowed us to determine whether observed differences in response distributions are solely due to speed variations across genotypes or if additional behavioral factors contribute to the differences. First, we calculated the probability of each response type at different specific speed values (outer model). These probabilities were derived from the grand average of all trials across each genotype, capturing the overall tendency at various speeds. Second, we simulated behavior of virtual flies (n = 3000 per genotype, which falls within the same order of magnitude as the number of experimentally recorded trials from different genotypes) by drawing random velocity values from the empirical velocity distribution specific to the given genotype and then randomly selecting a reaction based on the reaction probabilities associated with the drawn velocity (inner model). Finally, we calculated reaction probabilities for the virtual flies and compared them with real data from animals of the same genotype.

      Differences were statistically tested by Chi-squared test.”
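
      The nested procedure quoted above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the response labels, the 2.0 speed-bin width, the pooling of all trials for the outer model, and every function name are hypothetical choices made only for the example.

```python
import random
from collections import Counter, defaultdict

# Hypothetical response labels; the study's own categories may differ.
RESPONSES = ("stop", "slow_down", "speed_up", "no_reaction")


def speed_bin(speed, bin_width):
    """Discretize a walking speed into a bin index (bin width is an assumption)."""
    return int(speed // bin_width)


def fit_outer_model(trials, bin_width=2.0):
    """Outer model: P(response | speed bin), pooled over (speed, response) trials."""
    counts = defaultdict(Counter)
    for speed, response in trials:
        counts[speed_bin(speed, bin_width)][response] += 1
    return {
        b: {r: c[r] / sum(c.values()) for r in RESPONSES}
        for b, c in counts.items()
    }


def simulate_genotype(genotype_speeds, outer, n_virtual=3000, bin_width=2.0, seed=0):
    """Inner model: resample a speed from one genotype, then draw a response."""
    rng = random.Random(seed)
    simulated = Counter()
    for _ in range(n_virtual):
        speed = rng.choice(genotype_speeds)  # empirical speed distribution
        probs = outer[speed_bin(speed, bin_width)]
        weights = [probs[r] for r in RESPONSES]
        simulated[rng.choices(RESPONSES, weights=weights)[0]] += 1
    return simulated


def chi_squared_statistic(observed, expected):
    """Pearson chi-squared statistic of observed counts against expected counts."""
    total_obs = sum(observed[r] for r in RESPONSES)
    total_exp = sum(expected[r] for r in RESPONSES)
    stat = 0.0
    for r in RESPONSES:
        e = expected[r] * total_obs / total_exp  # rescale to the observed total
        if e > 0:  # categories with zero expectation are skipped in this sketch
            stat += (observed[r] - e) ** 2 / e
    return stat
```

      In this sketch, a genotype's simulated counts from `simulate_genotype` would be passed as `expected` and its recorded counts as `observed`; a large statistic would suggest that speed alone does not account for that genotype's response distribution.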

      (6) The statistical analysis in different experiments needs revisiting. It wasn't clear to me if the authors checked if the data is normally distributed. A simple remedy to this would be to check the normality of data using the Shapiro-Wilk test or Kolmogorov-Smirnov test. Based on the normality check, data should be further analyzed using either parametric or non-parametric statistical tests. Further, the statistical test for the age-dependent behavior response needs revisiting as well. Using two-way ANOVA is not justified given the complexity of the experimental design. Again, after checking for the normality of data, a more rigorous statistical test, such as split-plot ANOVA or a generalized linear model could be used.

      We thank the Reviewer for this comment. We performed the Kolmogorov-Smirnov test for normality on the data distributions underlying Figure 3, and normality was rejected for all data distributions at p = 0.05, which justifies the use of the non-parametric Mann-Whitney U-test. Regarding ANOVA, we would like to point out that the ANOVA hypothesis test design is robust to deviations from normality (Knief and Forstmeier, 2021; Mooi et al., 2018). While the Kruskal-Wallis test is considered a reasonable non-parametric alternative to one-way ANOVA, there is no clear consensus on a non-parametric alternative to two-way ANOVA. Therefore, we left the two-way ANOVA for Figure 5 in place; however, to increase the statistical confidence in our conclusions, we performed Kruskal-Wallis tests for the main effect of age and found significant effects in all genotypes in accordance with the ANOVA, confirming the results (Stop frequency, DopEcR, p = 0.0007; Dop1R1, p = 0.004; Dop1R2, p = 9.94 × 10<sup>-5</sup>; w<sup>1118</sup>, p = 9.89 × 10<sup>-13</sup>; y<sup>1</sup> w<sup>67c23</sup>, p = 2.54 × 10<sup>-5</sup>; Slowing down frequency, DopEcR, p = 0.0421; Dop1R1, p = 5.77 × 10<sup>-6</sup>; Dop1R2, p = 0.011; w<sup>1118</sup>, p = 2.62 × 10<sup>-5</sup>; y<sup>1</sup> w<sup>67c23</sup>, p = 0.0382; Speeding up frequency, DopEcR, p = 0.0003; Dop1R1, p = 2.06 × 10<sup>-7</sup>; Dop1R2, p = 2.19 × 10<sup>-6</sup>; w<sup>1118</sup>, p = 0.0044; y<sup>1</sup> w<sup>67c23</sup>, p = 1.36 × 10<sup>-5</sup>). We also changed the post hoc Tukey tests to post hoc Mann-Whitney tests in the text to be consistent with the statistical analyses for Figure 3. These yielded results very similar to those of the Tukey tests. Of note, there isn’t a straightforward way of correcting for multiple comparisons in this case, as opposed to Tukey’s ‘honest significance’ approach; we thus report uncorrected p values and suggest considering them at p = 0.01, which reduces the risk of type I errors. These notes have been added to the ‘Data analysis and statistics’ Methods section.
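
      For readers unfamiliar with the rank-based comparison mentioned above, the following pure-Python sketch shows how a two-sided Mann-Whitney U test can be computed. It is an illustration only: it uses the large-sample normal approximation without tie or continuity correction, the function names are hypothetical, and it is not the software used in the study. For real analyses an established routine such as `scipy.stats.mannwhitneyu` is preferable.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U for sample a, two-sided p) using the normal approximation."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:n1])
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u_a - mean_u) / sd_u
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, p
```

      With uncorrected p values, the stricter reading suggested in the response corresponds to checking `p < 0.01` rather than `p < 0.05` for each pairwise comparison.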

      (7) The dopamine receptor mutants used in this study are well characterized for learning and memory deficits. In the Parkinson's disease model of Drosophila, there is a loss of DA neurons in specific pockets in the central brain. Hence, it would be apt to use whole animal DA receptor mutants as general DA mutants rather than the Parkinson's disease model. The authors may want to rework the title to reflect the same.

      We thank the Reviewer for this comment, which suggests that we were not sufficiently clear on the Drosophila lines with DA receptor mutations. We used Mi{MIC} random insertion lines for dopamine receptor mutants, namely y<sup>1</sup> w<sup>*1</sup>; Mi{MIC}Dop1R1<sup>MI04437</sup> (BDSC 43773), y<sup>1</sup> w<sup>*1</sup>; Mi{MIC}Dop1R2<sup>MI08664</sup> (BDSC 51098) (Harbison et al., 2019; Pimentel et al., 2016), and w<sup>1118</sup>; PBac{PB}DopEcR<sup>c02142</sup>/TM6B, Tb<sup>1</sup> (BDSC 10847) (Ishimoto et al., 2013; Petruccelli et al., 2020, 2016). These lines carried reported mutations in dopamine receptors, most likely generating partial knock down of the respective receptors. We made this clearer by including the full names at the first occurrence of the lines in Results (beyond those in Methods) and adding references to each of the lines.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Please think about focusing the manuscript either on the escape response or the PD pathology and provide additional evidence to demonstrate that you indeed have a novel system to address open questions in the field.

      As detailed above, we now emphasize more that the main advantage of our single-trial-based approach lies in the appropriate statistical comparison of rich distributions of behavioral data. Please see our response to the ‘Weaknesses’ section for more details.

      (2) Please explain the rationale for choosing the genetic lines and provide appropriate genetic controls in the experiments, e.g. trans-heterozygotes. Why use Ddc-Gal4 instead of TH or other specific Split-Gal4 lines?

      We thank the Reviewer for this suggestion. We repeated our key experiments with TH-Gal4 and NP6510-Gal4 lines. Please see our response to Point #2 of Reviewer #2 for details.

      (3) Please proofread the manuscript for omissions, e.g. there's no legend for Fig 4b.

      We respectfully point out that the legend is there, and it reads “b, Proportion of a given response type as a function of average fly speed before the shadow presentation. Top, Parkin and α-Syn flies. Bottom, Dop1R1, Dop1R2 and DopEcR mutant flies.”

      Reviewer #2 (Recommendations For The Authors):

      (1) In figure 2(c), representing the average walking speed data for different mutants would be useful to visually correlate the walking differences.

      We thank the Reviewer for this suggestion. The average walking speed was added in a scatter plot format, as suggested in the next point of the Reviewer. 

      (2) The data could be represented more clearly using scatter plots. Also, the color scheme could be more color-blindness friendly.

      We thank the Reviewer for this suggestion. We added scatter plots to Fig.2c that indeed represent the distribution of behavioral responses better. We also changed the color scheme and removed red/green labeling.

      (3) The manuscript should be checked for typos such as in line 252, 449, 484.

      Thank you. We fixed the typos.

      References

      Ache JM, Polsky J, Alghailani S, Parekh R, Breads P, Peek MY, Bock DD, von Reyn CR, Card GM. 2019. Neural Basis for Looming Size and Velocity Encoding in the Drosophila Giant Fiber Escape Pathway. Curr Biol 29:1073-1081.e4. doi:10.1016/j.cub.2019.01.079

      Braine A, Georges F. 2023. Emotion in action: When emotions meet motor circuits. Neurosci Biobehav Rev 155:105475. doi:10.1016/j.neubiorev.2023.105475

      Card G, Dickinson MH. 2008. Visually Mediated Motor Planning in the Escape Response of Drosophila. Curr Biol 18:1300–1307. doi:10.1016/j.cub.2008.07.094

      de Vries SEJ, Clandinin TR. 2012. Loom-Sensitive Neurons Link Computation to Action in the Drosophila Visual System. Curr Biol 22:353–362. doi:10.1016/j.cub.2012.01.007

      Gibson WT, Gonzalez CR, Fernandez C, Ramasamy L, Tabachnik T, Du RR, Felsen PD, Maire MR, Perona P, Anderson DJ. 2015. Behavioral Responses to a Repetitive Visual Threat Stimulus Express a Persistent State of Defensive Arousal in Drosophila. Curr Biol 25:1401–1415. doi:10.1016/j.cub.2015.03.058

      Harbison ST, Kumar S, Huang W, McCoy LJ, Smith KR, Mackay TFC. 2019. Genome-Wide Association Study of Circadian Behavior in Drosophila melanogaster. Behav Genet 49:60–82. doi:10.1007/s10519-018-9932-0

      Heinemans M, Moita MA. 2024. Looming stimuli reliably drive innate defensive responses in male rats, but not learned defensive responses. Sci Rep 14:21578. doi:10.1038/s41598-024-70256-2

      Ishimoto H, Wang Z, Rao Y, Wu C, Kitamoto T. 2013. A Novel Role for Ecdysone in Drosophila Conditioned Behavior: Linking GPCR-Mediated Non-canonical Steroid Action to cAMP Signaling in the Adult Brain. PLoS Genet 9:e1003843. doi:10.1371/journal.pgen.1003843

      Knief U, Forstmeier W. 2021. Violating the normality assumption may be the lesser of two evils. Behav Res Methods 53:2576–2590. doi:10.3758/s13428-021-01587-5

      Lecca S, Meye FJ, Trusel M, Tchenio A, Harris J, Schwarz MK, Burdakov D, Georges F, Mameli M. 2017. Aversive stimuli drive hypothalamus-to-habenula excitation to promote escape behavior. Elife 6:1–16. doi:10.7554/eLife.30697

      Mooi E, Sarstedt M, Mooi-Reci I. 2018. Market Research, Springer Texts in Business and Economics. Singapore: Springer Singapore. doi:10.1007/978-981-10-5218-7

      Oram TB, Card GM. 2022. Context-dependent control of behavior in Drosophila. Curr Opin Neurobiol 73:102523. doi:10.1016/j.conb.2022.02.003

      Petruccelli E, Lark A, Mrkvicka JA, Kitamoto T. 2020. Significance of DopEcR, a G-protein coupled dopamine/ecdysteroid receptor, in physiological and behavioral response to stressors. J Neurogenet 34:55–68. doi:10.1080/01677063.2019.1710144

      Petruccelli E, Li Q, Rao Y, Kitamoto T. 2016. The Unique Dopamine/Ecdysteroid Receptor Modulates Ethanol-Induced Sedation in Drosophila. J Neurosci 36:4647–4657. doi:10.1523/JNEUROSCI.3774-15.2016

      Pimentel D, Donlea JM, Talbot CB, Song SM, Thurston AJF, Miesenböck G. 2016. Operation of a homeostatic sleep switch. Nature 536:333–337. doi:10.1038/nature19055

      Zacarias R, Namiki S, Card GM, Vasconcelos ML, Moita MA. 2018. Speed dependent descending control of freezing behavior in Drosophila melanogaster. Nat Commun 9:1–11. doi:10.1038/s41467-018-05875-1

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Recommendations for the authors):

      The authors have done an impressive job in responding to the previous critique and even gone beyond what was asked. I have only very minor comments on this excellent manuscript. The manuscript also needs some light editing for grammar and readability.

      We have worked to improve the grammar and readability of the manuscript.

      Comments:

      Lines 227-234: At what age was tamoxifen administered to the various CreERTM mice?

      We have updated the ages of the mice used in this study in the methods sections.

      UMAP in Figure 5A is missing label for cluster 19.

      The UMAP in Figure 5A has the label for cluster 19 at the center-bottom of the image.

      Supplement Figure 6: Cluster 10 seems to be separate from the other AdvC clusters, and it includes some expression of Myh11 and Notch3. Further, there is low expression of Pdgfra in this cluster, which can be seen in panel B and panels D-I. Are the Pdgfra negative cells in the pie charts from cluster 10? Could the cells in this cluster be more LMC-like than AdvC-like?

      We agree with the reviewer that subcluster 10 of the fibroblasts is intriguing, even if it is only a minor population. This population comprises 77 of 2261 total cells; 40 of the 77 were Pdgfra+, and of the remaining 37 Pdgfra- cells, 11 were still CD34+. Thus, at least half of these cells could be expected to express the PdgfraCreERTM. Only 8 of the 37 were Pdgfra-Notch3+ while 12 cells were Pdgfra+Notch3+, and only 3 were Pdgfra-Myh11+ while 3 were Pdgfra+Myh11+. 26 of 77 cells were Pdgfra+Pdgfrb+ double positive, while 12 of the 37 Pdgfra- cells were still Pdgfrb+. Additionally, within the 77 cells of subcluster 10, 17 were positive for Scn3a (Nav1.3), 21 were positive for Kcnj8 (Kir6.1), and 33 were positive for Cacna1c (Cav1.2). These are typically LMC markers, which would support the reviewer's thinking that this group contains a fibroblast-LMC transitional cell type. Only 2 of 77 cells were positive for the BK subunit (Kcnma1), which is a classic smooth muscle marker. Another possibility is that this population represents the Pdgfra+Pdgfrb+ valve interstitial cells we identified in our IF staining and in our reporter mice. Of note, almost all cells in this cluster were Col3a1+ and Vim+. Even though we performed QC analysis to remove doublets, it is also possible that some of these cells represent doublets or contaminants; however, the low percentage of cells expressing Myh11, a very highly expressed gene in LMCs, especially compared to ion channels, suggests this is less likely. Assessing the presence of this particular cell cluster in future RNAseq or spatial transcriptomics experiments will be enlightening.
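
      To make the proportions behind these counts easier to compare, the short Python sketch below converts the cell counts quoted in the response into percentages. It is only an illustrative tally and not part of the authors' analysis; the variable and function names are hypothetical, and the counts are taken directly from the paragraph above.

```python
# Illustrative tally only; counts are copied from the response text,
# names (percent, marker_positive, ...) are hypothetical.
TOTAL_FIBROBLASTS = 2261   # all fibroblast cells in the subclustering
SUBCLUSTER_10 = 77         # cells assigned to subcluster 10

# Number of subcluster 10 cells positive for each marker (from the text).
marker_positive = {
    "Pdgfra": 40,
    "Pdgfra+Pdgfrb+": 26,
    "Scn3a": 17,
    "Kcnj8": 21,
    "Cacna1c": 33,
    "Kcnma1": 2,
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of all fibroblasts that fall in subcluster 10.
subcluster_share = percent(SUBCLUSTER_10, TOTAL_FIBROBLASTS)

# Share of subcluster 10 cells positive for each marker.
marker_percent = {m: percent(n, SUBCLUSTER_10) for m, n in marker_positive.items()}

print(f"Subcluster 10: {subcluster_share}% of fibroblasts")
for marker, pct in marker_percent.items():
    print(f"{marker}: {pct}% of subcluster 10")
```

      Expressed this way, just over half of the subcluster is Pdgfra+, which matches the statement that at least half of these cells could be expected to carry the Cre driver.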

      Line 360. Proofread section title.

      We have simplified this title to read “Optogenetic Stimulation of iCre-driven Channel Rhodopsin 2”

      Lines 370-371. Are the length units supposed to be microns or millimeters?

      We have corrected this to microns as was intended. Thank you for catching this error.

      The resolution for each UMAP analysis should be stated, particularly for the identification of subclusters. How was the resolution chosen?

      To select the optimal cluster resolution, we used Clustree with various resolutions. We examined the resulting tree to identify a resolution where the clusters were well-separated and biologically meaningful, ensuring minimal merging or splitting at higher resolutions. Our goal was to find a resolution that captures relevant cell subpopulations while maintaining distinct clusters without excessive fragmentation. We have now stated the resolution for the subclustering of the LECs, LMCs, and fibroblasts: we used a resolution of 0.5 for sub-clustering LMCs, 0.87 for LECs, and 1.0 for fibroblasts. We have also added greater detail to the methods sections regarding the total number of cells, the QC analysis, and the marker identification criteria used.
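
      The idea of looking for "minimal merging or splitting" between resolutions can be sketched in a few lines. Clustree itself is an R package, and the authors' actual code is not shown in the response; the Python below is a hypothetical, simplified stand-in that uses made-up labels for eight cells. For each cluster at the higher resolution it reports the fraction of cells that come from a single parent cluster at the lower resolution; values well below 1 flag clusters assembled from several parents, which is one sign that the resolution is too high.

```python
from collections import Counter, defaultdict


def child_purity(coarse, fine):
    """For each cluster label in `fine`, return the fraction of its cells
    that come from its single largest parent cluster in `coarse`.

    `coarse` and `fine` are equal-length lists of cluster labels,
    one entry per cell, at a lower and a higher resolution.
    """
    parents = defaultdict(Counter)
    for parent, child in zip(coarse, fine):
        parents[child][parent] += 1
    return {
        child: max(counts.values()) / sum(counts.values())
        for child, counts in parents.items()
    }


# Hypothetical cluster labels for eight cells at three resolutions.
labels = {
    0.5: [0, 0, 0, 0, 1, 1, 1, 1],
    1.0: [0, 0, 1, 1, 2, 2, 2, 2],  # cluster 0 splits cleanly in two
    1.5: [0, 3, 1, 3, 2, 2, 3, 2],  # cluster 3 mixes cells from 3 parents
}

resolutions = sorted(labels)
for low, high in zip(resolutions, resolutions[1:]):
    purity = child_purity(labels[low], labels[high])
    print(f"{low} -> {high}: minimum purity {min(purity.values()):.2f}")
```

      In this toy example the step from 0.5 to 1.0 is a clean split, whereas the step from 1.0 to 1.5 creates a cluster drawn from three different parents, so a resolution around 1.0 would be preferred. Real choices also depend on marker expression and biological interpretability, as the response notes.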

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This important work advances our understanding of the impact of malnutrition on hematopoiesis and subsequently infection susceptibility. Support for the overall claims is convincing in some respects and incomplete in others as highlighted by reviewers. This work will be of general interest to those in the fields of hematopoiesis, malnutrition, and dietary influence on immunity.

      We would like to thank the editors for agreeing to review our work at eLife. We greatly appreciate them assessing this study as important and of general interest to multiple fields, as well as the opportunity to respond to reviewer comments. Please find our responses to each reviewer below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors used a chronic murine dietary restriction model to study the effects of chronic malnutrition on controls of bacterial infection and overall immunity, including cellularity and functions of different immune cell types. They further attempted to determine whether refeeding can revert the infection susceptibility and immunodeficiency. Although refeeding here improves anthropometric deficits, the authors of this study show that this is insufficient to recover the impairments across the immune cell compartments.

      Strengths:

      The manuscript is well-written and conceived around a valid scientific question. The data supports the idea that malnutrition contributes to infection susceptibility and causes some immunological changes. The malnourished mouse model also displayed growth and development delays. The work's significance is well justified. Immunological studies in the malnourished cohort (human and mice) are scarce, so this could add valuable information.

      Weaknesses:

      The assays on myeloid cells are limited, and the study is descriptive and overstated. The authors claim that "this work identifies a novel cellular link between prior nutritional state and immunocompetency, highlighting dysregulated myelopoiesis as a major." However, after reviewing the entire manuscript, I found no cellular mechanism defining the link between nutritional state and immunocompetency.

      We thank the reviewer for deeming our work significant and noting the importance of the study. We appreciate the referee’s point regarding the lack of specific cellular functional data for innate immune cells and have modified the conclusions stated in text to more accurately reflect the results presented.

      Reviewer #2 (Public review):

      Summary:

      Sukhina et al. use a chronic murine dietary restriction model to investigate the cellular mechanisms underlying nutritionally acquired immunodeficiency as well as the consequences of a refeeding intervention. The authors report a substantial impact of undernutrition on the myeloid compartment, which is not rescued by refeeding despite rescue of other phenotypes including lymphocyte levels, and which is associated with maintained partial susceptibility to bacterial infection.

      Strengths:

      Overall, this is a nicely executed study with appropriate numbers of mice, robust phenotypes, and interesting conclusions, and the text is very well-written. The authors' conclusions are generally well-supported by their data.

      Weaknesses:

      There is little evaluation of known critical drivers of myelopoiesis (e.g. PMID 20535209, 26072330, 29218601) over the course of the 40% diet, which would be of interest with regard to comparing this chronic model to other more short-term models of undernutrition.

      Further, the microbiota, which is well-established to be regulated by undernutrition (e.g. PMID 22674549, 27339978, etc.), and also well-established to be a critical regulator of hematopoiesis/myelopoiesis (e.g. PMID 27879260, 27799160, etc.), is completely ignored here.

      We thank the reviewer for agreeing that the data presented support the stated conclusions and noting the experimental rigor. The referee highlights two important areas for future mechanistic investigation that we agree are of great importance and relevant to the submitted study. We have included further discussion of the potential role cytokines and the microbiota might play in our model.

      Reviewer #3 (Public review):

      Summary:

      Sukhina et al are trying to understand the impacts of malnutrition on immunity. They model malnutrition with a diet switch from ad libitum to 40% caloric restriction (CR) in post-weaned mice. They test impacts on immune function with listeriosis. They then test whether re-feeding corrects these defects and find aspects of emergency myelopoiesis that remain defective after a precedent period of 40% CR. Overall, this is a very interesting observational study on the impacts of sudden prolonged exposure to less caloric intake.

      Strengths:

      The study is rigorously done. The observation of lasting defects after a bout of 40% CR is quite interesting. Overall, I think the topic and findings are of interest.

      Weaknesses:

      While the observations are interesting, in this reviewer's opinion, there is both a lack of mechanistic understanding of the phenomena and also some lack of resolution/detail about the phenomena itself. Addressing the following major issues would be helpful towards aspects of both:

      (1) Is it calories, per se, or macro/micronutrients that drive these phenotypes observed with 40% CR. At the least, I would want to see isocaloric diets (primarily protein, fat, or carbs) and then some of the same readouts after 40% CR. Ie does low energy with relatively more eg protein prevent immunosuppression (as is commonly suggested)? Micronutrients would be harder to test experimentally and may be out of the scope of this study. However, it is worth noting that many of the malnutrition-associated diseases are micronutrient deficiencies.

      (2) Is immunosuppression a function of a certain weight loss threshold? Or something else? Some idea of either the tempo of immunosuppression (happens at 1, in which weight loss is detected; vs 2-3, when body length and condition appear to diverge; or 5 weeks), or grade of CR (40% vs 60% vs 80%) would be helpful since the mechanism of immunosuppression overall is unclear (but nailing it may be beyond the scope of this communication).

      (3) Does an obese mouse that gets 40% CR also become immunodeficient? As it stands, this ad libitum --> 40% CR model perhaps best models problems in the industrial world (as opposed to always being 40% CR from weaning, as might be more common in the developing world), and so modeling an obese person losing a lot of weight from CR (like would be achieved with GLP-1 drugs now) would be valuable to understanding generalizability.

      (4) Generalizing this phenomenon as "bacterial" with listeriosis, which is more like a virus in many ways (intracellular phase, requires type I IFN, etc.) and cannot be given by the natural route of infection in mice, may not be most accurate. I would want to see an experiment with E. coli, or some other bacteria, to test the statement of generalizability (ie is it bacteria, or type I IFN-pathway dominant infections, like viruses). If this is unique to listeriosis, it doesn't undermine the story as it is at all, but it would just require some word-smithing.

      (5) Previous reports (which the authors cite) implicate Leptin, the levels of which scale with fat mass, as "permissive" of a larger immune compartment (immune compartment as "luxury function" idea). Is their phenotype also leptin-mediated (ie leptin AAV)?

      (6) The inability of re-feeding to "rescue" the myeloid compartment is really interesting. Can the authors do a bone marrow transplantation (CR-->ad libitum) to test if this effect is intrinsic to the CR-experienced bone marrow?

      (7) Is the defect in emergency myelopoiesis a defect in G-CSF? Ie if the authors injected G-CSF in CR animals, do they equivalently mobilize neutrophils? Does G-CSF supplementation (as one does in humans) rescue host defense against Listeria in the CR or re-feeding paradigms?

      We thank the reviewer for considering our work of interest and noting the rigor with which it was conducted. The referee raises several excellent mechanistic hypotheses and follow-up studies to perform. We agree that defining the specific dietary deficiency driving the phenotypes is of great interest. The relative contribution of calories versus macro- and micronutrients is an area we are interested in exploring in future studies, especially given the literature on the role of micronutrients in malnutrition-driven wasting, as the referee notes. We also agree that whether non-hematopoietic cells contribute, as well as the role of soluble factors such as G-CSF and leptin in mediating the immunodeficiency, warrants further study. Likewise, it will be important to evaluate how malnutrition impacts other models of infection to determine how generalizable these phenomena are. We have added these points to the discussion section as limitations of this study.

      Regarding how the timing of the immunosuppression relates to weight loss, we have performed new kinetics studies to provide some insight into this area. We now find that neutropenia in peripheral blood can be detected after as little as one week of dietary restriction, with neutropenia continuing to worsen after prolonged restriction. These findings indicate that the impact on myeloid cell production is indeed rapid and precedes maximum weight loss, though the severity of these phenotypes does increase as malnutrition persists. We wholeheartedly agree with the reviewer that it will be interesting to explore whether starting weight impacts these phenotypes and whether similar findings can be made in obese animals as they are treated for weight loss.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In this study, the authors used a chronic murine dietary restriction model to study the effects of chronic malnutrition on controls of bacterial infection and overall immunity, including cellularity and functions of different immune cell types. They further attempted to determine whether refeeding can revert the infection susceptibility and immunodeficiency. Although refeeding here improves anthropometric deficits, the authors of this study show that this is insufficient to recover the impairments across the immune cell compartments. The authors claim that "this work identifies a novel cellular link between prior nutritional state and immunocompetency, highlighting dysregulated myelopoiesis as a major." However, after reviewing the entire manuscript, I could not find any cellular mechanism defining the link between nutritional state and immunocompetency. The assays on myeloid cells are limited, and the study is descriptive and overstated.

      Major concerns:

      (1) Malnutrition has entirely different effects on adults and children. In this study, 6-8 weeks old C57/Bl6 mice were used that mimic adult malnutrition. I do not understand then why the refeeding strategy for inpatient treatment of severely malnourished children was utilized here.

      (2) Figure 1g shows BM cellularity is reduced, but the authors claim otherwise in the text.

      (3) What is the basis of the body condition score in Figure 1d? It will be good to have it in the supplement.

      (4) Listeria monocytogenes cause systemic infection, so bioload was not determined in tissues beyond the liver.

      (5) Figure 3; T cell functional assays were limited to CD8 T cells and lymphocytes isolated from the spleen.

      (6) Why was peripheral cell count not considered? Discrepancies exist with the absolute cell number and relative abundance data, except for the neutrophil and monocyte data, which makes the data difficult to interpret. For example, for B cells, CD4 and CD8 cells.

      (7) Also, if mice exhibit thymic atrophy, why does % abundance data show otherwise? Overall, the data is confusing to interpret.

      (8) No functional tests for neutrophil or monocyte function exist to explain the higher bacterial burden in the liver or to connect the numbers with the overall pathogen load

      The rationale for examining both innate and adaptive immunity is not clear-it is even more unclear since the exact timelines for examining both innate and adaptive immunity (D0 and D5) were used.

      (9) Figure 2e doesn't make sense - why is spleen cellularity measured when bacterial load is measured in the liver?

      (10) Although it is claimed that emergency myelopoiesis is affected, no specific marker for emergency myelopoiesis other than cell numbers was studied.

      (11) I suggest including neutrophil effector functions and looking for real markers of granulopoiesis, such as Cebp-b. Since the authors attempted to examine the entirety of immune responses, it is better to measure cell abundance, types, and functions beyond the spleen. Consider the systemic spread of m while measuring bioload.

      (12) Minor grammatical errors - please re-read the entire text and correct grammatical errors to improve the flow of the text.

      (13) Sample size details missing

      (14) Be clear on which marks were used to identify monocytes. Using just CD11b and Ly6G is insufficient for neutrophil quantification.

      (15) Also, instead of saying "undernourished patients," say "patients with undernutrition" - change throughout the text. I would recommend numbering citations (as is done for Nature citations) to ease in following the text, as there are areas when there are more than ten citations with author names.

      (16) No line numbers are provided

      (17) Abstract

      -  What does accelerated contraction mean?

      -  "In" is repeated in a sentence

      -  Be clear that the study is done in a mouse model - saying just "animals" is not sufficient

      -  Indicate how malnutrition is induced in these mice

      (18) Introduction

      -  "restriction," "immune organs," - what is this referring to?

      -  You mention lymphoid tissue and innate and adaptive immunity, which doesn't make sense. Please correct this.

      -  You mention a lot of lymphoid tissues, i.e. lymphoid mass gain, but how about the bone marrow and spleen, which are responsible for most innate immune compartments?

      (19) Results

      a) Figure 1

      -  Why 40% reduced diet?

      -  It would be interesting to report if the organs are smaller relative to body weight. It makes sense that the organ weight is lower in the 40RD mice, especially since they are smaller, so the novelty of this data is not apparent (Figure 1f).

      -  You say, "We observed a corresponding reduction in the cellularity of the spleen and thymus, while the cellularity of the bone marrow was unaffected (Fig. 1g)." however, your BM data is significant, so this statement doesn't reflect the data you present, please correct.

      b) Figure 2

      - Figure 2d - what tissue is this from, mentioned in the figure? And measure cellularity there. The rationale for why you look only at the spleen here is weak. Also, we would benefit from including the groups without infection here for comparison purposes.

      c) Figure 3

      - The rationale for why you further looked at T cells is weak, mainly because of the following sentence. "Despite this overall loss in lymphocyte number, the relative frequency of each population was either unchanged or elevated, indicating that while malnutrition leads to a global reduction in immune cell numbers, lymphocytes are less impacted than other immune cell populations (Supplemental 1)." Please explain in the main text.

      d) Figure 4

      -  You say the peak of the adaptive immune response, but you never looked at the peak of adaptive immune - when is this? If you have the data, please show it. You also only show d0 and d5 post-infection data for adaptive immunity, so I am unsure where this statement comes from.

      -  How did you identify neutrophils and monocytes through flow cytometry? Indicate the markers used. Also, your text does not match your data; please correct it. i.e. monocyte numbers reduced, and relative abundance increased, but your text doesn't say this.

      -  Show the flow graph first then, followed by the quantification.

      -  The study would benefit from examining markers of emergency myelopoiesis such as Cebpb through qPCR.

      -  Although the number of neutrophils is lower in the BM and spleen, how does this relate to increased bacterial load in the liver? This is especially true since you did not quantify neutrophil numbers in the liver.

      e) Figure 6

      -  Some figures are incorrectly labelled.

      -  For the refeeding data, also include the data from the 40RD group to compare the level of recovery in the outcome measures.

      (20) Discussion

      -  You claim that monocytes are reduced to the same extent as neutrophils, but this is not true. Please correct.

      -  Indicate some limitations of your work.

      We thank the reviewer for offering these recommendations and the constructive comments. 

      Several comments raised concerns over the rationale or reasoning behind aspects of the experimental design or the data presented, which we would like to clarify:

      • Regarding the refeeding protocol, we apologize for the confusion regarding the rationale. We based our methodology on the general guidelines for refeeding protocols for malnourished people. We elected to increase food intake 10% daily to avoid the risk of refeeding syndrome or other complications. Our method by no means replicates the administration of specific vitamins, minerals, electrolytes, or precise caloric content as would be given to a human patient. The citation provided offers information from the WHO regarding the complications that can arise during refeeding. Although it comes from a document on pediatric care, we did not mean to imply that our method modeled refeeding intervention for children. We have modified the text to avoid this confusion.

      • The reviewer requested more clarity on why we studied both the innate and adaptive immune system as well as why we chose the time points studied. As referenced in the manuscript, prior work has observed that caloric restriction, fasting, and malnutrition all can impact the adaptive immune system. Given these previous findings, we felt it important to evaluate how malnutrition affected adaptive immune cell populations in our model. To this end, we provide data tracking the course of T-cell responses from the start of infection through day 14, the time at which the response undergoes contraction. However, since we find that bacterial burden is not properly controlled at earlier time points (day 5), when it is understood the innate immune system is more critical for mediating pathogen clearance, we elected to better characterize the effect malnutrition had on innate immune populations, something less well described in the literature. As phenotypes both in bacterial burden and within innate immune populations were observable as early as day 5, we chose to focus on that time point rather than later time points when readouts could be further confounded by secondary or compounding effects of the lack of early control of infection. We have tried to make this rationale clear in the text and have made changes to further emphasize this reasoning.

      • The reviewer also requested an explanation of why bacterial burden was measured in the liver and the immune response was measured in the spleen. While the reviewer is correct that our model is a systemic infection, it is well appreciated that bacteria rapidly disseminate to the liver and spleen and that these organs serve as major sites of infection. Given the central role the spleen plays in organizing both the innate and adaptive immune response in this model, it is common practice in the field to phenotype immune cell populations in the spleen, while using the liver to quantify bacterial burden (see PMID: 37773751 as one example of many). We acknowledge this does not provide the full scope of bacterial infection or the immune response in every potentially affected tissue, but nonetheless believe the interpretation still stands: malnourished and previously malnourished animals do not properly control infection, and their immune responses are blunted compared to controls.

      The reviewer raised several points about differences in the results for cell frequency and absolute number and why these may deviate in some circumstances. For example, the reviewer notes that we observe thymic atrophy yet the frequency of peripheral T-cells does not decline. It should be noted that absolute number can change when frequency does not and vice versa, due to changes in other cell types within the studied population of cells. As in the case of peripheral lymphocytes in our study, the frequency can stay the same or even increase when the absolute number declines (Supplemental 1). This can occur if other populations of cells decrease further, which is indeed the case as the loss of myeloid cells is greater than that of lymphocytes. Hence, we find that the frequency of T and B cells is unchanged or elevated, despite the loss in absolute number of peripheral cells, which is our stated interpretation. We believe this is consistent with our overall observations and is why it is important to report both frequency and absolute number, as we have done.
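
      The relationship between absolute number and frequency can be illustrated with a small worked example. The Python sketch below uses entirely hypothetical cell counts (they are not data from the study) and a hypothetical helper, `frequencies`, to show how a population can fall in absolute number while rising in relative frequency when another population falls more steeply.

```python
def frequencies(counts):
    """Convert absolute counts per population into fractions of the total."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Hypothetical absolute counts (arbitrary units), NOT data from the study.
control = {"lymphocytes": 1000, "myeloid": 1000}
restricted = {"lymphocytes": 600, "myeloid": 200}  # both populations shrink

control_freq = frequencies(control)
restricted_freq = frequencies(restricted)

# Lymphocytes lose 40% of their absolute number...
lymphocyte_loss = 1 - restricted["lymphocytes"] / control["lymphocytes"]

# ...yet their frequency rises, because myeloid cells lose 80%.
print(f"Lymphocyte loss in absolute number: {lymphocyte_loss:.0%}")
print(f"Lymphocyte frequency, control: {control_freq['lymphocytes']:.0%}")
print(f"Lymphocyte frequency, restricted: {restricted_freq['lymphocytes']:.0%}")
```

      With these made-up numbers, lymphocytes drop by 40% in absolute terms but their frequency rises from 50% to 75%, which is the pattern described above and the reason both measures are reported.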

      We have made the requested changes to the text to address the reviewer's concerns and to improve the clarity and accuracy of the description of experiments, results, and overall conclusions drawn in the manuscript. We have also included a discussion of the limitations of our work as well as additional areas for future investigation that remain open.

      Reviewer #2 (Recommendations for the authors):

      Regarding the known drivers of myelopoiesis, can the authors quantify circulating levels of relevant immune cytokines (e.g. type I and II IFNs, GM-CSF, etc.)?

      Regarding the microbiota (point #2), how dramatically does this undernutrition modulate the microbiota both in terms of absolute load and community composition, and how effectively/quickly is this rescued by refeeding?

      We thank the reviewer for raising these recommendations. We agree that the role of circulating factors like cytokines and growth factors in contributing to the defects in myelopoiesis is of interest and is the focus of future work. Similarly, the impact of malnutrition on the microbiota is of great interest and has been evaluated by other groups in separate studies. How the known impact of malnutrition on the microbiota affects the phenotypes we observe in myelopoiesis is unclear and warrants future investigation. We have added these points to the discussion section as limitations of this study.

    1. Author Response:

      In the Weaknesses, Reviewer 3 suggests that in the Discussion, we comment upon whether WRN ATPase/3’-5’ helicase and WRNIP1 ATPase work on Y-family Pols additively or synergistically to raise fidelity. However, in the Discussion on page 20, we do comment on the role of WRN and WRNIP1 ATPase activities in conferring an additive increase in the fidelity of TLS by Y-family Pols.

    1. Author Response:

      We thank the reviewers for their thoughtful feedback and appreciate their recognition of the value of our findings. In response, we are refining the manuscript to clarify key terminology, more clearly describe our image analysis workflows, and temper the interpretation of our results where appropriate. We are planning to perform additional experiments to further investigate the specificity of mRNA co-localization between BK and CaV1.3 channels. We acknowledge the importance of understanding ensemble trafficking dynamics and the functional role of pre-assembly at the plasma membrane, and we plan to explore these questions in future work. We look forward to submitting a revised manuscript that addresses the reviewers’ comments in detail.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Desingu et al. show that JEV infection reduces SIRT2 expression. Upon JEV infection, 10-day-old SIRT2 KO mice showed increased viral titer, more severe clinical outcomes, and reduced survival. Conversely, SIRT2 overexpression reduced viral titer, lessened clinical severity, and improved survival. Transcriptional profiling shows dysregulation of NF-κB and expression of inflammatory cytokines. Pharmacological NF-κB inhibition reduced viral titer. The authors conclude that SIRT2 is a regulator of JEV infection.

      This paper is novel because sirtuins have been primarily studied for aging, metabolism, stem cells/regeneration. Their role in infection has not been explored until recently. Indeed, Barthez et al. showed that SIRT2 protects aged mice from SARS-CoV-2 infection (Barthez, Cell Reports 2025). Therefore, this is a timely and novel research topic. Mechanistically, the authors showed that SIRT2 suppresses the NF-κB pathway. Interestingly, SIRT2 has also been shown recently to suppress other major inflammatory pathways, such as cGAS-STING (Barthez, Cell Reports 2025) and the NLRP3 inflammasome (He, Cell Metabolism 2020; Luo, Cell Reports 2019). Together, these findings support the emerging concept that SIRT2 is a master regulator of inflammation.

      Weaknesses:

      (1) Figures 2 and 3. Although SIRT2 KO mice showed increased viral titer, more severe clinical outcomes, and reduced survival upon JEV infection, the difference is modest because even WT mice exhibited very severe disease at this viral dose. The authors should perform the experiment using a sub-lethal viral dose for WT mice, to allow the assessment of increased clinical outcomes and reduced survival in KO mice.

      (2) Figure 5K-N, the authors examined the expression of inflammatory cytokines in WT and SIRT2 KO cells upon JEV infection, in line with the dysregulation of NF-κB. It has been shown recently that SIRT2 also regulates the cGAS-STING pathway (Barthez, Cell Reports 2025) and the NLRP3 inflammasome (He, Cell Metabolism 2020; Luo, Cell Reports 2019). Do you also observe increased IFNb, IL1b, and IL18 in SIRT2 KO cells upon JEV infection? This may indicate that SIRT2 regulates systemic inflammatory responses and represents a potent protection upon viral infection. This is particularly important because in Figure 7F, the authors showed that SIRT2 overexpression reduced viral load even when NF-κB is inhibited, suggesting that NF-κB is not the only mediator of SIRT2 to suppress viral infection.

      We thank the reviewer for the valuable recommendation. We are willing to conduct an experiment using a sub-lethal viral dose in wild-type (WT) mice to assess increased clinical outcomes and reduced survival in knockout (KO) mice, as recommended.

      Furthermore, we acknowledge the reviewer's comments that SIRT2 regulates systemic inflammatory responses and provides potent protection against viral infection, and that NF-κB is not the only mediator of SIRT2's suppression of viral infection; other molecular mechanisms may also be involved in this process.

      Reviewer #2 (Public review):

      The manuscript by Desingu et al., explores the role of SIRT2 in regulating Japanese Encephalitis Virus (JEV) replication and disease progression in rodent models. Using both an in vitro and an in vivo approach, the authors demonstrate that JEV infection leads to decreased SIRT2 expression, which they hypothesize is exploited by JEV for viral replication. To test this hypothesis, the authors utilize SIRT2 inhibition (via AGK2 or genetic knockout) and demonstrate that it leads to increased viral load and worsens clinical outcomes in JEV-infected mice. Conversely, SIRT2 overexpression via an AAV delivery system reduces viral replication and improves survival among infected mice. The study proposes a mechanism in which SIRT2 suppresses JEV-induced autophagy and inflammation by deacetylating NF-κB, thereby reducing Beclin-1 expression (an NF-κB-dependent gene) and autophagy, which the authors consider a pathway that JEV exploits for replication. Transcriptomic analysis further supports that SIRT2 deficiency leads to NF-κB-driven cytokine hyperactivation. Additionally, pharmacological inhibition of NF-κB using Bay 11 (an IKK inhibitor) results in reduced viral load and improved clinical pathology in WT and SIRT2 KO mice. Overall, the findings from Desingu et al. are generally supported by the data and suggest that targeting SIRT2 may serve as a promising therapeutic approach for JEV infection and potentially other RNA viruses that SIRT2 helps control. However, the paper does fall short in some areas. Please see below for our comments to help improve the paper.

      We thank the reviewer for the valuable recommendation. As suggested, we are willing to measure NF-κB acetylation in AdSIRT2 JEV-infected cells compared to WT-infected cells, to verify that the acetylation of NF-κB is truly linked to SIRT2 expression levels.

      We are willing to conduct an experiment using a sub-lethal viral dose in wild-type (WT) mice to assess increased clinical outcomes and reduced survival in knockout (KO) mice, as recommended.

      We accept the reviewer's point that AGK2 can also inhibit other sirtuins. To test the contribution of other sirtuins, we are willing to repeat the AGK2 experiment using JEV-infected wild-type and Sirt2 knockout mice.

    1. Author response:

      Reviewer #1 (Public Review):

      Fombellida-Lopez and colleagues describe the results of an ART intensification trial in people with HIV infection (PWH) on suppressive ART to determine the effect of increasing the dose of one ART drug, dolutegravir, on viral reservoirs, immune activation, exhaustion, and circulating inflammatory markers. The authors hypothesize that ART intensification will provide clues about the degree to which low-level viral replication is occurring in circulation and in tissues despite ongoing ART, which could be identified if reservoirs decrease and/or if immune biomarkers change. The trial design is straightforward and well-described, and the intervention appears to have been well tolerated. The investigators observed an increase in dolutegravir concentrations in circulation, and to a lesser degree in tissues, in the intervention group, indicating that the intervention has functioned as expected (ART has been intensified in vivo). Several outcome measures changed during the trial period in the intervention group, leading the investigators to conclude that their results provide strong evidence of ongoing replication on standard ART. The results of this small trial are intriguing, and a few observations in particular are hypothesis-generating and potentially justify further clinical trials to explore them in depth. However, I am concerned about over-interpretation of results that do not fully justify the authors' conclusions.

      We thank Reviewer #1 for their thoughtful and constructive comments, which will help us clarify and improve the manuscript. Below, we address each of the reviewer’s points and describe the changes that we intend to implement in the revised version. We acknowledge the reviewer’s concern regarding potential over-interpretation of certain findings, and we will take particular care to ensure that all conclusions are supported by the data and framed within the exploratory nature of the study.

      (1) Trial objectives: What was the primary objective of the trial? This is not clearly stated. The authors describe changes in some reservoir parameters and no changes in others. Which of these was the primary outcome? No a priori hypothesis / primary objective is stated, nor is there explicit justification (power calculations, prior in vivo evidence) for the small n, unblinded design, and lack of placebo control. In the abstract (line 36, "significant decreases in total HIV DNA") and conclusion (lines 244-246), the authors state that total proviral DNA decreased as a result of ART intensification. However, in Figures 2A and 2E (and in line 251), the authors indicate that total proviral DNA did not change. These statements are confusing and appear to be contradictory. Regarding the decrease in total proviral DNA, I believe the authors may mean that they observed a transient decrease in total proviral DNA during the intensification period (day 28 in particular, Figure 2A); however, this level increases at Day 56 and then returns to baseline at Day 84, which is the source of the negative observation. Stating that total proviral DNA decreased as a result of the intervention when it ultimately did not is misleading, unless the investigators intended the day 28 timepoint as a primary endpoint for reservoir reduction - if so, this is never stated, and it is unclear why the intervention would then be continued until day 84? If, instead, reservoir reduction at the end of the intervention was the primary endpoint (again, unstated by the authors), then it is not appropriate to state that the total proviral reservoir decreased significantly when it did not.

      We agree with the reviewer that the primary objective of the study was not explicitly stated in the submitted manuscript. We will clarify this in the revised manuscript. As registered on ClinicalTrials.gov (NCT05351684), the primary outcome was defined as “To evaluate the impact of treatment intensification at the level of total and replication-competent reservoir (RCR) in blood and in tissues”, with a time frame of 3 months. Accordingly, our aim was to explore whether any measurable reduction in the HIV reservoir (total or replication-competent) occurred during the intensification period, including at day 28, 56, or 84. The protocol did not prespecify a single time point for this effect to occur, and the exploratory design allowed for detection of transient or sustained changes within the intensification window.

      We recognize that this scope was not clearly articulated in the original text and may have led to confusion in interpreting the transient drop in total HIV DNA observed at day 28. While total DNA ultimately returned to baseline by the end of intensification, the presence of a transient reduction during this 3-month window still fits within the framework of the study’s registered objective. Moreover, although the change in total HIV DNA was transient, it aligns with the consistent direction of changes observed across the multiple independent measures, including CA HIV RNA, RNA/DNA ratio and intact HIV DNA, collectively supporting a biological effect of intensification.

      We would also like to stress that this is the first clinical trial in which ART intensification is performed not by adding an extra drug but by increasing the dosage of an existing drug. Therefore, we were more interested in the overall, cumulative effect of intensification throughout the entire trial period than in differences between groups at individual time points. We will clarify in the manuscript that this was a proof-of-concept phase 2 study, designed to generate biological signals rather than to confirm efficacy in a powered comparison. The absence of a pre-specified statistical endpoint or sample size calculation reflects the exploratory nature of the trial.

      (2) Intervention safety and tolerability: The results section lacks a specific heading for participant safety and tolerability of the intervention. I was wondering about clinically detectable viremia in the study. Were there any viral blips? Was the increased DTG well tolerated? This drug is known to cause myositis, headache, CPK elevation, and hepatotoxicity. Were any of these observed? What is the authors' interpretation of the CD4:8 ratio change (line 198)? Is this a significant safety concern for a longer duration of intensification? Was there also a change in CD4% or only in absolute counts? Was there relative CD4 depletion observed in the rectal biopsy samples between days 0 and 84? Interestingly, T cells dropped at the same timepoints that reservoirs declined... how do the authors rule out that reservoir decline reflects transient T cell decline that is non-specific (not due to additional blockade of replication)?

      We will improve the Methods section to clarify how safety and tolerability were assessed during the study. Safety evaluations were conducted on day 28 and day 84 and included a clinical examination and routine laboratory testing (liver function tests, kidney function, and complete blood count). Medication adherence was also monitored through pill counts performed by the study nurses.

      No virological blips above 50 copies/mL were observed and no adverse events were reported by participants during the 3-month intensification period. Although CPK levels were not included in the routine biological monitoring, no participant reported muscle pain or other symptoms suggestive of muscle toxicity.

      The CD4:CD8 ratio decrease noted during intensification was not associated with significant changes in absolute CD4 or CD8 counts, as shown in Figure 5. We interpret this ratio change as a transient redistribution rather than an immunological risk, and therefore do not consider it a safety concern.

      We would like to clarify that CD4<sup>+</sup> T-cell counts did not significantly decrease in any of the treatment groups, as shown in Figure 5. The apparent decline observed concerns the CD4/CD8 ratio, which transiently dropped, but not the absolute number of CD4<sup>+</sup> T cells.

      (3) The investigators describe a decrease in intact proviral DNA after 84 days of ART intensification in circulating cells (Figure 2D), but no changes to total proviral DNA in blood or tissue (Figures 2A and 2E; IPDA does not appear to have been done on tissue samples). It is not clear why ART intensification would result in a selective decrease in intact proviruses and not in total proviruses if the source of these reservoir cells is due to ongoing replication. These reservoir results have multiple interpretations, including (but not limited to) the investigators' contention that this provides strong evidence of ongoing replication. However, ongoing replication results in the production of both intact and mutated/defective proviruses that both contribute to reservoir size (with defective proviruses vastly outnumbering intact proviruses). The small sample size and well-described heterogeneity of the HIV reservoir (with regard to overall size and composition) raise the possibility that the study was underpowered to detect differences over the 84-day intervention period. No power calculations or prior studies were described to justify the trial size or the duration of the intervention. Readers would benefit from a more nuanced discussion of reservoir changes observed here.

      We sincerely thank the reviewer for this insightful comment. We fully agree that the reservoir dynamics observed in our study raise several possible interpretations, and that its complexity, resulting from continuous cycles of expansion and contraction, reflects the heterogeneity of the latent reservoir.

      Total HIV DNA in PBMCs showed a transient decline during intensification (notably at day 28), ultimately returning to baseline by day 84. This biphasic pattern may reflect the combined effects of suppression of ongoing low-level replication by an increased DTG dosage, followed by the expansion of infected cell clones (mostly harboring defective proviruses). In other words, the transient decrease in total (intact + defective) DNA at day 28 may be due to an initial decrease in newly infected cells upon ART intensification; at the subsequent time points, however, this effect may have been masked by proliferation (clonal expansion) of infected cells with defective proviruses. This would explain why the intact proviruses decreased, but the total proviruses did not change, between days 0 and 84.

      Importantly, we observed a significant decrease in intact proviral DNA between day 0 and day 84 in the intensification group (Figure 2D). We will highlight this result more clearly in the revised manuscript, as it directly addresses the study’s primary objective: assessing the impact of intensification on the replication-competent reservoir. In comparison, as the reviewer rightly points out, total HIV DNA includes over 90% defective genomes, which limits its interpretability as a biomarker of biologically relevant reservoir changes.

      In addition, other reservoir markers, such as cell-associated unspliced RNA and RNA/DNA ratios, also showed consistent trends supporting a modest but biologically relevant effect of intensification. Even in the absence of sustained changes in total HIV DNA, the coherence across these independent measures suggests a signal indicative of ongoing replication in at least some individuals, and at specific timepoints.

      Regarding tissue reservoirs, the lack of substantial change in total HIV DNA between days 0 and 84 is also in line with the predominance of defective sequences in these compartments. Moreover, the limited increase in rectal tissue dolutegravir levels during intensification (from 16.7% to 20% of plasma concentrations) may have limited the efficacy of the intervention in this site.

      As for the IPDA on rectal biopsies, we attempted the assay using two independent DNA extraction methods (Promega Reliaprep and Qiagen Puregene), but both yielded high DNA Shearing Index values, and intact proviral detection was successful in only 3 of 40 samples. Given the poor DNA integrity and weak signals, these results were not interpretable.

      That said, we fully acknowledge the limitations of our study, especially the small sample size, and we agree with the reviewer that caution is needed when interpreting these findings. In the revised manuscript, we will adopt a more measured tone in the discussion, clearly stating that these observations are exploratory and hypothesis-generating, and require confirmation in larger, adequately powered studies. Nonetheless, we believe that the convergence of multiple reservoir markers pointing in the same direction constitutes a potentially meaningful biological signal that deserves further investigation.

      (4) While a few statistically significant changes occurred in immune activation markers, it is not clear that these are biologically significant. Lines 175-186 and Figure 3: The change in CD4 cells + for TIGIT looks as though it declined by only 1-2%, and at day 84, the confidence interval appears to widen significantly at this timepoint, spanning an interquartile range of 4%. The only other immune activation/exhaustion marker change that reached statistical significance appears to be CD8 cells + for CD38 and HLA-DR, however, the decline appears to be a fraction of a percent, with the control group trending in the same direction. Despite marginal statistical significance, it is not clear there is any biological significance to these findings; Figure S6 supports the contention that there is no significant change in these parameters over time or between groups. With most markers showing no change and these two showing very small changes (and the latter moving in the same direction as the control group), these results do not justify the statement that intensifying DTG decreases immune activation and exhaustion (lines 38-40 in the abstract and elsewhere).

      We agree with the reviewer that the observed changes in immune activation and exhaustion markers were modest. We will revise the manuscript to reflect this more accurately. We will also note that these differences, while statistically significant (e.g., in TIGIT+ CD4+ T cells and CD38+HLA-DR+ CD8+ T cells), were limited in magnitude. We will explicitly acknowledge these limitations and interpret the findings with appropriate caution.

      (5) There are several limitations of the study design that deserve consideration beyond those discussed at line 327. The study was open-label and not placebo-controlled, which may have led to some medication adherence changes that confound results (authors describe one observation that may be evidence of this; lines 146-148). Randomized/blinded / cross-over design would be more robust and help determine signal from noise, given relatively small changes observed in the intervention arm. There does not seem to be a measurement of key outcome variables after treatment intensification ceased - evidence of an effect on replication through ART intensification would be enhanced by observing changes once intensification was stopped. Why was intensification maintained for 84 days? More information about the study duration would be helpful. Table 1 indicates that participants were 95% male. Sex is known to be a biological variable, particularly with regard to HIV reservoir size and chronic immune activation in PWH. Worldwide, 50% of PWH are women. Research into improving management/understanding of disease should reflect this, and equal participation should be sought in trials. Table 1 shows differing baseline reservoir sizes between the control and intervention groups. This may have important implications, particularly for outcomes where reservoir size is used as the denominator.

      We will expand the limitations section to address several key aspects raised by the reviewer: the absence of blinding and placebo control, the predominantly male study population, and the lack of post-intervention follow-up. While we acknowledge that open-label designs can introduce behavioral biases, including potential changes in adherence, we will now explicitly state that placebo-controlled, blinded trials would provide a more robust assessment and are warranted in future research.

      The 84-day duration of intensification was chosen based on previous studies and provided sufficient time for observing potential changes in viral transcription and reservoir dynamics. However, we agree that including post-intervention follow-up would have strengthened the conclusions, and we will highlight this limitation and future direction in the revised manuscript.

      The sex imbalance is now clearly acknowledged as a limitation in the revised manuscript, and we fully support ongoing efforts to promote equitable recruitment in HIV research. We would like to add that, in our study, rectal biopsies were coupled with anal cancer screening through HPV testing. This screening is specifically recommended for younger men who have sex with men (MSM), as outlined in the current EACS guidelines (see: https://eacs.sanfordguide.com/eacs-part2/cancer/cancer-screening-methods). As a result, MSM participants had both a clinical incentive and medical interest to undergo this procedure, which likely contributed to the higher proportion of male participants in the study.

      Lastly, although baseline total HIV DNA was higher in the intensified group, our statistical approach is based on a within-subject (repeated-measures) design, in which the main outcome was the longitudinal change of a parameter within the same participant during the study. In other words, we are not comparing absolute values of any marker between the groups; we are looking at changes from baseline within participants, and these are not expected to be affected by baseline imbalances.
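
      The within-subject logic can be illustrated with a minimal Python sketch. All values are hypothetical and are not trial data, and the helper names `fold_changes` and `rescale` are introduced here only for illustration. The sketch makes one narrow point: dividing each participant's value by their own baseline cancels a multiplicative offset in baseline level. It does not address other ways in which a baseline imbalance could matter.

```python
from statistics import median

# Hypothetical cell-associated US RNA values at day 0 and day 84.
# These numbers are illustrative only and are NOT data from the trial.
control = {"day0": [100, 150, 200, 120], "day84": [95, 160, 190, 125]}
intensified = {"day0": [400, 600, 800, 480], "day84": [100, 120, 200, 96]}


def fold_changes(group):
    """Return each participant's day-84 value divided by their own baseline."""
    return [end / start for start, end in zip(group["day0"], group["day84"])]


def rescale(group, factor):
    """Multiply every measurement by a constant, mimicking a different baseline level."""
    return {day: [value * factor for value in values] for day, values in group.items()}


control_fc = fold_changes(control)
intensified_fc = fold_changes(intensified)

# Shrinking the intensified group's values 4-fold changes its baseline level
# but leaves every within-participant fold change unchanged.
rescaled_fc = fold_changes(rescale(intensified, 0.25))

print("control median fold change:", median(control_fc))
print("intensified median fold change:", median(intensified_fc))
print("same after rescaling:", rescaled_fc == intensified_fc)
```

      In this toy data set the intensified group starts four times higher than the control group, yet its fold changes are identical before and after rescaling, which is the property the within-subject analysis relies on.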

      (6) Figure 1: the increase in DTG levels is interesting - it is not uniform across participants. Several participants had lower levels of DTG at the end of the intervention. Though unlikely to be statistically significant, it would be interesting to evaluate if there is a correlation between change in DTG concentrations and virologic / reservoir / inflammatory parameters. A positive relationship between increasing DTG concentration and decreased cell-associated RNA, for example, would help support the hypothesis that ongoing replication is occurring.

      We agree with the reviewer that assessing correlations between DTG concentrations and virological, immunological, or inflammatory markers would be highly informative. In fact, we initially explored this question in a preliminary way by examining whether individuals who showed a marked increase in DTG levels after intensification also demonstrated stronger changes in the viral reservoir. While this exploratory analysis did not reveal any clear associations, we would like to emphasize that correlating biological effects with DTG concentrations measured at a single timepoint may have limited interpretability. A more comprehensive understanding of the relationship between drug exposure and reservoir dynamics would ideally require multiple pharmacokinetic measurements over time, including pre-intensification baselines. This is particularly important given that DTG concentrations vary across individuals and over time, depending on adherence, metabolism, and other individual factors. We will clarify these points in the revised manuscript.

      (7) Figure 2: IPDA in tissue- was this done? scRNA in blood (single copy assay) - would this be expected to correlate with usCaRNA? The most unambiguous result is the decrease in cell-associated RNA - accompanying results using single-copy assay in plasma would be helpful to bolster this result.

      As mentioned in our response to point 3, we attempted IPDA on tissue samples, but technical limitations prevented reliable detection of intact proviruses. Regarding residual viremia, we did perform ultra-sensitive plasma HIV RNA quantification; however, a technical issue (inadvertent PBMC contamination during plasma separation) affected the reliability of the results, and we therefore chose not to include these data in the manuscript.

      The use of the US RNA / Total DNA ratio is not helpful/difficult to interpret since the control and intervention arms were unmatched for total DNA reservoir size at study entry.

      We respectfully disagree with this comment. The US RNA / Total DNA ratio is commonly used to assess the relative transcriptional activity of the viral reservoir, rather than its absolute size. While we acknowledge that the total HIV-1 DNA levels differed at baseline between the two groups, the US RNA / Total DNA ratio specifically reflects the relationship between transcriptional activity and reservoir size within each individual, and is therefore not directly confounded by baseline differences in total DNA alone.

      Moreover, our analyses focus on within-subject longitudinal changes from baseline, not on direct between-group comparisons of absolute marker values. As such, the observed changes in the US RNA / Total DNA ratio over time are interpreted relative to each participant's baseline, mitigating concerns related to baseline imbalances between groups.

      Reviewer #2 (Public Review):

      Summary:

      An intensification study with a double dose of a 2nd-generation integrase inhibitor, with a background of nucleoside analog inhibitors of the HIV reverse transcriptase, in 20 individuals randomized with controls, with an impact on the levels of viral reservoirs and inflammation markers. Viral reservoirs in HIV are the main impediment to an HIV cure, and inflammation is associated with the development of co-morbidities.

      Strengths:

      The intervention that leads to a decrease of viral reservoirs and inflammation is quite straightforward, as a doubling of the INSTI is used in some individuals with INSTI resistance, with good tolerability.

      This is a very well-documented study, both in blood and tissues, which is a great achievement due to the difficulty of body sampling in well-controlled individuals on antiretroviral therapy. The laboratory assays are performed by specialists in the field with state-of-the-art quantification assays. Both the introduction and the discussion are remarkably well presented and documented.

      The findings also have a potential impact on the management of chronic HIV infection.

      Weaknesses:

      I do not think that the size of the study can be considered a weakness, nor the fact that it is open-label either.

      We thank Reviewer #2 for their constructive and supportive comments. We appreciate their positive assessment of the study design, the translational relevance of the intervention, and the technical quality of the assays. We also take note of their perspective regarding sample size and study design, which supports our positioning of this trial as an exploratory, hypothesis-generating phase 2 study.

      Reviewer #3 (Public Review):

      The introduction does a very good job of discussing the issue around whether there is ongoing replication in people with HIV on antiretroviral therapy. Sporadic, non-sustained replication likely occurs in many PWH on ART, related to adherence, drug interactions, and possibly penetration of antivirals into sanctuary areas of replication. As the authors point out, proving that it does not occur is likely not possible, and proving that it does occur is likely very dependent on the population studied and the design of the intervention. Whether the consequences of this replication, in the absence of evolution toward resistance, have clinical significance is a challenging question to address.

      It is important to note that INSTI-based therapy may have a different impact on HIV replication events that result in differences in virus release for specific cell types (those responsible for "second phase" decay) by blocking integration in cells that have completed reverse transcription prior to ART initiation but have yet to be fully activated. In a PI or NNRTI-based regimen, those cells will release virus, whereas with an INSTI-based regimen, they will not.

      Given the very small sample size, there is a substantial risk of imbalance between the groups in important baseline measures. Unfortunately, with the small sample size, a non-significant P value is not helpful when comparing baseline measures between groups. One suggestion would be to provide the full range as opposed to the inter-quartile range (essentially only 5 or 6 values). The authors could also report the proportion of participants with baseline HIV RNA target not detected in the two groups.

      We thank Reviewer #3 for their thoughtful and balanced review. We are grateful for the recognition of the strength of the Introduction, the complexity of evaluating residual replication, and the technical execution of the assays. We also appreciate the insightful suggestions for improving the clarity and transparency of our results and discussion.

      We will revise the manuscript to address several of the reviewer’s key concerns. We agree that the small sample size increases the risk of baseline imbalances. We will acknowledge these limitations in the revised manuscript. We will provide both the full range and the IQR in Table 1 in the revised manuscript.

      A suggestion that there is a critical imbalance between groups is that the control group has significantly lower total HIV DNA in PBMC, despite the small sample size. The control group also has numerically longer time of continuous suppression, lower unspliced RNA, and lower intact proviral DNA. These differences may have biased the ability to see changes in DNA and US RNA in the control group.

      We acknowledge the significant baseline difference in total HIV DNA between groups, which we have clearly reported. However, the other variables mentioned, duration of continuous viral suppression, unspliced RNA levels, and intact proviral DNA, did not differ significantly between groups at baseline, despite differences in the median values. These numerical differences do not necessarily indicate a critical imbalance.

      Notably, there was no significant difference in the change in US RNA/DNA between groups (Figure 2C).

      The nonsignificant difference in the change in US RNA/DNA between groups is not unexpected, given the significant between-group differences for both US RNA and total DNA changes. Since the ratio combines both markers, it is likely to show attenuated between-group differences compared to the individual components. However, while the difference did not reach statistical significance (p = 0.09), we still observed a trend towards a greater reduction in the US RNA/Total DNA ratio in the intervention group.
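
      The arithmetic behind this attenuation can be sketched in a few lines of Python. The values and the helper name `ratio_fold_change` are hypothetical and chosen only for illustration; they are not trial data. The sketch shows that the fold change of the US RNA/total DNA ratio equals the US RNA fold change divided by the total DNA fold change, so a concurrent decline in total DNA makes the ratio fall less than US RNA alone.

```python
import math


def ratio_fold_change(rna_day0, rna_day84, dna_day0, dna_day84):
    """Fold change from baseline of the US RNA / total DNA ratio."""
    return (rna_day84 / dna_day84) / (rna_day0 / dna_day0)


# Hypothetical participant (illustrative numbers, NOT trial data):
# US RNA falls 5-fold while total DNA falls 1.25-fold.
rna_day0, rna_day84 = 500.0, 100.0
dna_day0, dna_day84 = 1000.0, 800.0

rna_fc = rna_day84 / rna_day0  # fold change of US RNA alone
dna_fc = dna_day84 / dna_day0  # fold change of total DNA alone
ratio_fc = ratio_fold_change(rna_day0, rna_day84, dna_day0, dna_day84)

# The ratio's fold change is the RNA fold change divided by the DNA fold change,
# so it sits closer to 1 than the RNA fold change whenever DNA also declines.
print("US RNA fold change:", rna_fc)
print("total DNA fold change:", dna_fc)
print("ratio fold change:", ratio_fc)
print("identity holds:", math.isclose(ratio_fc, rna_fc / dna_fc))
```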

      The fact that the median relative change appears very similar in Figure 2C, yet there is a substantial difference in P values, is also a comment on the limits of the current sample size.

      Although we agree that, in general, the limited sample size impacts statistical power, we would like to point out that in Figure 2C, while the medians may appear similar, the ranges differ between groups. At days 56 and 84, the median fold changes from baseline are indeed close, but the full interquartile range in the DTG group stays below 1, whereas in the control group the interquartile range is wider and extends approximately equally above and below 1. This explains the difference in p values between the groups.
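
      A minimal Python sketch can make this distinction concrete. The fold-change values are hypothetical and are not taken from Figure 2C, and the helper names `summarize` and `iqr_below_one` are illustrative assumptions. The two groups are constructed to share the same median while differing in whether the interquartile range stays below a fold change of 1.

```python
from statistics import median, quantiles

# Hypothetical fold changes from baseline (day-84 value / day-0 value).
# Illustrative numbers only, NOT data from the trial.
dtg_group = [0.55, 0.60, 0.70, 0.75, 0.80, 0.85, 0.90]
control_group = [0.40, 0.55, 0.70, 0.75, 1.10, 1.40, 1.80]


def summarize(fold_changes):
    """Return (median, lower quartile, upper quartile) of a list of fold changes."""
    q1, _, q3 = quantiles(fold_changes, n=4, method="inclusive")
    return median(fold_changes), q1, q3


def iqr_below_one(fold_changes):
    """True when the whole interquartile range lies below a fold change of 1."""
    _, _, q3 = summarize(fold_changes)
    return q3 < 1.0


for name, values in (("DTG", dtg_group), ("control", control_group)):
    med, q1, q3 = summarize(values)
    print(f"{name}: median={med:.2f}, IQR={q1:.3f}-{q3:.3f}, "
          f"IQR below 1: {iqr_below_one(values)}")
```

      Both hypothetical groups have a median fold change of 0.75, but only the first has an interquartile range entirely below 1, which is the kind of pattern that can yield different within-group p values despite similar medians.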

      The text should report the median change in US RNA and US RNA/DNA when describing Figures 2A-2C.

      These data are already reported in the Results section (lines 164–166): "By day 84, US RNA and US RNA/total DNA ratio had decreased from day 0 by medians (IQRs) of 5.1 (3.3–6.4) and 4.6 (3.1–5.3) fold, respectively (p = 0.016 for both markers)."

      This statistical comparison of changes in IPDA results between groups should be reported. The presentation of the absolute values of all the comparisons in the supplemental figures is a strength of the manuscript.

      In the assessment of ART intensification on immune activation and exhaustion, the fact that none of the comparisons between randomized groups were significant should be noted and discussed.

      We would like to point out that a statistically significant difference between the randomized groups was observed for the frequency of CD4<sup>+</sup> T cells expressing TIGIT, as shown in Figure 3A and reported in the Results section (p = 0.048).

      The changes in CD4:CD8 ratio and sCD14 levels appear counterintuitive to the hypothesis and are commented on in the discussion.

      Overall, the discussion highlights the significant changes in the intensified group, which are suggestive. There is limited discussion of the comparisons between groups where the results are less convincing.

      We will temper the language accordingly and add commentary on the limited and modest nature of these changes. Similarly, we will expand our discussion of counterintuitive findings such as the CD4:CD8 ratio and sCD14 changes.

      The limitations of the study should be more clearly discussed. The small sample size raises the possibility of imbalance at baseline. The supplemental figures (S3-S5) are helpful in showing the differences between groups at baseline, and the variability of measurements is more apparent. The lack of blinding is also a weakness, though the PK assessments do help (note that 3TC levels rise substantially in both groups for most of the time on study; Figure S2).

      The many assays and comparisons are listed as a strength. The many comparisons raise the possibility of finding significance by chance. In addition, if there is an imbalance at baseline outcomes, measuring related parameters will move in the same direction.

      We agree that the multiple comparisons raise the possibility of chance findings but would like to stress that in an exploratory study like this it is very important to avoid a type II error. In addition, the consistent directionality of the most relevant outcomes (US RNA and intact DNA) lends biological plausibility to the observed effects.
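
      The trade-off between chance findings and type II error can be made concrete with a short sketch. It assumes independent tests, which related outcome measures will not satisfy exactly, and the number of comparisons used below is a hypothetical figure rather than a count taken from the study.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


# Hypothetical: 20 comparisons, each tested at the conventional 0.05 level.
fwer_uncorrected = familywise_error(0.05, 20)
per_test_alpha = bonferroni_threshold(0.05, 20)
fwer_corrected = familywise_error(per_test_alpha, 20)
```

      Under these assumptions the uncorrected family-wise error rate is roughly 64%, whereas the corrected per-test threshold of 0.0025 would make modest effects in a small sample much harder to detect, which is the type II concern raised above.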

      The limited impact on activation and inflammation should be addressed in the discussion, as they are highlighted as a potentially important consequence of intermittent, not sustained replication in the introduction.

      The study is provocative and well executed, with the limitations listed above. Pharmacokinetic analyses help mitigate the lack of blinding. The major impact of this work is if it leads to a much larger randomized, controlled, blinded study of a longer duration, as the authors point out.

      Finally, we fully endorse the reviewer’s suggestion that the primary contribution of this study lies in its value as a proof-of-concept and foundation for future randomized, blinded trials of greater scale and duration. We will highlight this more clearly in the revised Discussion.

    1. Author response:

      We thank the editors and the reviewers for their positive comments regarding our manuscript and the methodological approach we have taken to understand the historical demographic response of endemic island birds to climate change. We acknowledge the issues of uneven sample sizes and plan to include additional species of island endemic birds for which genomic data is now available. As requested by reviewer 1, we will also address the issues related to the PSMC analysis in the revised version of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors present an interesting study using RL and Bayesian modelling to examine differences in learning rate adaptation in conditions of high and low volatility and noise respectively. Through "lesioning" an optimal Bayesian model, they reveal that apparently a suboptimal adaptation of learning rates results from incorrectly detecting volatility in the environment when it is not in fact present.

      Strengths:

      The experimental task used is cleverly designed and does a good job of manipulating both volatility and noise. The modelling approach takes an interesting and creative approach to understanding the source of apparently suboptimal adaptation of learning rates to noise, through carefully "lesioning" an optimal Bayesian model to determine which components are responsible for this behaviour.

      We thank the reviewer for this assessment.

      Weaknesses:

      The study has few substantial weaknesses; the data and modelling both appear robust and informative, and it tackles an interesting question. The model space could potentially have been expanded, particularly with regard to the inclusion of alternative strategies such as those that estimate latent states and adapt learning accordingly.

      We thank the reviewer for this suggestion. We agree that it would be interesting to assess the ability of alternative models to reproduce the sub-optimal choices of participants in this study. The Bayesian Observer Model described in the paper is a form of Hierarchical Gaussian Filter, so we will assess the performance of a different class of models that are able to track uncertainty: RL-based models that capture changes in uncertainty (the Kalman filter, and the model described by Cochran and Cisler, PLoS Comput Biol 2019). We will assess the ability of the models to recapitulate the core behaviour of participants (in terms of learning rate adaptation) and, if possible, assess their ability to account for the pupillometry response.
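
      As a rough illustration of why a Kalman filter is a natural comparison model, the sketch below computes its steady-state gain, which plays the role of a learning rate. It assumes a scalar Gaussian random walk observed with Gaussian noise, which is a simplification and not the task used in the study; the function name and variance values are hypothetical.

```python
def steady_state_gain(volatility_var, noise_var, n_iter=200):
    """Iterate the scalar Kalman recursion until the gain settles."""
    posterior_var = 1.0
    gain = 0.0
    for _ in range(n_iter):
        prior_var = posterior_var + volatility_var   # the hidden mean may drift
        gain = prior_var / (prior_var + noise_var)   # effective learning rate
        posterior_var = (1.0 - gain) * prior_var     # uncertainty after the update
    return gain


gain_baseline = steady_state_gain(volatility_var=0.1, noise_var=1.0)
gain_volatile = steady_state_gain(volatility_var=0.5, noise_var=1.0)
gain_noisy = steady_state_gain(volatility_var=0.1, noise_var=4.0)
```

      In this simplified setting the gain rises when volatility increases and falls when noise increases, which is the normative pattern against which participants' learning rates are being compared.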

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors aimed to investigate how humans learn and adapt their behavior in dynamic environments characterized by two distinct types of uncertainty: volatility (systematic changes in outcomes) and noise (random variability in outcomes). Specifically, they sought to understand how participants adjust their learning rates in response to changes in these forms of uncertainty.

      To achieve this, the authors employed a two-step approach:

      (1) Reinforcement Learning (RL) Model: They first used an RL model to fit participants' behavior, revealing that the learning rate was context-dependent. In other words, it varied based on the levels of volatility and noise. However, the RL model showed that participants misattributed noise as volatility, leading to higher learning rates in noisy conditions, where the optimal strategy would be to be less sensitive to random fluctuations.

      (2) Bayesian Observer Model (BOM): To better account for this context dependency, they introduced a Bayesian Observer Model (BOM), which models how an ideal Bayesian learner would update their beliefs about environmental uncertainty. They found that a degraded version of the BOM, where the agent had a coarser representation of noise compared to volatility, best fit the participants' behavior. This suggested that participants were not fully distinguishing between noise and volatility, instead treating noise as volatility and adjusting their learning rates accordingly.
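
      The cost of a high learning rate under noise, described in the two steps above, can be sketched with a basic delta-rule learner. This is an illustrative assumption and not the fitted model from the study: the reward probability, trial count, and learning rates are hypothetical.

```python
import random


def delta_rule(outcomes, alpha, v0=0.5):
    """Return the estimate held before each outcome by a delta-rule learner."""
    v = v0
    estimates = []
    for outcome in outcomes:
        estimates.append(v)         # estimate before seeing this outcome
        v += alpha * (outcome - v)  # move a fraction alpha toward the outcome
    return estimates


def mean_squared_error(estimates, truth):
    return sum((v - truth) ** 2 for v in estimates) / len(estimates)


# Hypothetical stable but noisy block: reward probability fixed at 0.75.
rng = random.Random(0)
p_reward = 0.75
outcomes = [1.0 if rng.random() < p_reward else 0.0 for _ in range(2000)]

mse_low = mean_squared_error(delta_rule(outcomes, alpha=0.05), p_reward)
mse_high = mean_squared_error(delta_rule(outcomes, alpha=0.5), p_reward)
```

      Because nothing changes in this simulated block, the learner with the higher learning rate chases random fluctuations and ends up with the larger estimation error.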

      The authors also aimed to use pupillometry data (measuring pupil dilation) as a physiological marker to arbitrate between models and understand how participants' internal representations of uncertainty influenced both their behavior and physiological responses. Their objective was to explore whether the BOM could explain not just behavioral choices but also these physiological responses, thereby providing stronger evidence for the model's validity.

      Overall, the study sought to reconcile approximate rationality in human learning by showing that participants still follow a Bayesian-like learning process, but with simplified internal models that lead to suboptimal decisions in noisy environments.

      Strengths:

      The generative model presented in the study is both innovative and insightful. The authors first employ a Reinforcement Learning (RL) model to fit participants' behavior, revealing that the learning rate is context-dependent; specifically, it varies based on the levels of volatility and noise in the task. They then introduce a Bayesian Observer Model (BOM) to account for this context dependency, ultimately finding that a degraded BOM, in which the agent has a coarser representation of noise compared to volatility, provides the best fit for the participants' behavior. This suggests that participants do not fully distinguish between noise and volatility, leading to the misattribution of noise as volatility. Consequently, participants adopt higher learning rates even in noisy contexts, where an optimal strategy would involve being less sensitive to new information (i.e., using lower learning rates). This finding highlights a rational but approximate learning process, as described in the paper.

      We thank the reviewer for their assessment of the paper.

      Weaknesses:

      While the RL and Bayesian models both successfully predict behavior, it remains unclear how to fully reconcile the two approaches. The RL model captures behavior in terms of a fixed or context-dependent learning rate, while the BOM provides a more nuanced account with dynamic updates based on volatility and noise. Both models can predict actions when fit appropriately, but the pupillometry data offers a promising avenue to arbitrate between the models. However, the current study does not provide a direct comparison between the RL framework and the Bayesian model in terms of how well they explain the pupillometry data. It would be valuable to see whether the RL model can also account for physiological markers of learning, such as pupil responses, or if the BOM offers a unique advantage in this regard. A comparison of the two models using pupillometry data could strengthen the argument for the BOM's superiority, as currently, the possibility that RL models could explain the physiological data remains unexplored.

      We thank the reviewer for this suggestion. In the current version of the paper, we use an extremely simple reinforcement learning model to measure the learning rate in each task block (as this is the key behavioural metric we are interested in). As the reviewer highlights, this simple model doesn’t estimate uncertainty or adapt to it. Given this, we don’t think we can directly compare this model to the Bayesian Observer Model. For example, in the current analysis of the pupillometry data we classify individual trials based on the BOM’s estimate of uncertainty and show that participants adapt their learning rate as expected to the reclassified trials; this analysis would not be possible with our current RL model. However, there are more complex RL-based models that do estimate uncertainty (as discussed above in response to Reviewer #1) and so may be more directly compared to the BOM. We will attempt to apply these models to our task data and describe their ability to account for participant behaviour and physiological response as suggested by the Reviewer.

      The model comparison between the Bayesian Observer Model and the self-defined degraded internal model could be further enhanced. Since different assumptions about the internal model's structure lead to varying levels of model complexity, using a formal criterion such as Bayesian Information Criterion (BIC) or Akaike Information Criterion (AIC) would allow for a more rigorous comparison of model fit. Including such comparisons would ensure that the degraded BOM is not simply favored due to its flexibility or higher complexity, but rather because it genuinely captures the participants' behavioral and physiological data better than alternative models. This would also help address concerns about overfitting and provide a clearer justification for using the degraded BOM over other potential models.

      Thank you, we will add this.
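
      For reference, the two criteria named by the reviewer can be sketched as follows. The formulas are the standard definitions; the log-likelihoods, parameter counts, and trial count are hypothetical placeholders and not results from the study.

```python
from math import log


def aic(log_likelihood, n_params):
    """Akaike Information Criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion; penalises parameters more as n_obs grows."""
    return n_params * log(n_obs) - 2 * log_likelihood


# Hypothetical fits of two candidate models to the same 100 trials.
simple_model = {"log_likelihood": -104.0, "n_params": 2}
flexible_model = {"log_likelihood": -100.0, "n_params": 5}

aic_simple = aic(**simple_model)
aic_flexible = aic(**flexible_model)
bic_simple = bic(n_obs=100, **simple_model)
bic_flexible = bic(n_obs=100, **flexible_model)
```

      In this made-up example the more flexible model achieves the higher likelihood, yet both criteria prefer the simpler model once the extra parameters are penalised, which is the check against overfitting that the reviewer requests.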

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      For clarity, the methods would benefit from further detail of task framing to participants. I.e. were there explicit instructions regarding volatility/task contingencies? Or were participants told nothing?

      We have added in the following explanatory text to the methods section (page 20), clarifying the limited instructions provided to participants:

      “Participants were informed that the task would be split into 6 blocks, that they had to learn which was the best option to choose, and that this option may change over time. They were not informed about the different forms of uncertainty we were investigating or of the underlying structure of the task (that uncertainty varied between blocks).”

      In the results, it would be useful to report the general task behavior of participants to get a sense of how they performed across different parts of the task. Also, were participants excluded if they didn't show evidence of learning adaptation to volatility?

      We have added the following text reporting overall performance to the results (page 6):

      “Participants were able to learn the best option to choose in the task, selecting the most highly rewarded option on an average of 71% of trials (range 65% - 74%).”

      And the following text to the methods, confirming that participants were not excluded if they didn’t respond to volatility/noise (the failure in this adaptation is the focus of the current study) (page 19):

      “No exclusion criteria related to task performance were used.”

      The results would benefit from a more intuitive explanation of what the lesioning is trying to recapitulate; this can get quite technical and the objective is not necessarily clear, especially for the less computationally-minded reader.

      We have amended the relevant section of the results to clarify this point (page 9):

      “Having shown that an optimal learner adjusts its learning rate to changes in volatility and noise as expected, we next sought to understand the relative noise insensitivity of participants. In these analyses we “lesion” the BOM, to reduce its performance in some way, and then assess whether doing so recapitulates the pattern of learning rate adaptation observed for participants (Fig 3e). In other words, we damage the model so it performs less well and then assess whether this damage makes the behaviour of the BOM (shown in Fig 3f) more closely resemble that seen in participants (Fig 3e).”

      The modelling might be improved by the inclusion of another class of model. Specifically, models that adapt learning rates in response to the estimation of latent states underlying the current task outcomes would be very interesting to see. In a sense, these are also estimating volatility through changeability of latent states, and it would be interesting to explore whether the findings could also be explained by an incorrect assumption that the latent state has changed when outcomes are noisy.

      Thank you for this suggestion. We have added additional sections to the supplementary materials in which we use a general latent state model and a simple RL model to try to recapitulate the behaviour of participants (and to compare with the BOM). These additional sections are extensive, so are not reproduced here. We have also added in a section to the discussion in the main paper covering this interesting question in which we confirm that we were unable to reproduce participant behaviour (or the normative effect of the lesioned BOMs) using these models but suggest that alternative latent state formulations would be interesting to explore in future work (page 18):

      “A related question is whether other, non-Bayesian model formulations may be able to account for participants’ learning adaptation in response to volatility and noise. Of note, the reinforcement learning model used to measure learning rates in separate blocks does not achieve this goal, as this model is fitted separately to each block rather than adapting between blocks (NB the simple reinforcement learning model that is fitted across all blocks does not capture participant behaviour, see supplementary information). One candidate class of model that has potential here is latent-state models (Cochran & Cisler, 2019), in which the variance and unexpected changes in the process being learned (which have a degree of similarity with noise and volatility respectively) are estimated and used to alter the model’s rates of updating as well as the estimated number of states being considered. Using the model described by Cochran and Cisler, we were unable to replicate the learning rate adaptation demonstrated by participants in the current study (see supplementary information) although it remains possible that other latent state formulations may be more successful.”

      The discussion may benefit from a little more discussion of where this work leads us - what is the next step?

      As above, we have added in a suggestion about future modelling work. We have also added in a section about the outstanding interesting questions concerning the neural representation of these quantities, reproduced in response to the suggestion by reviewer #2 below.

      Reviewer #2 (Recommendations for the authors):

      The study presents an opportunity to explore potential neural coding models that could account for the cognitive processes underlying the task. In the field of neural coding, noise correlation is often measured to understand how a population of neurons responds to the same stimulus, which could be related to the noise signal in this task. Since the brain likely treats the stimulus as the same, with noise representing minor changes, this aspect could be linked to the participants' difficulty distinguishing noise from volatility. On the other hand, signal correlation is used to understand how neurons respond to different stimuli, which can be mapped to the volatility signal in the task. It would be highly beneficial if the authors could discuss how these established concepts from neural population coding might relate to the Bayesian behavior model used in the study. For instance, how might neurons encode the distinction between noise and volatility at a population level? Could noise correlation lead to the misattribution of noise as volatility at a neural level, mirroring the behavioral findings? Discussing possible neural models that could explain the observed behavior and relating it to the existing literature on neural population coding would significantly enrich the discussion. It would also open up avenues for future research, linking these behavioral findings to potential neural mechanisms.

      We thank the reviewer for this interesting suggestion. We have added the following paragraph to the discussion section, which we hope does justice to this interesting question (page 18):

      “Previous work examining the neural representations of uncertainty has tended to report correlations between brain activity and some task-based estimate of one form of uncertainty at a time (Behrens et al., 2007; Walker et al., 2020, 2023). We are not aware of work that has, for example, systematically varied volatility and noise and reported distinct correlations for each. An interesting possibility as to how different forms of uncertainty may be encoded is suggested by parallels with the neuronal decoding literature. One question addressed by this literature is how the brain decodes changes in the world from the distributed, noisy neural responses to those changes, with a particular focus on the influence of different forms of between-neuron correlation (Averbeck et al., 2006; Kohn et al., 2016). Specifically, signal-correlation, the degree to which different neurons represent similar external quantities (required to track volatility), is distinguished from, and often limited by, noise-correlation, the degree to which the activity of different neurons covaries independently of these external quantities. One possibility relevant to the current study, which resembles the underlying logic of the BOM, is that a population of neurons represents the estimated mean of the generative process that produces task outcomes. In this case, volatility would be tracked as the signal-correlation across this population, whereas noise would be analogous to the noise-correlation and, crucially, misestimation of noise as volatility might arise as misestimation of these two forms of correlation. While the current study clearly cannot adjudicate on the neural representation of these processes, our finding of distinct behavioural and physiological responses to the two forms of uncertainty does suggest that separable neural representations of uncertainty are maintained.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This useful work extends a prior study from the authors to observe distance changes within the CNBD domains of a full-length CNG channel based on changes in single photon lifetimes due to tmFRET between a metal at an introduced chelator site and a fluorescent non-canonical amino acid at another site. The data are excellent and convincingly support the authors' conclusions. The methodology is of general use for other proteins. The authors also show that coupling of the CNBDs to the rest of the channel stabilizes the CNBDs in their active state, relative to an isolated CNBD construct.

      Strengths:

      The manuscript is very well written and clear.

      Reviewer #2 (Public review):

      The manuscript "Domain Coupling in Allosteric Regulation of SthK Measured Using Time-Resolved Transition Metal Ion FRET" by Eggan et al. investigates the energetics of conformational transitions in the cyclic nucleotide-gated (CNG) channel SthK. This lab pioneered transition metal FRET (tmFRET), which has previously provided detailed insights into ion channel conformational changes. Here, the authors analyze tmFRET fluorescence lifetime measurements in the time domain, yielding detailed insights into conformational transitions within the cyclic nucleotide binding domains (CNBDs) of the channel. The integration of tmFRET with time-correlated single-photon counting (TCSPC) represents an advancement of this technique.
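
      The link between donor lifetime and distance that underlies these measurements can be sketched with the textbook FRET relations. This is a single-distance simplification, whereas the manuscript fits distance distributions; the lifetimes and the characteristic distance R0 below are hypothetical values.

```python
def fret_efficiency(tau_donor_only, tau_with_acceptor):
    """Efficiency from donor lifetimes without and with the acceptor."""
    return 1.0 - tau_with_acceptor / tau_donor_only


def distance_from_efficiency(efficiency, r0):
    """Invert E = 1 / (1 + (r / R0)**6) for the donor-acceptor distance r."""
    return r0 * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


# Hypothetical numbers: lifetimes in ns, R0 in angstroms.
r0 = 15.0
e_resting = fret_efficiency(tau_donor_only=4.0, tau_with_acceptor=2.0)
e_active = fret_efficiency(tau_donor_only=4.0, tau_with_acceptor=1.0)
r_resting = distance_from_efficiency(e_resting, r0)
r_active = distance_from_efficiency(e_active, r0)
```

      In this sketch a shorter donor lifetime in the presence of the acceptor maps to a higher efficiency and therefore to a shorter distance.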

      The results summarize known conformational transitions of the C-helix and provide distance distributions that agree with predicted values based on available structures. The authors first validated their TCSPC approach using the isolated CNBD construct previously employed for similar experiments. They then study the more complex full-length SthK channel protein. The findings agree with earlier results from this group, demonstrating that the C-helix is more mobile in the closed state than static structures reflect. Upon adding the activating ligand cAMP, the C-helix moves closer to the bound ligand, as indicated by a reduced fluorescence lifetime, suggesting a shorter distance between the donor and acceptor. The observed effects depend on the cAMP concentration, with affinities comparable to functional measurements. Interestingly, a substantial amount of CNBDs appear to be in the activated state even in the absence of cAMP (Figure 6E and F, fA2 ~ 0.4).

      This may be attributed to cooperativity among the CNBDs, which the authors could elaborate on further. In this context, the major limitation of this study is that distance distributions are observed only in one domain. While inter-subunit FRET is detected and accounted for, the results focus exclusively on movements within one domain. Thus, the resulting energetic considerations must be assessed with caution. In the absence of the activator, the closed state is favored, while the presence of cAMP favors the open state. This quantifies the standard assumption; otherwise, an activator would not effectively activate the channel. However, the numerical values of approximately 3 kcal/mol are limited by the fact that only one domain is observed in the experiment, and only one distance (C-helix relative to the CNBD) is probed. Additional conformational changes leading to pore opening (including rotation and upward movement of the CNBD, and radial dilation of the tetrameric assembly) are not captured by the current experiments. These limitations should be taken into account when interpreting the results.

      We agree that these are important limitations to consider in interpreting our results. These limitations and future directions are now largely covered in our discussion. We believe measurements in individual domains provide unique insights into the contributions of different parts of the protein and future work will continue to address conformational energetics in other parts of the protein and subunit cooperativity. 
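
      As a rough guide to how a state occupancy translates into an energy, the sketch below applies a simple two-state Boltzmann relation. The two-state model, the temperature, and the function name are assumptions made for illustration; the only value taken from the review above is the active fraction of about 0.4 in the absence of cAMP.

```python
from math import log

GAS_CONSTANT = 1.987e-3  # kcal / (mol K)


def delta_g_resting_to_active(fraction_active, temperature=298.15):
    """Free energy of the resting-to-active transition in a two-state model."""
    equilibrium_constant = fraction_active / (1.0 - fraction_active)
    return -GAS_CONSTANT * temperature * log(equilibrium_constant)


# Active fraction of about 0.4 without cAMP, as quoted by the reviewer.
dg_apo = delta_g_resting_to_active(0.4)
```

      Under these assumptions an active fraction of 0.4 corresponds to a small positive free energy of roughly 0.24 kcal/mol, meaning the resting state is only mildly favoured.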

      Reviewer #3 (Public review):

      Summary:

      This is a lucidly written manuscript describing the use of transition-metal FRET to assess distance changes during functional conformational changes in a CNG channel.

      The experiments were performed on an isolated C-terminal nucleotide binding domain (CNBD) and on a purified full-length channel, with FRET partners placed at two positions in the CNBD.

      Strengths:

      The data and quantitative analysis are exemplary, and they provide a roadmap for use of this powerful approach in other proteins.

      Weaknesses/Comments:

      A ~3x lower Kd for nucleotide is seen for the detergent-solubilized full-length channel, compared to electrophysiological experiments. This is worth a comment in the Discussion, particularly in the context of the effect of the pore domain on the CNBD energetics.

      We are cautious about interpreting our K<sub>D</sub> values given the high affinity for cAMP and the challenges of accurately determining the total protein concentrations in our experiments. We now state this explicitly in the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The manuscript is very well written and clear. Congrats to the authors.

      Minor comment: In "Measuring tmFRET in Full-Length SthK", 3rd paragraph: "... FRET model with both intersubunit and intersubunit FRET." Should read "intersubunit and intrasubunit".

      Thank you for the comment, this is now corrected.  

      Reviewer #2 (Recommendations for the authors):

      Overall, the manuscript is well-written and clearly explained. However, I recommend that the authors discuss the limitations more critically.

      The revised manuscript now largely addresses these limitations. Additional comments are addressed in short below:  

      A) Only one distance is measured.

      We believe validating a single distance is an important first step in determining the use of this technique and beginning to quantify the allosteric mechanism in SthK. Future studies aim to make additional measurements.

      B) Measurements are confined to a single domain in the cooperative tetrameric assembly.

      Isolating conformational changes in individual domains allows us to determine how different parts of the protein contribute to activation upon ligand binding.

      C) The change in distance upon activation mirrors what is observed in the closed state, which casts doubt on whether these conformational changes actually lead to channel opening or merely reflect the upward swinging of the C-helix that contributes to coordinating cAMP in the binding pocket.

      Future studies aim to detect conformational changes in the pore and other parts of the protein.

      D) Rigid body movements, rotations, and dilations are not captured by the measurements. 

      Our measurements combine energetic information with some, although more limited, structural information.   

      E) Cooperativity is not considered in the interpretation of the results.

      It is currently unclear where in SthK cooperativity arises upon ligand activation (i.e., at the level of the CNBD, C-linker or pore). Our results do not provide evidence of cooperativity in the CNBD upon ligand binding.

      Additionally, the authors directly correlate their results with the functional states of SthK previously reported, but it remains open whether the modified protein for tmFRET behaves similarly to WT SthK. Functional experiments with the protein used for tmFRET, which demonstrate comparable open probabilities and cAMP potency, would considerably strengthen the manuscript.

      Further optimization is needed to express the full-length protein used in tmFRET experiments in spheroplasts to enable electrophysiological recordings from these constructs. 

      Reviewer #3 (Recommendations for the authors):

      In the final paragraph of the Discussion, the sentence "In our experiments, we assumed that deleting the pore and transmembrane domains eliminates the coupling of these regions to the CNBD" seems trivial. Perhaps it would help to add "simply" before eliminates?

      We have taken the advice and added ‘simply’ in this sentence.  

      Can a statement be made about the magnitude of the effect in the C-terminal deletion experiments in refs 27-29?

      Due to the different channels used in the C-terminal deletion experiments in refs 27-29 (HCN1 and spHCN), compared to the channel we used (SthK), it is challenging to compare the magnitude of energetic changes between these studies. Additionally, the HCN experiments measured changes in the pore domain, compared to the conformational changes in the CNBD domain measured here.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this useful narrative, the authors attempt to capture their experience of the success of team projects for the scientific community.

      Strengths:

      The authors are able to draw on a wealth of real-life experience reviewing, funding, and administering large team projects, and assessing how well they achieve their goals.

      Weaknesses:

      The utility of the RCR as a measure is questionable. I am not sure if this really makes the case for the success of these projects. The conclusions do not depend on Figure 1.

      We respectfully disagree about the utility of the RCR, particularly because it is a metric that is normalized by both year and topical area. We have added a more detailed description of how the RCR is calculated on pages 6-7. Please note that Figure 1 is intended to highlight the funding opportunities, investments and number of awards associated with small lab (exploratory) versus team (elaborated, mature) research rather than to describe publication metrics.

      Reviewer #2 (Public review):

      Summary:

      The authors review the history of the team projects within the BRAIN Initiative and analyze their success in progression to additional rounds of funding and their bibliographic impact.

      Strengths:

      The history of the team projects and the fact that many had renewed funding and produced impactful papers is well documented.

      Weaknesses:

      The core bibliographic and funding impact results have largely been reported in the companion manuscript and so represent "double dipping." I presume the slight disagreement in the number of grants (by one) represents a single grant that was not deemed to address systems/computational neuroscience. The single figure is relatively uninformative. The domains of study are sufficiently large and overlapping that there seems to be little information gained from the graphic, and the Sankey plot could be simply summarized by rates of competing success.

      While we sincerely appreciate the feedback, we chose to retain these plots on domains and models to provide a sense of the broad spectrum of research topics contained in our TeamBCP awards. Further details on the awards can be derived from the award links provided in the text. Additionally, we retained the Sankey plots because these are a visual depiction of how awards transition from one mechanism to another, evolve in their funding sources, and advance in their research trajectories. The plot is an example of our continuity analysis which is only reported in the text and not visually shown for the remaining BCP programs.

      Recommendations for the authors:

      Editorial note:

      In the discussion, the reviewers agreed that the present manuscript does not make a sufficient independent contribution and so would be more profitably combined with the companion manuscript. Both reviewers noted that there was not much insight that relied on the single figure. Since neither manuscript is long, and they have overlapping authors (including the same first and last authors), this should not be a difficult merger to achieve.

      Thank you for the recommendation to merge. We have combined both manuscripts into one in this version.

      Reviewer #1 (Recommendations for the authors):

      The jargon of the grant programs could be described as a nightmare. Wellcome is spelled wrong.

      We have attempted to limit the use of jargon and to define acronyms in this version. We have corrected the spelling of Wellcome.

      Reviewer #2 (Recommendations for the authors):

      I suggest that the two manuscripts be combined into a single paper. Although the other manuscript could stand on its own, this one does not.

      The idea of culture change surrounding teams is useful but really forms more of a policy-focused opinion piece than a quantitative analysis of funding impact.

      If the authors insist on keeping these separate, it is critical to remove the team data from the other manuscript.

      We have combined both manuscripts and decided to retain the description of culture change but have edited and condensed this section and will use the supplemental report for qualitative assessments.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      Taber et al report the biochemical characterization of 7 mutations in PHD2 that induce erythrocytosis.

      Their goal is to provide a mechanism for how these mutations cause the disease. PHD2 hydroxylates HIF1a in the presence of oxygen at two distinct proline residues (P564 and P402) in the "oxygen degradation domain" (ODD). This leads to the ubiquitylation of HIF1a by the VHL E3 ligase and its subsequent degradation. Multiple mutations have been reported in the EGLN1 gene (coding for PHD2), which are associated with pseudohypoxic diseases that include erythrocytosis. Furthermore, 3 mutations in PHD2 also cause pheochromocytoma and paraganglioma (PPGL), a neuroendocrine tumour. These mutations likely cause elevated levels of HIF1a, but their mechanisms are unclear. Here, the authors analyze mutations from 152 case reports and map them on the crystal structure. They then focus on 7 mutations, which they clone in a plasmid and transfect into PHD2-KO cells to monitor HIF1a transcriptional activity via a luciferase assay. All mutants show impaired activation. Some mutants also showed impaired stability in pulse chase turnover assays (except A228S, P317R, and F366L). In vitro purified PHD2 mutants display a minor loss in thermal stability and some propensity to aggregate. Using MST technology, they show that P317R is strongly impaired in binding to HIF1a and HIF2a, whereas other mutants are only slightly affected. Using NMR, they show that the PHD2 P317R mutation greatly reduces hydroxylation of P402 (HIF1a NODD), as well as P564 (HIF1a CODD), but to a lesser extent. Finally, BLI shows that the P317R mutation reduces affinity for CODD by 3-fold, but not NODD.

      Strengths: 

      (1) Simple, easy-to-follow manuscript. Generally well-written. 

      (2) Disease-relevant mutations are studied in PHD2 that provide insights into its mechanism of action. 

      (3) Good, well-researched background section. 

      Weaknesses: 

      (1) Poor use of existing structural data on the complexes of PHD2 with HIF1a peptides and various metals and substrates. A quick survey of the impact of these mutations (as well as analysis by Chowdhury et al, 2016) on the structure and interactions between PHD2 peptides of HIF1a shows that the P317R mutation interferes with peptide binding. By contrast, F366L will affect the hydrophobic core, and A228S is on the surface, and it's not obvious how it would interfere with the stability of the protein. 

      Thank you for the comment.  We will further analyze the mutations on the available PHD2 crystal structures in complex with HIFa to discern how these substitution mutations may impact PHD2 structure and function.  

      (2) To determine aggregation and monodispersity of the PHD2 mutants using size-exclusion chromatography (SEC), equal quantities of the protein must be loaded on the column. This is not what was done. As an aside, the colors used for the SEC are very similar and nearly indistinguishable. 

      Agreed. We will perform additional experiments as suggested by the reviewer to further assess aggregation and hydrodynamic size. The colors used in the graph will be changed for a clearer differentiation between samples.

      (3) The interpretation of some mutants remains incomplete. For A228S, what is the explanation for its reduced activity? It is not substantially less stable than WT and does not seem to affect peptide hydroxylation. 

      We agree with the reviewer that the causal mechanism for some of the tested disease-causing mutants remains unclear. The negative findings also raise the notion, perhaps considered controversial, that there may be other substrates of PHD2 that are impacted by certain mutations, which contribute to disease pathogenesis. We will expand our discussion accordingly.

      (4) The interpretation of the NMR prolyl hydroxylation is tainted by the high concentrations used here. First of all, there is likely a typo in the method section; the final concentration of ODD is likely 0.18 mM, and not 0.18 uM (PNAS paper by the same group in 2024 reports using a final concentration of 230 uM). Here, I will assume the concentration is 180 uM. Flashman et al (JBC 2008) showed that the affinity of the NODD site (P402; around 10 uM) for PHD2 is 10-fold weaker than CODD (P564, around 1 uM). This likely explains the much faster kinetics of hydroxylation towards the latter. Now, using the MST data, let's say the P317R mutation reduces the affinity by 40-fold; the affinity becomes 400 uM for NODD (above the protein concentration) and 40 uM for CODD (below the protein concentration). Thus, CODD would still be hydroxylated by the P317R mutant, but not NODD.

      The HIF1α concentration was indeed an oversight, which will be corrected to 0.18 mM. The study by Flashman et al.[1] showing that PHD2 has a lower affinity for the NODD than for the CODD likely contributes to the differential hydroxylation rates via PHD2 WT. We showed here via MST that PHD2 P317R had a Kd of 320 ± 20 uM for HIF1αCODD, which should have led to a severe enzymatic defect, even at the high concentrations used for NMR (180 uM). However, we observed only a subtle reduction in hydroxylation efficiency in comparison to PHD2 WT. Thus, we performed another binding method using BLI that showed a mild binding defect on CODD by PHD2 P317R, consistent with NMR data. The perplexing result is the WT-like binding to the NODD by PHD2 P317R, which appears inconsistent with the severe defect in NODD hydroxylation via PHD2 P317R as measured via NMR. These results suggest that there are supporting residues within the PHD2/NODD interface that help maintain binding to NODD but compromise the efficiency of NODD hydroxylation upon PHD2 P317R mutation. We will perform additional binding experiments to further interrogate and validate the binding affinity of PHD2 P317R to NODD and CODD.
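
      The concentration argument in point (4) and in the response above can be illustrated with a rough calculation. The following is a minimal sketch, not the analysis used in the manuscript: it assumes simple single-site binding with the ODD substrate in large excess over PHD2 (the enzyme concentration is not stated here), so that free substrate is approximately the total 180 uM. The function name and the table of scenarios are hypothetical; the Kd values are the approximate figures quoted in the review and the response.

```python
def fraction_occupied(substrate_um: float, kd_um: float) -> float:
    """Fraction of enzyme bound to substrate for 1:1 binding.

    Assumes substrate is in large excess over enzyme, so the free
    substrate concentration is approximated by the total concentration.
    """
    return substrate_um / (substrate_um + kd_um)


ODD_CONCENTRATION_UM = 180.0  # corrected concentration quoted in the response

# Approximate Kd values (uM) quoted in the review and the author response.
scenarios = {
    "WT, CODD (~1 uM)": 1.0,
    "WT, NODD (~10 uM)": 10.0,
    "P317R, CODD, reviewer's 40-fold case (40 uM)": 40.0,
    "P317R, NODD, reviewer's 40-fold case (400 uM)": 400.0,
    "P317R, CODD, MST value (320 uM)": 320.0,
}

for label, kd in scenarios.items():
    occupancy = fraction_occupied(ODD_CONCENTRATION_UM, kd)
    print(f"{label}: {occupancy:.2f} of enzyme occupied")
```

      Under these assumptions, a Kd well below 180 uM leaves most of the enzyme occupied, whereas a Kd above 180 uM leaves less than half occupied, which is the contrast both the reviewer and the authors reason from.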

      (5) The discrepancy between the MST and BLI results does not make sense, especially regarding the P317R mutant. Based on the crystal structures of PHD2 in complex with the ODD peptides, the P317R mutation should have a major impact on the affinity, which is what is reported by MST. This suggests that the MST is more likely to be valid than BLI, and the latter is subject to some kind of artefact. Furthermore, the BLI results are inconsistent with previous results showing that PHD2 has a 10-fold lower affinity for NODD compared to CODD. 

      The reviewer’s structural prediction that P317R mutation should cause a major binding defect, while agreeable with our MST data, is incongruent with our NMR and the data from Chowdhury et al.[2] that showed efficient hydroxylation of CODD via PHD2 P317R. Moreover, we have attempted to model NODD and CODD on the apo PHD2 P317R structure and found that the mutation had no major impact on CODD while the mutated residue could clash with NODD, causing a shifting of peptide positioning on the protein. However, these modeling predictions, like any in silico projections, would need experimental validation. As mentioned in our preceding response, we also performed BLI, which showed that PHD2 P317R had a minor binding defect for CODD, consistent with the NMR results and findings by Chowdhury et al.[2] NODD binding was also measured with BLI, as purified NODD peptides were not amenable for soluble-based MST assay, and showed similar Kd values for PHD2 WT and P317R. Considering the absence of NODD hydroxylation via PHD2 P317R as measured by NMR and modeling on apo PHD2 P317R, we posit that P317R causes deviation of NODD from its original orientation that may not affect binding due to the other interactions from the surrounding elements but unfortunately disallows NODD from turnover. Further study would be required to validate such a notion, which we feel is beyond the scope of this manuscript. However, we will perform additional binding experiments to further interrogate PHD2 P317R binding to NODD.

      (6) Overall, the study provides some insights into mutants inducing erythrocytosis, but the impact is limited. Most insights are provided on the P317R mutant, but this mutant had already been characterized by Chowdhury et al (2016). Some mutants affect the stability of the protein in cells, but then no mechanism is provided for A228S or F366L, which have stabilities similar to WT, yet have impaired HIF1a activation. 

      We thank the reviewer for raising these and other limitations.  We will expand on the shortcomings of the present study but would like to underscore that the current work using the recently described NMR assay along with other biophysical analyses suggests a previously under-appreciated role of NODD hydroxylation in the normal oxygen-sensing pathway.  

      Reviewer #2 (Public review): 

      Summary: 

      Mutations in the prolyl hydroxylase, PHD2, cause erythrocytosis and, in some cases, can result in tumorigenesis. Taber and colleagues test the structural and functional consequences of seven patient-derived missense mutations in PHD2 using cell-based reporter and stability assays, and multiple biophysical assays, and find that most mutations are destabilizing. Interestingly, they discover a PHD2 mutant that can hydroxylate the C-terminal ODD, but not the N-terminal ODD, which suggests the importance of N-terminal ODD for biology. A major strength of the manuscript is the multidisciplinary approach used by the authors to characterize the functional and structural consequences of the mutations. However, the manuscript had several major weaknesses, such as an incomplete description of how the NMR was performed, a justification for using neighboring residues as a surrogate for looking at prolyl hydroxylation directly, or a reference to the clinical case studies describing the phenotypes of patient mutations. Additionally, the experimental descriptions for several experiments are missing descriptions of controls or validation, which limits their strength in supporting the claims of the authors.

      Strengths: 

      (1) This manuscript is well-written and clear. 

      (2) The authors use multiple assays to look at the effects of several disease-associated mutations, which support the claims. 

      (3) The identification of P317R as a mutant that loses activity specifically against NODD, which could be a useful tool for further studies in cells. 

      Weaknesses: 

      Major: 

      (1) The source data for the patient mutations (Figure 1) in PHD2 is not referenced, and it's not clear where this data came from or if it's publicly available. There is no section describing this in the methods.

      Clinical and patient information on disease-causing PHD2 mutants was compiled from various case reports and summarized in an excel sheet found in the Supplementary Information.  The case reports are cited in this excel file.  A reference to the supplementary data will be added to the Figure 1 legend and in the introduction.

      (2) The NMR hydroxylation assay. 

      A. The description of these experiments is really confusing. The authors have published a recent paper describing a method using 13C-NMR to directly detect prolyl-hydroxylation over time, and they refer to this manuscript multiple times as the method used for the studies under review. However, it appears the current study is using 15N-HSQC-based experiments to track the CSP of neighboring residues to the target prolines, so not the target prolines themselves. The authors should make this clear in the text, especially on page 9, 5th line, where they describe proline cross-peaks and refer to the 15N-HSQC data in Figure 5B.

      As the reviewer mentioned, the assay that we developed directly measures the target proline residues.  This assay is ideal when mutations near the prolines are studied, such as A403, Y565 (He et al[3]).  In this previous work, we observed that the shifting of the target proline cross-peaks due to change in electronegativity on the pyrrolidine ring of proline in turn impacted the neighboring residues[3], which meant that the neighboring residues can be used as reporter residues for certain purposes.  In this study, we focused on investigating the mutations on PHD2 while leaving the sequence of the HIF-1α unchanged by using solely 15N-HSQC-based experiments without the need for double-labeled samples.  Nonetheless, we thank the reviewer for pointing out the confusion in the text and we will correct and clarify our description of this assay.

      B. The authors are using neighboring residues as reporters for proline hydroxylation, without validating this approach. How well do CSPs of A403 and I566 track with proline hydroxylation? Have the authors confirmed this using their 13C-NMR data or mass spec? 

      For previous studies, we performed intercalated 15N-HSQC and 13C-CON experiments for the kinetic measurements of wild-type HIF-1α and mutants.  We observed that the shifting pattern of A403 and I566 in the 15N-HSQC spectra aligned well with the ones of P402 and P564, respectively, in the 13C-CON spectra.  Representative data will be added to Supplemental Data.

      C. Peak intensities. In some cases, the peak intensities of the end point residue look weaker than the peak intensities of the starting residue (5B, PHD2 WT I566, 6 ct lines vs. 4 ct lines). Is this because of sample dilution (i.e., should happen globally)? Can the authors comment on this? 

      This is an astute observation by the reviewer. We checked and confirmed that for all kinetic datasets, the peak intensities of the end point residue are always slightly lower than those of the starting residue. This includes the cases for PHD2 A228S and P317R in 5B, although not as obvious as that of PHD2 WT. We agree with the reviewer that the sample dilution is a factor, as a total volume of 16 microliters of reaction components was added to the solution to trigger the reaction after the first spectrum was acquired. It is also likely that the rate of prolyl hydroxylation becomes extremely slow with only a low amount of substrate available in the system. Therefore, the reaction would not be 100% complete, which was detected by the sensitive NMR experimentation.

      (3) Data validating the CRISPR KO HEK293A cells is missing. 

      We thank the reviewer for noting this oversight.  Western blots validating PHD2 KO in HEK293A cells will be added to the Supplementary Data file.

      (4) The interpretation of the SEC data for the PHD2 mutants is a little problematic. Subtle alterations in the elution profiles may hint at different hydrodynamic radii, but as the samples were not loaded at equal concentrations or volumes, these data seem more anecdotal, rather than definitive. Repeating this multiple times, using matched samples, followed by comparison with standards loaded under identical buffer conditions, would significantly strengthen the conclusions one could make from the data. 

      Agreed.  We will perform additional experiments as suggested with equal volume and concentration of each PHD2 construct loaded onto the SEC column for better assessment of aggregation.

      Minor: 

      (1) Justification for picking the seven residues is not clearly articulated. The authors say they picked 7 mutants with "distinct residue changes", but no further rationale is provided. 

      Additional justification for the selection of the mutants will be added to the ‘Mutations across the PHD2 enzyme induce erythrocytosis’ section. Briefly, some mutants were chosen based on their frequency in the clinical data and their presence in potential mutational hot spots. Various mutations were noted at W334 and R371, while F366L was identified in multiple individuals. Additionally, 9 cases of PHD2-driven disease were reported to be caused by mutations located between residues 200 and 210, while 13 cases were reported between residues 369 and 379, so G206C and R371H were chosen to represent potential hot spots. To examine a potential genotype-phenotype relationship, two of the mutants responsible for neuroendocrine tumor development, A228S and H374R, were also selected. Finally, mutations located close to or on catalytic core residues (P317R, R371H, and H374R) were chosen to test for suspected defects.

      (2) A major finding of the paper is that a disease-associated mutation, P317R, can differentially affect HIF1 prolyl hydroxylation; however, additional follow-up studies have not been performed to test this in cells or to validate the mutant in another method. Is it the position of the proline within the catalytic core, or the identity of the mutation that accounts for the selectivity?

      This is the very question that we are currently addressing but as a part of a follow-up study.  Indeed, one thought is that the preferential defect observed could be the result of the loss of proline, an exceptionally rigid amino acid that makes contact with the backbone twice, or the addition of a specific amino acid, namely arginine, a flexible amino acid with an added charge at this site.  Although beyond the scope of this manuscript, we will investigate whether such and other characteristics in this region of PHD2/HIF1α interface contribute to the differential hydroxylation. 

      Reviewer #3 (Public review): 

      Summary: 

      This is an interesting and clinically relevant in vitro study by Taber et al., exploring how mutations in PHD2 contribute to erythrocytosis and/or neuroendocrine tumors. PHD2 regulates HIFα degradation through prolyl-hydroxylation, a key step in the cellular oxygen-sensing pathway. 

      Using a time-resolved NMR-based assay, the authors systematically analyze seven patient-derived PHD2 mutants and demonstrate that all exhibit structural and/or catalytic defects. Strikingly, the P317R variant retains normal activity toward the C-terminal proline but fails to hydroxylate the N-terminal site. This provides the first direct evidence that N-terminal prolyl-hydroxylation is not dispensable, as previously thought. 

      The findings offer valuable mechanistic insight into PHD2-driven effects and refine our understanding of HIF regulation in hypoxia-related diseases. 

      Strengths: 

      The manuscript has several notable strengths. By applying a novel time-resolved NMR approach, the authors directly assess hydroxylation at both HIF1α ODD sites, offering a clear functional readout. This method allows them to identify the P317R variant as uniquely defective in NODD hydroxylation, despite retaining normal activity toward CODD, thereby challenging the long-held view that the N-terminal proline is biologically dispensable. The work significantly advances our understanding of PHD2 function and its role in oxygen sensing, and might help in the future interpretation and clinical management of associated erythrocytosis. 

      Weaknesses: 

      (1) There is a lack of in vivo/ex vivo validation. This is actually required to confirm whether the observed defects in hydroxylation-especially the selective NODD impairment in P317R-are sufficient to drive disease phenotypes such as erythrocytosis. 

      We thank the reviewer for this comment, and while we agree with this statement, the objective of this study per se was to elucidate the structural and/or functional defect caused by the various disease-associated mutations on PHD2. The subsequent study would be to validate whether the identified defects, in particular the selective NODD impairment, would lead to erythrocytosis in vivo. However, we feel that such a study would be beyond the scope of this manuscript.

      (2) The reliance on HRE-luciferase reporter assays may not reliably reflect the PHD2 function and highlights a limitation in the assessment of downstream hypoxic signaling. 

      Agreed.  All experimental assays and systems have limitations. The HRE-luciferase assay used in the present manuscript also has limitations such as the continuous expression of exogenous PHD2 mutants driven via CMV promoter. Thus, we performed several additional biophysical methodologies to interrogate the disease-causing PHD2 mutants. The limitations of the luciferase assay will be expanded in the revised manuscript. 

      (3) The study clearly documents the selective defect of the P317R mutant, but the structural basis for this selectivity is not addressed through high-resolution structural analysis (e.g., cryo-EM). 

      We thank the reviewer for the comment. While solving the structure of PHD2 P317R in complex with HIFα substrate is beyond the scope of this study, a structure of PHD2 P317R in complex with a clinically used inhibitor has been solved (PDB:5LAT). In analyzing this structure and that of PHD2 WT in complex with NODD, Chowdhury et al.[2] stated that P317 makes hydrophobic contacts with the LXXLAP motif on HIFα and R317 is predicted to interact differently with this motif. While this analysis does not directly elucidate the reason for the preferential NODD defect, it supports the possibility that P317R substitution may be more detrimental for enzymatic activity on NODD than CODD. We will discuss this notion in the revised manuscript.

      (4) Given the proposed central role of HIF2α in erythrocytosis, direct assessment of HIF2α hydroxylation by the mutants would have strengthened the conclusions. 

      We thank the reviewer for this comment, but we feel that such a study would be beyond the scope of the present study. We observed that the PHD2 binding patterns to HIF1α and HIF2α were similar, and we have previously assigned >95% of the amino acids in HIF1α ODD for NMR study[3]. Thus, we first focused on the elucidation of possible defects in disease-associated PHD2 mutants using HIF1α as the substrate, with the supposition that an identified deregulation on HIF1α could be extended to the HIF2α paralog.

      However, we agree with the reviewer that future studies should examine the impact of PHD2 mutants directly on HIF2α.  

      References:

      (1) Flashman, E. et al. Kinetic rationale for selectivity toward N- and C-terminal oxygen-dependent degradation domain substrates mediated by a loop region of hypoxia-inducible factor prolyl hydroxylases. J Biol Chem 283, 3808-3815 (2008).

      (2) Chowdhury, R. et al. Structural basis for oxygen degradation domain selectivity of the HIF prolyl hydroxylases. Nat Commun 7, 12673 (2016).

      (3) He, W., Gasmi-Seabrook, G.M.C., Ikura, M., Lee, J.E. & Ohh, M. Time-resolved NMR detection of prolyl-hydroxylation in intrinsically disordered region of HIF-1alpha. Proc Natl Acad Sci U S A 121, e2408104121 (2024).

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors have investigated the role of FMRP in the formation and function of RNA granules in mouse brain/cultured hippocampal neurons. Most of their results indicate that FMRP does not have a role in the formation or function of RNA granules with specific mRNAs, but may have some role in distal RNA granules in neurons and their response to synaptic stimulation. This is an important work (though the results are mostly negative) in understanding the composition and function of neuronal RNA granules. The last part of the work in cultured neurons is disjointed from the rest of the manuscript, and the results are neither convincing nor provide any mechanistic insight.

      Strengths:

      (1) The study is quite thorough, the methods and analysis used are robust, and the conclusion and interpretation are diligent.

      (2) The comparative study of Rat and Mouse RNA granules is very helpful for future studies.

      (3) The conclusion that the absence of FMRP does not affect the RNA granule composition and many of its properties in the system the authors have chosen to study is well supported by the results.

      (4) The difference in the response to DHPG stimulation concerning RNA granules described here is very interesting and could provide a basis for further studies, though it has some serious technical issues.

      Weaknesses:

      (1) The system used for the study (P5 mouse brain or DIV 8-10 cultured neuron) is surprising, as the majority of defects in the absence of FMRP are reported in later stages (P30+ brain and DIV 14+ neurons). It is important to test if the conclusions drawn here hold good at different developmental stages.

      (2) The term 'distal granules' is very vague. Since there is no structural or biochemical characterization of these granules, it is difficult to understand how they are different from the proximal granules and why FMRP has an effect only on these granules.

      (3) Since the manuscript does not find any effect of FMRP on neuronal RNA granules, it does not provide any new molecular insight with respect to the function of FMRP.

      Thank you for your comments and for pointing out the strengths of the manuscript. Unfortunately, we will not be able to respond to point #1. The protocol for purification of the ribosomes from RNA granules does not work in older brains (see Khandjian et al, 2004 PNAS 101:13357), presumably due to the presence of large concentrations of myelin. While it would be possible to repeat our results later in culture, we have no expectation that they would be different, since we do observe DHPG induction of elongation-dependent, initiation-independent mGLUR-LTD in later cultures (Graber et al, 2017 J. Neuroscience 37:9116). We will strengthen the caveat in the discussion that our results are only a snapshot of development and that it is certainly possible that different results may be seen at different times. We agree with point #2 that ‘distal granules’ is a vague term. We will remove the term and clarify that we only quantified granules larger than 50 microns from the cell soma. We do not know if these granules are distinct. We would respectfully disagree with point #3 that the study does not provide molecular insight into the function of FMRP, as disproving that FMRP is important for stalling and determining the position of stalling removes a major hypothesis about the function of FMRP, and showing that something is not true is, at least to me, providing insight.

      Reviewer #2 (Public review):

      In the present manuscript, Li et al. use biochemical fractionation of "RNA granules" from P5 wildtype and FMR1 knock-out mouse brains to analyze their protein/RNA content, determine a single particle cryo-EM structure of contained ribosomes, and perform ribo-seq analysis of ribosome-protected RNA fragments (RPFs). The authors conclude from these that neither the composition of the ribosome granules, nor the state of their contained ribosomes, nor the mRNA positions with high ribosome occupancy change significantly. Besides minor changes in mRNA occupancy, the one change the authors identified is a decrease in puromycylated punctae in distal neurites of cultured primary neurons of the same mice, and their enhanced resistance to different pharmacological treatments. These results directly build on their earlier work (Anadolu et al., 2023) using analogous preparations of rat brains; the authors now perform a very similar study using WT and FMR1-KO mouse brains. This is an important topic, aiming to identify the molecular underpinnings of the FMRP protein, which is the basis of a major neurological disease. Unfortunately, several limitations of this study prevent it from being more convincing in its present form.

      In order to improve this study, our main suggestions are as follows:

      (1) The authors equate their biochemically purified "RG" fraction with their imaging-based detection of puromycin-positive punctae. They claim essentially no differences in RGs, but detect differences in the latter (mostly their abundance and sensitivity to DHPG/HHT/Aniso). In the discussion the authors acknowledge the inconsistency between these two modalities: "An inconsistency in our findings is the loss of distal RPM puncta coupled with an increase in the immunoreactivity for S6 in the RG." and "Thus, it may be that the RG is not simply made up of ribosomes from the large liquid-liquid phase RNA granules."

      How can the authors be sure that they are analysing the same entities in both modalities? A more parsimonious explanation of their results would be that, while there might be some overlap, two different entities are analyzed. Much of the main message rests on this equivalence, and I believe the authors should show its validity.

      (2) The authors show that increased nuclease digestion (and magnesium concentration) led to a reduction of their RPF sizes down to levels also seen by other researchers. Analyzing these now properly digested RPFs, the authors state that the CDS coverage and periodicity drastically improved, and that spurious enrichments of secretory mRNAs, which made up one of the major fractions in their previous work, are now reduced. In my opinion, this would be more appropriately communicated as a correction to their previous work, not as a main Figure in another manuscript.

      (3) The fold changes reported in Figure 7 (ranging between log2(-0.2) and log2(+0.25)) are all extremely small and in my opinion should not be used to derive claims such as "The loss of FMRP significantly affected the abundance and occupancy of FMRP-Clipped mRNAs in WT and FMR1-KO RG (Fig 7A, 7B), but not their enrichment between RG and RCs".

      (4) Figure 8 / S8-1 - The authors show that ~2/3 of their reads stem from PCR duplicates, but that even after removing those, the majority of peaks remain unaltered. At the same time, Figure S8-1 shows the total number of peaks to be 615 compared with 1392 before duplicate removal. Can the authors comment on this discrepancy? In addition, the dataset with properly removed artefacts should be used for their main display item instead of the current Figure 8.

      (5) Figure 9 / S9-1, the density of punctae in both WT and FMR1-KO actually increases after treatment of HHT or Anisomycin (Figure S9-1 B-C). Even if a large fraction would now be "resistant to run-off", there should not be an increase. While this effect is deemed not significant, a much smaller effect in Figure 9C is deemed significant. Can the authors explain this? Given how vastly different the sample sizes are (ranging from 23 neurites in Figures S9-1 to 5,171 neurites in Figure 9), the authors should (randomly) sample to the same size and repeat their statistical analysis again, to improve their credibility.
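
      The resampling suggested in point (5) could be sketched as follows. This is only an illustration under stated assumptions, not the authors' analysis: the density values are synthetic, the group names are placeholders, and a permutation test on the difference of medians is one possible choice of test (the review does not name one). The helper names `subsample_to_equal_size` and `permutation_p_value` are hypothetical.

```python
import random
import statistics


def subsample_to_equal_size(groups, seed=0):
    """Randomly subsample every group down to the size of the smallest one."""
    rng = random.Random(seed)
    n = min(len(values) for values in groups.values())
    return {name: rng.sample(values, n) for name, values in groups.items()}


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the absolute difference of medians."""
    rng = random.Random(seed)
    observed = abs(statistics.median(a) - statistics.median(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(
            statistics.median(pooled[: len(a)])
            - statistics.median(pooled[len(a):])
        )
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Synthetic, skewed puncta densities with unequal group sizes (illustrative only).
data_rng = random.Random(1)
groups = {
    "small_group": [data_rng.expovariate(1.0) for _ in range(23)],
    "large_group": [data_rng.expovariate(1.2) for _ in range(171)],
}

balanced = subsample_to_equal_size(groups, seed=2)
p_value = permutation_p_value(balanced["small_group"], balanced["large_group"])
print(f"n per group = {len(balanced['small_group'])}, p = {p_value:.3f}")
```

      Repeating the subsampling with several seeds would show how stable the conclusion is once the groups are matched in size.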

      Thank you for your comments. We agree with point #1 that the equivalence of RPM puncta with the RG fraction is an issue, and while we believe that we show in a number of ways that the two are related (anisomycin-resistant puromycylation, puromycylation only at high concentrations consistent with the hybrid state, etc.), we would respectfully disagree that our main message rests on the equivalence of the RPM-labeled RNA granules in neurites and the ribosomes isolated by sedimentation. We will make this point clearer in our revision. For point #2, we agree that the changes with increased nuclease are somewhat out of place in a narrative sense, but they are clearly relevant to this work. Whether or not one sees this as a ‘correction’ or an interesting point will depend on a better characterization of the structures of the stalled polysomes. My personal view is that the nuclease resistance of cleavage near the RNA entrance site is quite interesting. Since we reproduce our results with a similar nuclease treatment in mice, as reported in our previous publication, I believe the comparison could be of interest in the future and would like to retain it. We agree with point #3 and will temper these claims in our revised version. For point #4, we will determine more carefully why the number of peaks differs and switch the main and supplemental figures. For point #5, we apologize for the typo in the legend of Figure 9: the sample size is 171, not 5,171. The box plot line shows the median, not the average, and the data are clearly skewed such that the median and average are different (i.e., there is a two-fold decrease in the average density of distal puncta between WT and FMRP, but the average density is actually slightly decreased with HHT and anisomycin, although the median increases slightly). We will now report the results in distinct modalities to clarify this, and we will reexamine the statistics to better address the skewed distribution of values in the revised version.
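
      The point about skewed data can be made concrete with a toy example. The numbers below are invented purely for illustration and are not taken from the manuscript; they show how, in a right-skewed distribution, the mean can fall while the median rises.

```python
import statistics

# Invented values: one large outlier dominates the mean of the first group.
baseline = [0.0, 0.0, 1.0, 1.0, 2.0, 20.0]
treated = [0.0, 1.0, 2.0, 2.0, 3.0, 4.0]


def summarize(values):
    """Return (mean, median) for a list of values."""
    return statistics.mean(values), statistics.median(values)


baseline_mean, baseline_median = summarize(baseline)
treated_mean, treated_median = summarize(treated)

print(f"baseline: mean={baseline_mean:.1f}, median={baseline_median:.1f}")
print(f"treated:  mean={treated_mean:.1f}, median={treated_median:.1f}")
```

      Here the mean drops from 4.0 to 2.0 while the median rises from 1.0 to 2.0, so a box plot (which draws the median) and a comparison of averages can point in opposite directions.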

      Summary:

      Li et al. describe a set of experiments to probe the role of FMRP in ribosome stalling and RNA granule composition. The authors are able to recapitulate findings from a previous study performed in rats (this one is in mice).

      Strengths:

      (1) The work addresses an important and challenging issue, investigating mechanisms that regulate stalled ribosomes, focusing on the role of FMRP. This is a complicated problem, given the heterogeneity of the granules and the challenges related to their purification. This work is a solid attempt at addressing this issue, which is widely understudied.

      (2) The interpretation of the results could be interesting, if supported by solid data. The idea that FMRP could control the formation and release of RNA granules, rather than the elongation by stalled ribosomes is of high importance to the field, offering a fresh perspective into translational regulation by FMRP.

      (3) The authors focused on recapitulating previous findings from the same group (Anadolu et al., 2023), which were obtained in rat tissue, here using mouse tissue. Overall, they succeeded in doing so, demonstrating, among other findings, that stalled ribosomes are enriched in consensus mRNA motifs that are linked to FMRP. These interesting findings reinforce the role of FMRP in the formation and stabilization of RNA granules. It would be nice to see extensive characterization of the mouse granules as performed in Figure 1 of Anadolu and colleagues, 2023.

      (4) Some of the techniques incorporated aid in creating novel hypotheses, such as the ribopuromycylation assay and the cryo-EM of granule ribosomes.

      Weaknesses:

      (1) The RNA granule characterization needs to be more rigorous. Coomassie is not proper for this type of characterization, simply because protein weight says little about its nature. The enrichment of key proteins is not robust and seems to not reach significance in multiple instances, including S6 and UPF1. Furthermore, S6 is the only proxy used for ribosome quantification. Could the authors include at least 3 other ribosomal proteins (2 from small, 2 from large subunit)?

      (2) Page 12-13 - The Gene Ontology analysis is performed incorrectly. First, one should not rank genes by their RPKM levels. It is well known that housekeeping genes such as those related to actin dynamics, molecular transport and translation are highly enriched in sequencing datasets. It is usually more informative when significantly different genes are ranked by adjusted p-value or log2 fold change, then compared against a background to verify enrichment of specific processes. However, the authors found no DEGs. I would suggest removing this analysis and incorporating a gene set enrichment analysis (ranked by adjusted p-value). I further suggest that the authors incorporate a dimensionality reduction analysis to demonstrate that the lack of significance stems from biology and not experimental artifacts, such as poor reproducibility across biological replicates.

      Thank you for your comments on the strengths of the manuscript. We agree with point #1 that the mouse RNA granule characterization needs to be more rigorous and we plan to accomplish this in our revised version. Similarly, we will incorporate the additional statistical analysis suggested by the reviewer in a revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors report a study on how stimulation of receptive-field surround of V1 and LGN neurons affects their firing rates. Specifically, they examine stimuli in which a grey patch covers the classical RF of the cell and a stimulus appears in the surround. Using a number of different stimulus paradigms they find a long latency response in V1 (but not the LGN) which does not depend strongly on the characteristics of the surround grating (drifting vs static, continuous vs discontinuous, predictable grating vs unpredictable pink noise). They find that population responses to simple achromatic stimuli have a different structure that does not distinguish so clearly between the grey patch and other conditions and the latency of the response was similar regardless of whether the center or surround was stimulated by the achromatic surface. Taken together they propose that the surround-response is related to the representation of the grey surface itself. They relate their findings to previous studies that have put forward the concept of an “inverse RF” based on strong responses to small grey patches on a full-screen grating. They also discuss their results in the context of studies that suggest that surround responses are related to predictions of the RF content or figure-ground segregation.

      Strengths:

      I find the study to be an interesting extension of the work on surround stimulation and the addition of the LGN data is useful showing that the surround-induced responses are not present in the feedforward path. The conclusions appear solid, being based on large numbers of neurons obtained through Neuropixels recordings. The use of many different stimulus combinations provides a rich view of the nature of the surround-induced responses.

      Weaknesses:

      The statistics are pooled across animals, which is less appropriate for hierarchical data. There is no histological confirmation of placement of the electrode in the LGN and there is no analysis of eye or face movements which may have contributed to the surround-induced responses. There are also some missing statistics and methods details which make interpretation more difficult.

      We thank the reviewer for their positive and constructive comments, and have addressed these specific issues in response to the minor comments. For the statistics across animals, we refer to “Reviewer 1 recommendations”, point 1. For the histological analysis, we refer to “Reviewer 1 recommendations”, point 3. For the eye and facial movements, we refer to “Reviewer 1 recommendations”, point 5. Concerning missing statistics and methods details, we refer to various responses to “Reviewer 1 recommendations”. We thoroughly reviewed the manuscript and included all missing statistical and methodological details.

      Reviewer #2 (Public review):

      Cuevas et al. investigate the stimulus selectivity of surround-induced responses in the mouse primary visual cortex (V1). While classical experiments in non-human primates and cats have generally demonstrated that stimuli in the surround receptive field (RF) of V1 neurons only modulate activity to stimuli presented in the center RF, without eliciting responses when presented in isolation, recent studies in mouse V1 have indicated the presence of purely surround-induced responses. These have been linked to prediction error signals. In this study, the authors build on these previous findings by systematically examining the stimulus selectivity of surround-induced responses.

      Using Neuropixels recordings in V1 and the dorsal lateral geniculate nucleus (dLGN) of head-fixed, awake mice, the authors presented various stimulus types (gratings, noise, surfaces) to the center and surround, as well as to the surround only, while also varying the size of the stimuli. Their results confirm the existence of surround-induced responses in mouse V1 neurons, demonstrating that these responses do not require spatial or temporal coherence across the surround, as would be expected if they were linked to prediction error signals. Instead, they suggest that surround-induced responses primarily reflect the representation of the achromatic surface itself.

      The literature on center-surround effects in V1 is extensive and sometimes confusing, likely due to the use of different species, stimulus configurations, contrast levels, and stimulus sizes across different studies. It is plausible that surround modulation serves multiple functions depending on these parameters. Within this context, the study by Cuevas et al. makes a significant contribution by exploring the relationship between surround-induced responses in mouse V1 and stimulus statistics. The research is meticulously conducted and incorporates a wide range of experimental stimulus conditions, providing valuable new insights regarding center-surround interactions.

      However, the current manuscript presents challenges in readability for both non-experts and experts. Some conclusions are difficult to follow or not clearly justified.

      I recommend the following improvements to enhance clarity and comprehension:

      (1) Clearly state the hypotheses being tested at the beginning of the manuscript.

      (2) Always specify the species used in referenced studies to avoid confusion (esp. Introduction and Discussion).

      (3) Briefly summarize the main findings at the beginning of each section to provide context.

      (4) Clearly define important terms such as “surface stimulus” and “early vs. late stimulus period” to ensure understanding.

      (5) Provide a rationale for each result section, explaining the significance of the findings.

      (6) Offer a detailed explanation of why the results do not support the prediction error signal hypothesis but instead suggest an encoding of the achromatic surface.

      These adjustments will help make the manuscript more accessible and its conclusions more compelling.

      We thank the reviewer for their constructive feedback and for highlighting the need for improved clarity regarding the hypotheses and their relation to the experimental findings.

      • We have strongly improved the Introduction and Discussion section, explaining the different hypotheses and their relation to the performed experiments.

      • In the Introduction, we have clearly outlined each hypothesis and its predictions, providing a structured framework for understanding the rationale behind our experimental design.

      • In the Discussion, we have been more explicit in explaining how the experimental findings inform these hypotheses.

      • We explicitly mentioned the species used in the referenced studies.

      • We provided a clearer rationale for each experiment in the Results section.

      We have also always clearly stated the species that previous studies used, both in the Introduction and Discussion section.

      Reviewer #3 (Public review):

      Summary:

      This paper explores the phenomenon whereby some V1 neurons can respond to stimuli presented far outside their receptive field. It introduces three possible explanations for this phenomenon and it presents experiments that it argues favor the third explanation, based on figure/ground segregation.

      Strengths:

      I found it useful to see that there are three possible interpretations of this finding (prediction error, interpolation, and figure/ground). I also found it useful to see a comparison with LGN responses and to see that the effect there is not only absent but actually the opposite: stimuli presented far outside the receptive field suppress rather than drive the neurons. Other experiments presented here may also be of interest to the field.

      Weaknesses:

      The paper is not particularly clear. I came out of it rather confused as to which hypotheses were still standing and which hypotheses were ruled out. There are numerous ways to make it clearer.

      We thank the reviewer for their constructive feedback and for highlighting the need for improved clarity regarding the hypotheses and their relation to the experimental findings.

      • We have strongly improved the Introduction and Discussion section, explaining the different hypotheses and their relation to the performed experiments.

      • In the Introduction, we have clearly outlined each hypothesis and its predictions, providing a structured framework for understanding the rationale behind our experimental design.

      • In the Discussion, we have been more explicit in explaining how the experimental findings inform these hypotheses.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the Authors):

      (1) Given the data is hierarchical with neurons clustered within 6 mice (how many recording sessions per animal?) I would recommend the use of Linear Mixed Effects models. Simply pooling all neurons increases the risk of false alarms.

      To clarify: We used the standard method for analyzing single-unit recordings, by comparing the responses of a population of single neurons between two different conditions. This means that the responses of each single neuron were measured in the different conditions, and the statistics were therefore based on the pairwise differences computed for each neuron separately. This is a common and standard procedure in systems neuroscience, and was also used in the previous studies on this topic (Keller et al., 2020; Kirchberger et al., 2023). We were not concerned with comparing two groups of animals, for which hierarchical analyses are recommended. To address the reviewer’s concern, we did examine whether differences between baseline and the gray/drift condition, as well as the gray/drift compared to the grating condition, were consistent across sessions, which was indeed the case. These findings are presented in Supplementary Figure 6.
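      The session-level consistency check mentioned above can be illustrated with a minimal sketch. It assumes one record per neuron holding a session label and the firing rates in two conditions; the record layout and the function names are hypothetical and are not taken from the authors' analysis code.

```python
from collections import defaultdict
from statistics import median


def median_difference_per_session(records):
    """Group per-neuron paired differences by session and take the median.

    records: iterable of (session_id, rate_condition_a, rate_condition_b),
    one entry per neuron (hypothetical layout).
    """
    by_session = defaultdict(list)
    for session_id, rate_a, rate_b in records:
        by_session[session_id].append(rate_a - rate_b)
    return {session: median(diffs) for session, diffs in by_session.items()}


def effect_is_consistent(records):
    """True if the median paired difference has the same sign in every session."""
    medians = list(median_difference_per_session(records).values())
    return all(m > 0 for m in medians) or all(m < 0 for m in medians)
```

      A check of this kind only describes whether the direction of the effect is shared across sessions; it does not replace a hierarchical model such as the one the reviewer recommends.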

      (2) Line 432: “The study utilized three to eight-month-old mice of both genders”. This is confusing, I assume they mean six mice in total, please restate. What about the LGN recordings, were these done in the same mice? Can the authors please clarify how many animals, how many total units, how many included units, how many recording sessions per animal, and whether the same units were recorded in all experiments?

      We have now clarified the information regarding the animals used in the Methods section.

      • We state that “We included female and male mice (C57BL/6), a total of six animals for V1 recordings between three and eight months old. In two of those animals, we recorded simultaneously from LGN and V1.”

      • We state that “For each animal, we recorded around 2-3 sessions from each hemisphere, and we recorded from both hemispheres.”

      • We noted that the number of neurons was not mentioned for each figure caption. We apologize for this omission. We have now added the number for all of the figures and protocols to the revised manuscript. We note that the same neurons were recorded for the different conditions within each protocol; however, because a few sessions were short, we recorded more units for the grating protocol. Note that we did not make statistical comparisons between protocols.

      (3) I see no histology for confirmation of placement of the electrode in the LGN, how can they be sure they were recording from the LGN? There is also little description of the LGN experiments in the methods.

      For better clarity, we have included a reconstruction of the electrode track from histological sections of one animal post-experiment (Figure S4). The LGN was targeted via stereotactical surgery, and the visual responses in this area are highly distinct. In addition, we used a flash protocol to identify the early-latency responses typical for the LGN, which is described in the Methods section: “A flash stimulus was employed to confirm the locations of LGN at the beginning of the recording sessions, similar to our previous work in which we recorded from LGN and V1 simultaneously (Schneider et al., 2023). This stimulus consisted of a 100 ms white screen and a 2 s gray screen as the inter-stimulus interval, designed to identify visually responsive areas. The responses of multi-unit activity (MUA) to the flash stimulus were extracted and a CSD analysis was then performed on the MUA, sampling every two channels. The resulting CSD profiles were plotted to identify channels corresponding to the LGN. During LGN recordings, simultaneous recordings were made from V1, revealing visually responsive areas interspersed with non-responsive channels.”

      (4) Many statements are not backed up by statistics. For example, each time the authors report that the response at 90° is higher than baseline (Line 121 amongst other places) there is no test to support this. Also Line 140 (negative correlation), Line 145, Line 180.

      For comparison purposes, we only presented statistical analyses across conditions. However, we have now added information to the figure captions stating that all conditions show values higher than the baseline.

      (5) As far as I can see there is no analysis of eye movements or facial movements. This could be an issue, for example, if the onset of the far surround stimuli induces movements this may lead to spurious activations in V1 that would be interpreted as surround-induced responses.

      To address this point, we have included a supplementary figure analyzing facial movements across different sessions and comparing them between conditions (Supplementary Figure 5). A detailed explanation of this analysis has been added to the Methods section. Overall, we observed no significant differences in face movements between trials with gratings, trials with the gray patch, and trials with the gray screen presented during baseline. Animals exhibited similar face movements across all three conditions, supporting the conclusion that the observed neural firing rate increases for the gray-patch condition are not related to face movements.

      (6) The experiments with the rectangular patch (Figure 3) seem to give a slightly different result as the responses for large sizes (75, 90) don’t appear to be above baseline. This condition is also perceptually the least consistent with a grey surface in the RF, the grey patch doesn’t appear to occlude the surface in this condition. I think this is largely consistent with their conclusions and it could merit some discussion in the results/discussion section.

      While the effect is maybe a bit weaker, the total surround stimulated also covers a smaller area because of the large rectangular gray patch. Furthermore, the early responses are clearly elevated above baseline, and the responses up to 70 degrees are still higher than baseline. Hence we think this data point for 90 degrees does not warrant a strong interpretation.

      Minor points:

      (1) Figure 1h: What is the statistical test reported in the panel (I guess a signed rank based on later figures)? Figure 4d doesn’t appear to be significantly different but is reported as so. Perhaps the median can be indicated on the distribution?

      We explained that we used a signed rank test for Figure 1h and now included the median of the distributions in Figure 4d.

      (2) What was the reason for having the gratings only extend to half the x-axis of the screen, rather than being full-screen? This creates a percept (in humans at least) that is more consistent with the grey patch being a hole in the grating as the grey patch has the same luminance as the background outside the grating.

      We explained in the Methods section that “We presented only half of the x-axis due to the large size of our monitor, in order to avoid over-stimulation of the animals with very large grating stimuli.”. Perceptually speaking, the gray patch appears as something occluding the grating, not as a “hole”.

      (3) Line 103: “and, importantly, had less than 10° (absolute) distance to the grating stimulus’ RF center.” Re-phrase, a stimulus doesn’t have an RF center.

      We corrected this to “We included only single units into the analysis that met several criteria in terms of visual responses (see Methods) and, importantly, the RF center had less than 10° (absolute) distance to the grating stimulus’ center.”

      (4) Line 143: “We recorded single neurons LGN” - should be “single LGN neurons”.

      We corrected this to “we recorded single LGN neurons”.

      (5) Line 200: They could spell out here that the latency is consistent with the latency observed for the grey patch conditions in the previous experiments.

      (6) Line 465: This is very brief. What criteria did they use for single-unit assignation? Were all units well-isolated or were multi-units included?

      We clarified in the Methods section that “We isolated single units with Kilosort 2.5 (Steinmetz et al., 2021) and manually curated them with Phy2 (Rossant et al., 2021). We included only single units with a maximum contamination of 10 percent.”

      (7) Line 469: “The experiment was run on a Windows 10”. Typo.

      We corrected this to “The experiment was run on Windows 10”.

      (8) Line 481: “We averaged the response over all trials and positions of the screen”. What do they mean by ’positions of the screen’?

      We changed this to “We computed the response for each position separately, by averaging the response across all the trials where a square was presented at a given position.”

      (9) Line 483: “We fitted an ellipse in the center of the response”. How?

      We additionally explain how we performed the detection of the RF using ellipse fitting: “A heatmap of the response was computed. This heatmap was then smoothed, and we calculated the location of the peak response. From the heatmap we calculated the centroid of the response using the function regionprops.m, which finds unique objects, and then selected the biggest area detected. Using the centroids provided as output, we then fitted an ellipse centered on this peak response location to the smoothed heatmap using the MATLAB function ellipse.m.”

      (10) Line 485 “...and positioned the stimulus at the response peak previously found”. Unclear wording, do you mean the center of the ellipse fit to the MUA response averaged across channels or something else?

      (11) Line 487: “We performed a permutation test of the responses inside the RF detected vs a circle from the same area where the screen was gray for the same trials.”. The wording is a bit unclear here, can they clarify what they mean by the ’same trials’, what is being compared to what here?

      We used a permutation test to compare the neuron’s responses to black and white squares inside the RF to the condition where there was no square in the RF (i.e. the RF was covered by the gray background).
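      As an illustration of the kind of comparison described above, here is a minimal two-sample permutation test on the difference of mean responses. The response does not specify the test statistic, the number of permutations, or whether the test was paired, so the label-shuffling scheme, the default values, and the function name below are assumptions.

```python
import random


def permutation_test(responses_a, responses_b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the difference of group means.

    responses_a: responses with a square inside the RF (hypothetical input).
    responses_b: responses with only the gray background in the RF.
    """
    rng = random.Random(seed)
    n_a = len(responses_a)
    observed = abs(sum(responses_a) / n_a - sum(responses_b) / len(responses_b))
    pooled = list(responses_a) + list(responses_b)
    count = 0
    for _ in range(n_permutations):
        # Shuffle the condition labels by shuffling the pooled responses.
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            count += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (count + 1) / (n_permutations + 1)
```

      A small p-value indicates that a mean difference as large as the observed one rarely arises when the condition labels are exchanged at random.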

      (12) Was the pink noise background regenerated on each trial or was the same noise pattern shown on each trial?

      We explain that “We randomly presented one of two different pink noise images”.

      (13) Line 552: “...used a time window of the Gaussian smoothing kernel from-.05 to .05”. Missing units.

      We explained that “we used a time window of the Gaussian smoothing kernel from -.05 s to .05 s, with a standard deviation of 0.0125 s.”
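      The kernel parameters quoted above can be made concrete with a short sketch. The window (-0.05 s to 0.05 s) and the standard deviation (0.0125 s) come from the response; the 1 ms bin width, the unit-sum normalization, and the function names are assumptions made for illustration.

```python
import math


def gaussian_kernel(half_width_s=0.05, sigma_s=0.0125, bin_s=0.001):
    """Gaussian weights on [-half_width_s, +half_width_s], normalized to sum to 1.

    bin_s (1 ms) is an assumed bin width, not a value stated in the response.
    """
    n = round(half_width_s / bin_s)
    weights = [math.exp(-0.5 * ((k * bin_s) / sigma_s) ** 2) for k in range(-n, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(spike_counts, kernel):
    """Convolve binned spike counts with the kernel, truncating at the edges."""
    half = len(kernel) // 2
    smoothed = []
    for i in range(len(spike_counts)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(spike_counts):
                acc += spike_counts[idx] * w
        smoothed.append(acc)
    return smoothed
```

      With these values the window spans four standard deviations on each side, so truncating the Gaussian at the window edges discards very little of its mass.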

      (14) Line 565: “Additionally, for the occluded stimulus, we included patch sizes of 70° and larger.”. Not sure what they’re referring to here.

      We changed this to: “For the population analyses, we analyzed the conditions in which the gray patch sizes were 70 degrees and 90 degrees”.

      (15) Line 569: What is perplexity, and how does changing it affect the t-SNE embeddings?

      Note that t-SNE is only used for visualization purposes. In the revised manuscript, we have expanded our explanation regarding the use of t-SNE and the choice of perplexity values. Specifically, we have clarified that we used a perplexity value of 20 for the Gratings with circular and rectangular occluders and 100 for the black-and-white condition. These values were empirically selected to ensure that the groups in the data were clearly separable while maintaining the balance between local and global relationships in the projected space. This choice allowed us to visually distinguish the different groups while preserving the meaningful structure encoded in the dissimilarity matrices. In particular, varying the perplexity values would not alter the conclusions drawn from the visualization, as t-SNE does not affect the underlying analytical steps of our study.

      (16) Line 572: “We trained a C-Support Vector Classifier based on dissimilarity matrices”. This is overly brief, please describe the construction of the dissimilarity matrices and how the training was implemented. Was this binary, multi-class? What conditions were compared exactly?

      In the revised manuscript, we have expanded our explanation regarding the construction of the dissimilarity matrices and the implementation of the C-Support Vector Classification (C-SVC) model (See Methods section).

      The dissimilarity matrices were calculated using the Euclidean distance between firing rate vectors for all pairs of trials (as shown in Figure 6a-b). These matrices were used directly as input for the classifier. It is important to note that t-SNE was not used for classification but only for visualization purposes. The classifier was binary, distinguishing between two classes (e.g., Dr vs St). We trained the model using 60% of the data for training and used 40% for testing. The C-SVC was implemented using sklearn, and the classification score corresponds to the average accuracy across 20 repetitions.
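      The construction of the dissimilarity matrix can be sketched as follows, assuming each trial is represented by a vector of firing rates across neurons. The function name and the toy data are hypothetical, and how the matrix rows were passed to the classifier is not specified here, so the sketch stops at the matrix itself.

```python
import math


def dissimilarity_matrix(trials):
    """Pairwise Euclidean distances between firing-rate vectors.

    trials: list of equal-length vectors, one per trial (hypothetical layout).
    Returns an n_trials x n_trials symmetric matrix with zeros on the diagonal.
    """
    n = len(trials)
    return [[math.dist(trials[i], trials[j]) for j in range(n)] for i in range(n)]


# Toy example: three trials recorded from two neurons.
example = dissimilarity_matrix([[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]])
```

      Trials with similar population responses end up with small mutual distances, which is the structure a binary classifier can exploit to separate two stimulus conditions.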

      Reviewer #2 (Recommendations for the Authors):

      The relationship between the current paper and Keller et al. is challenging to understand. It seems like the study is critiquing the previous study but rather implicitly and not directly. I would suggest either directly stating the criticism or presenting the current study as a follow-up investigation that further explores the observed effect or provides an alternative function. Additionally, defining the inverse RF versus surround-induced responses earlier than in the discussion would be beneficial. Some suggestions:

      (1) The introduction is well-written, but it would be helpful to clearly define the hypotheses regarding the function of surround-induced responses and revisit these hypotheses one by one in the results section.

      Indeed, we have generally improved the Introduction of the manuscript, and stated the hypotheses and their relationships to the Experiments more clearly.

      (2) Explicitly mention how you compare classic grating stimuli of varying sizes with gray patch stimuli. Do the patch stimuli all come with a full-field grating? For the full-field grating, you have one size parameter, while for the patch stimuli, you have two (size of the patch and size of the grating).

      We now clearly describe how we compare grating stimuli of varying sizes with gray patch stimuli.

      (3) The third paragraph in the introduction reads more like a discussion and might be better placed there.

      We have moved content from the third paragraph of the Introduction to the Discussion, where it fits more naturally.

      (4) Include 1-2 sentences explaining how you center RFs and detail the resolution of your method.

      We have added an explanation to the Methods: “To center the visual stimuli during the recording session, we averaged the multiunit activity across the responsive channels and positioned the stimulus at the center of the ellipse fit to the MUA response averaged across channels.”.

      (5) Motivate the use of achromatic stimuli. This section is generally quite hard to understand, so try to simplify it.

      We explained better in the Introduction why we performed this particular experiment.

      (6) The decoding analysis is great, but it is somewhat difficult to understand the most important results. Consider summarizing the key findings at the beginning of this section.

      We now provide a clearer motivation at the start of the Decoding section.

      Reviewer #3 (Recommendations for the Authors):

      I have a few suggestions to improve the clarity of the presentation.

      Abstract: it lists a series of observations and it ends with a conclusion (“based on these findings...”). However, it provides little explanation for how this conclusion would arise from the observations. It would be more helpful to introduce the reasoning at the top and show what is consistent with it.

      We have improved the abstract of the paper incorporating this feedback.

      To some extent, this applies to Results too. Sometimes we are shown the results of some experiment just because others have done a similar experiment. Would it be better to tell us which hypotheses it tests and whether the results are consistent with all 3 hypotheses or might rule one or more out? I came out of the paper rather confused as to which hypotheses were still standing and which hypotheses were ruled out.

      We have strongly improved our explanation of the hypotheses and the relationships to the experiments in the Introduction.

      It would be best if the Results section focused on the results of the study, without much emphasis on what previous studies did or did not measure. Here, instead, in the middle of Results we are told multiple times what Keller et al. (2020) did or did not measure, and what they did or did not find. Please focus on the questions and on the results. Where they agree or disagree with previous papers, tell us briefly that this is the case.

      We have revised the Results section in the revised manuscript, and ensured that there is much less focus on what previous studies did in the Results. Differences to previous work are now discussed in the Discussion section.

      The notation is extremely awkward. For instance “Gc” stands for two words (Gray center) but “Gr” stands for a single word (Grating). The double meaning of G is one of many sources of confusion.

      This notation needs to be revised. Here is one way to make it simpler: choose one word for each type of stimulus (e.g. Gray, White, Black, Drift, Stat, Noise) and use it without abbreviations. To indicate the configuration, combine two of those words (e.g. Gray/Drift for Gray in the center and Drift in the surround).

      We have corrected the notation in the figures and text to enhance readability and improve the reader’s understanding.

      Figure 1e and many subsequent ones: it is not clear why the firing rate is shown in a logarithmic scale. Why not show it in a linear scale? Anyway, if the logarithmic scale is preferred for some reason, then please give us ticks at numbers that we can interpret, like 0.1,1,10,100... or 0.5,1,2,4... Also, please use the same y-scale across figures so we can compare.

      To clarify: it is necessary to normalize the firing rates relative to baseline in order to pool across neurons. However, such a divisive normalization would by itself be problematic, as, e.g., a change from 1 to 2 and a change from 1 to 0.5 are equivalent in relative terms but are not symmetric on a linear scale. Furthermore, such a division is highly sensitive to outliers. For this reason, taking the logarithm (base 10) of the ratio is an appropriate transformation. We changed the tick labels to 1, 2, 4, as the reviewer suggested.
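      The symmetry argument can be checked numerically. The sketch below compares the plain ratio to baseline with its base-10 logarithm; the function name is hypothetical, and handling of zero firing rates (for which the logarithm is undefined) is left out.

```python
import math


def log_ratio(rate, baseline):
    """Base-10 logarithm of the firing rate relative to baseline.

    Both arguments must be strictly positive; zero rates are not handled here.
    """
    return math.log10(rate / baseline)


baseline = 1.0
doubling = log_ratio(2.0, baseline)  # positive value
halving = log_ratio(0.5, baseline)   # same magnitude, opposite sign

# On a linear scale the same two changes are asymmetric around the baseline.
linear_up = 2.0 / baseline - 1.0     # +1.0
linear_down = 0.5 / baseline - 1.0   # -0.5
```

      On the logarithmic axis, a doubling and a halving sit at equal distances above and below baseline, which is why tick labels such as 0.5, 1, 2, 4 are evenly spaced.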

      Figure 3: it is not clear what “size” refers to in the stimuli where there is no gray center. Is it the horizontal size of the overall stimulus? Some cartoons might help. Or just some words to explain.

      Figure 3: if my understanding of “size” above is correct, the results are remarkable: there is no effect whatsoever of replacing the center stimulus with a gray rectangle. Shouldn’t this be remarked upon?

      We have added a paragraph under figure 3 and in the Methods section explaining that the sizes represent the varying horizontal dimensions of the rectangular patch. In this protocol, the classical condition (i.e. without gray patch) was shown only as full-field gratings, which is depicted in the plot as size 0, indicating no rectangular patch was present.

      DETAILS

      The word “achromatic” appears many times in the paper and is essentially uninformative (all stimuli in this study are achromatic, including the gratings). It could be removed in most places except a few, where it is actually used to mean “uniform”. In those cases, it should be replaced by “uniform”.

      Ditto for the word “luminous”, which appears twice and has no apparent meaning. Please replace it with “uniform”.

      We have replaced the words “achromatic” and “luminous” with “uniform” to improve clarity where we refer to stimuli that are only black or white.

      Page 3, line 70: “We raise some important factors to consider when describing responses to only surround stimulation.” This sentence might belong in the Discussion but not in the middle of a paragraph of Results.

      We removed this sentence.

      Neuropixel - Neuropixels (plural)

      “area LGN” - LGN

      We corrected these misspellings.

      References

      Keller, A.J., Roth, M.M., Scanziani, M., 2020. Feedback generates a second receptive field in neurons of the visual cortex. Nature 582, 545–549. doi:10.1038/s41586-020-2319-4.

      Kirchberger, L., Mukherjee, S., Self, M.W., Roelfsema, P.R., 2023. Contextual drive of neuronal responses in mouse V1 in the absence of feedforward input. Science Advances 9, eadd2498. doi:10.1126/sciadv.add2498.

      Rossant, C., et al., 2021. phy: Interactive analysis of large-scale electrophysiological data. https://github.com/cortex-lab/phy.

      Schneider, M., Tzanou, A., Uran, C., Vinck, M., 2023. Cell-type-specific propagation of visual flicker. Cell Reports 42.

      Steinmetz, N.A., Aydin, C., Lebedeva, A., Okun, M., Pachitariu, M., Bauza, M., Beau, M., Bhagat, J., Böhm, C., Broux, M., Chen, S., Colonell, J., Gardner, R.J., Karsh, B., Kloosterman, F., Kostadinov, D., Mora-Lopez, C., O’Callaghan, J., Park, J., Putzeys, J., Sauerbrei, B., van Daal, R.J.J., Vollan, A.Z., Wang, S., Welkenhuysen, M., Ye, Z., Dudman, J.T., Dutta, B., Hantman, A.W., Harris, K.D., Lee, A.K., Moser, E.I., O’Keefe, J., Renart, A., Svoboda, K., Häusser, M., Haesler, S., Carandini, M., Harris, T.D., 2021. Neuropixels 2.0: A miniaturized high-density probe for stable, long-term brain recordings. Science 372, eabf4588. doi:10.1126/science.abf4588.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work aims to improve our understanding of the factors that influence female-on-female aggressive interactions in gorilla social hierarchies, using 25 years of behavioural data from five wild groups of two gorilla species. Researchers analysed aggressive interactions between 31 adult females, using behavioural observations and dominance hierarchies inferred through Elo-rating methods. Aggression intensity (mild, moderate, severe) and direction (measured as the rank difference between aggressor and recipient) were used as key variables. A linear mixed-effects model was applied to evaluate how aggression direction varied with reproductive state (cycling, trimester-specific pregnancy, or lactation) and sex composition of the group. This study highlights the direction of aggressive interactions between females, with most interactions being directed from higher- to lower-ranking adult females close in social rank. However, the results show that 42% of these interactions are directed from lower- to higher-ranking females. Particularly, lactating and pregnant females targeted higher-ranking individuals, which the authors suggest might be due to higher energetic needs, which increase risk-taking in lactating and pregnant females. Sex composition within the group also influenced which individuals were targeted. The authors suggest that male presence buffers female-on-female aggression, allowing females to target higher-ranking females than themselves. In contrast, females targeted lower-ranking females than themselves in groups with a larger ratio of females, which supposes a lower risk for the females since the pool of competitors is larger. 
The findings provide an important insight into aggression heuristics in primate social systems and the social and individual factors that influence these interactions, providing a deeper understanding of the evolutionary pressures that shape risk-taking, dominance maintenance, and the flexibility of social strategies in group-living species.

      The authors achieved their aim by demonstrating that aggression direction in female gorillas is influenced by factors such as reproductive condition and social context, and their results support the broader claim that aggression heuristics are flexible. However, some specific interpretations require further support. Despite this, the study makes a valuable contribution to the field of behavioural ecology by reframing how we think about intra-sexual competition and social rank maintenance in primates.

      Strengths:

      One of the study's major strengths is the use of an extensive dataset that compiles 25 years of behavioural data and 6871 aggressive interactions between 31 adult females in five social groups, which allows for a robust statistical analysis. This study uses a novel approach to the study of aggression in social groups by including factors such as the direction and intensity of aggressive interactions, which offers a comprehensive understanding of these complex social dynamics. In addition, this study incorporates ecological and physiological factors such as the reproductive state of the females and the sex composition of the group, which allows an integrative perspective on aggression within the broader context of body condition and social environment. The authors successfully integrate their results into broader evolutionary and ecological frameworks, enriching discussions around social hierarchies and risk sensitivity in primates and other animals.

      Thank you for the positive assessment of our work and the nice summary of the manuscript!

      Weaknesses:

      Although the paper has a novel approach by studying the effect of reproductive state and social environment on female-female aggression, the use of observational data without experimental manipulation limits the ability to establish causation. The authors suggest that the difference observed in female aggression direction between groups with different sex composition might be indicative of male presence buffering aggression, which seems speculative, as no direct evidence of male intervention or support was reported. Similarly, the use of reproductive state as a proxy for energetic need is an indirect measure and does not account for actual energy expenditure or caloric intake, which weakens the authors' claims that female energetic need induces risk-taking. Overall, this paper would benefit from stronger justification and empirical support to strengthen the conclusions of the study about the mechanisms driving female aggression in gorillas.

      We agree that experimental manipulation would allow us to extend our work. Unfortunately, this is not possible with wild, endangered gorillas.

      We have now added more references (Watts 1994; Watts 1997) and enriched our arguments regarding male presence buffering aggression. Previous research suggests that male gorillas may support lower-ranking females and may intervene in female-female conflicts (Sicotte 2002). Unfortunately, our dataset did not allow us to test for male protection. We conduct proximity scans every 10 minutes, and these scans are not associated with individual interactions, meaning that we cannot reliably test whether proximity to a male influences the likelihood of receiving aggression.

      We have now clearly stated that reproductive state is an indirect proxy for energetic needs. We agree with your point about energy intake and expenditure, but unfortunately, we do not have data on energy expenditure or caloric intake to allow us to delve into more fine-grained analyses.

      Overall, we have tried to enrich the justification and empirical support to strengthen our conclusions by clarifying the text and adding more examples and references.

      Reviewer #2 (Public review):

      Summary:

      The authors' aim in this study is to assess the factors that can shift competitive incentives against higher- or lower-ranking groupmates in two gorilla species.

      Strengths:

      This is a relevant topic, where important insights could be gained. The authors brought together a substantial dataset: a long-term behavioral dataset representing two gorilla species from five social groups.

      Weaknesses:

      The authors have not fully shown the data used in the model and explored the potential of the model. Therefore, I remain cautious about the current results and conclusions.

      Some specific suggestions that require attention are

      (1) The authors described how group size can affect aggression patterns in some species (line 54), using a whole paragraph, but did not include it as an explanatory variable in their model, even though they stated that overall group size can "conflate opposing effects of females and males" (line 85). I suggest underlining the effects of the numbers of males and/or females here and de-emphasizing the effect of group size in the Introduction.

      We did not use group size as a main predictor, as has been commonly done in other species, because of potentially conflating opposing effects of males and females. To further stress this point, we have specifically added in the introduction: “group size, the overall number of individuals in the group, might not be a good predictor of aggression heuristics, as it can conflate the effects of different kinds of individuals on aggression (see Smit & Robbins 2024 for an example of opposing effects of the number of females and number of males on female gorilla aggression).”

      We also “ran our analysis testing for group size (number of weaned individuals in the group), instead of the numbers of females and males, [and] its influence on interaction score was not significant (estimate=-0.001, p-value=0.682).”

      (2) There should be more details given about how the authors calculated individual Elo-ratings (line 98). It seems that authors pooled all avoidance/displacement behaviors throughout the study period. But how often was the Elo-rating they included in the model calculated? By the day or by the month? I guess it was by the day, as they "estimate female reproductive state daily" (line 123). If so, it should be made clear in the text.

      We rephrased accordingly: “We used all avoidance and displacement interactions throughout the study period and we used the function elo.seq from R package EloRating to infer daily individual female Elo-scores”. We also clarified that “This method takes into account the temporal sequence of interactions and updates an individual’s Elo-scores each day the individual interacted with another...”
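
      For readers unfamiliar with Elo-rating, the sequential updating described above can be sketched as follows. This is a generic logistic Elo update written for illustration, not the implementation in the EloRating package; the function names, the starting value (1000), the constant k (100), and the example interactions are all assumptions.

```python
def expected_win(rating_a, rating_b):
    """Logistic expectation that A wins against B (standard Elo form)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_elo(ratings, winner, loser, k=100.0, start=1000.0):
    """Update ratings in place after one interaction won by `winner`.

    Individuals seen for the first time enter at `start` (an assumption).
    """
    r_w = ratings.setdefault(winner, start)
    r_l = ratings.setdefault(loser, start)
    # The winner gains more when the win was unexpected.
    shift = k * (1.0 - expected_win(r_w, r_l))
    ratings[winner] = r_w + shift
    ratings[loser] = r_l - shift

def daily_scores(interactions):
    """Process (day, winner, loser) tuples in temporal order.

    Returns {day: snapshot of all ratings after that day's last interaction}.
    """
    ratings, by_day = {}, {}
    for day, winner, loser in sorted(interactions, key=lambda x: x[0]):
        update_elo(ratings, winner, loser)
        by_day[day] = dict(ratings)
    return by_day

# Hypothetical displacement records: (day, displacer, displaced).
example = [(1, "A", "B"), (1, "A", "C"), (2, "B", "C")]
scores = daily_scores(example)
```

      In this sketch an individual's score changes only on days when she interacts, which mirrors the daily updating described in the rephrased text; the snapshot for each day simply carries forward the scores of everyone else.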

      In addition, all groups were studied long-term, and the group composition seems to fluctuate, based on Table 1 in Reference 11. When an individual enters or leaves a group with a stable hierarchy, it takes time before the hierarchy becomes stable again. If the avoidance/displacement behaviors used for the rank relationship were not common, it would take a few days or maybe longer. Also, were the aggressive behaviors more common during rank fluctuations? In other words, if avoidance/displacement behaviors and aggressive behaviors occur simultaneously during rank fluctuations, how did the authors deal with it and take it into consideration in the analysis?

      We have shown in Reference 25 (Smit & Robbins 2025) after Reference 11 (Smit & Robbins 2024) that females form highly stable hierarchies, and that dyadic dominance relationships are not influenced by the dispersal or death of third individuals. Notably, new immigrant females usually start at a low rank and remain there, without large fluctuations in rank. Therefore, any fluctuation periods have limited influence on the aggressive interactions in our study system.

      The authors emphasized several times in the text that gorillas "form highly stable hierarchical relationships". Also, in Reference 25, they found very high stabilities of each group's hierarchy. However, the number of females involved in that analysis was different from that used here. They need to provide more basic info on each group's dominance hierarchy and verify their statement. I strongly suggest that the authors display Elo-rating trajectories and necessary relevant statistics for each group throughout the study period as part of the supplementary materials.

      In fact, the females involved in the present analysis and the analysis of Smit & Robbins 2025 are the same. Our present analysis is based on the hierarchies of Smit & Robbins 2025. Note that female gorillas disperse and occasionally immigrate to another study group. This is why some females may appear in the hierarchies of more than one group, giving the impression that there are more females involved in the analysis of Smit & Robbins 2025 (e.g. by counting the lines in the Elo-rating plots). We now specifically state that “We present these interactions and hierarchies in detail in Smit & Robbins 2025”, to clarify that the hierarchies are the same.

      (3) The authors stated why they differentiated the different stages based on female reproductive status. They also referred to the differences in energetic needs between stages of pregnancy and lactation (lines 127-128). However, in the mixed model, they only compared the interaction score between the female cycling stage and other stages. The model was not well explained, and the results could be expanded. I suggest conducting more pairwise comparisons in the model and presenting the statistics in the text, if there are significant results. If all three pregnancy stages differed significantly from cycling and lactating stages but not from each other, they may be merged as one pregnancy stage. More in-depth analysis would help provide better answers to the research questions.

      Thank you for pointing this out. First, when we considered a single pregnancy stage, pregnant females indeed showed a significantly greater interaction score than females in other reproductive stages. We have now included this in the manuscript. However, we still find it relevant to test for the different stages of pregnancy, given the differences in energetic needs across these stages. We have now included the pairwise comparisons in a new table (Table 2).

      Reviewer #3 (Public review):

      Smit and Robbins' manuscript investigates the dynamics of aggression among female groupmates across five gorilla groups. The authors utilize longitudinal data to examine how reproductive state, group size, presence of males, and resource availability influence patterns of aggression and overall dominance rankings as measured by Elo scores. The findings underscore the important role of group composition and reproductive status, particularly pregnancy, in shaping dominance relationships in wild gorillas. While the study addresses a compelling and understudied topic, I have several comments and suggestions that may enhance clarity and improve the reader's experience.

      (1) Clarification of longitudinal data - The manuscript states that 25 years of behavioral data were used, but this number appears unclear. Based on my calculations, the maximum duration of behavioral observation for any one group appears to be 18 years. Specifically:

      • ATA: 6 years

      • BIT: 8 years

      • KYA: 18 years

      • MUK: 6 years

      • ORU: 8 years

      I recommend that the authors clarify how the 25-year duration was derived.

      Indeed none of the five study “groups” has been studied for 25 years in a row. However, MUK emerged from a fission of group KYA in early 2016. So, from the start of group KYA in October 1998 to the end of group MUK in December 2023, there are 25 years and 2 months. We have now rephrased to “...starting in 1998 in one of the mountain gorilla groups” in the introduction, and to “We use a long-term behavioural dataset on five wild groups of the two gorilla species, starting in 1998” in the abstract.
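
      The arithmetic behind the 25-year figure can be checked with a short sketch. The helper name is hypothetical; the two dates are the ones given in the response (start of group KYA in October 1998, end of group MUK in December 2023).

```python
def months_between(start_year, start_month, end_year, end_month):
    """Whole calendar months from the start month to the end month."""
    return (end_year - start_year) * 12 + (end_month - start_month)

# Start of group KYA (October 1998) to end of group MUK (December 2023).
total = months_between(1998, 10, 2023, 12)
years, months = divmod(total, 12)
```

      Here `years` and `months` come out as 25 and 2, matching the span quoted in the response; no single group contributes the whole period on its own.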

      (2) Consideration of group size - The authors mention that group size was excluded from analyses to avoid conflating the opposing effects of female and male group members. While this is understandable, it may still be beneficial to explore group size effects in supplementary analyses. I suggest reporting statistics related to group size and potentially including a supplementary figure. Additionally, given that the study includes both mountain and western gorillas, it would be helpful to examine whether any interspecies differences are apparent.

      We have now added the suggested extra test: “When we ran our analysis testing for group size (number of weaned individuals in the group), instead of the numbers of females and males, its influence on interaction score was not significant (estimate=-0.001, p-value=0.682).”

      Regarding species differences: In our analysis, we test for species (mountain vs western) and we find no significant differences between the two. This is stated in the results.

      (3) Behavioral measures clarification - Lines 112-116 describe the types of aggressive behaviors observed. It would be helpful to clarify how these behaviors differ from those used to calculate Elo scores, or whether they overlap. A brief explanation would improve transparency regarding the methodology.

      We have now added short explanations in brackets for behaviours that are not obvious. We also added a sentence in the text to clarify the difference from the behaviours used to calculate Elo scores: “These two behaviours [avoidance and displacement] are ritualized, occurring in absence of aggression, they are considered a more reliable proxy of power relationships over aggression, and they are typically used to infer gorilla hierarchical relationships”.

      (4) Aggression rates versus Elo scores - The manuscript uses aggression rates rather than dominance rank (as measured by Elo scores) as the main outcome variable, but there is no explanation on why. How would the results differ if aggression rates were replaced or supplemented with Elo scores? The current justification for prioritizing aggression rates over dominance rank needs to be more clearly supported.

      The sentence we added above (“These two behaviours [avoidance and displacement] are ritualized, occurring in absence of aggression, they are considered a more reliable proxy of power relationships over aggression, and they are typically used to infer gorilla hierarchical relationships”) and the first paragraph of the results hopefully clarify that ritualized agonistic interactions are generally directionally consistent and more reliably capture the highly stable dominance relationships of female gorillas. This approach has been used to calculate dominance rank in gorillas in all studies that have considered it, dating back to the 1970s (namely in studies by Harcourt and Watts). On the other hand, aggression can be context dependent (we now clearly note this at the beginning of the Methods paragraph on aggressive interactions). Therefore, we use Elo-scores inferred from ritualized interactions as a baseline and a reliable proxy of power relationships; we then test whether the direction of aggression within these relationships is also driven by energetic needs or the social environment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to elucidate the molecular mechanisms underlying HIV-1 persistence and host immune dysfunction in CD4+ T cells during early infection (<6 months). Using single-cell multi-omics technologies, including scRNA-seq, scATAC-seq, and single-cell multiome analyses, they characterized the transcriptional and epigenomic landscapes of HIV-1-infected CD4+ T cells. They identified key transcription factors (TFs), signaling pathways, and T cell subtypes involved in HIV-1 persistence, particularly highlighting KLF2 and Th17 cells as critical regulators of immune suppression. The study provides new insights into immune dysregulation during early HIV-1 infection and reveals potential epigenetic regulatory mechanisms in HIV-1-infected T cells.

      Strengths:

      The study excels through its innovative integration of single-cell multi-omics technologies, enabling detailed analysis of gene regulatory networks in HIV-1-infected cells. Focusing on early infection stages, it fills a crucial knowledge gap in understanding initial immune responses and viral reservoir establishment. The identification of KLF2 as a key transcription factor and Th17 cells as major viral reservoirs, supported by comprehensive bioinformatics analyses, provides robust evidence for the study's conclusions. These findings have immediate clinical relevance by identifying potential therapeutic targets for HIV-1 reservoir eradication.

      We sincerely appreciate the reviewer’s positive evaluation of our work.

      Weaknesses:

      Despite its strengths, the study has several limitations. By focusing exclusively on CD4+ T cells, the study overlooks other relevant immune cells such as CD14+ monocytes, NK cells, and B cells. Additionally, while the authors generated their own single-cell datasets, they need to validate their findings using other publicly available single-cell data from HIV-1-infected PBMCs.

      Thank you to Reviewer #1 for your feedback on our work. In response, we have examined cell-cell interactions between HIV-1-infected CD4+ T cells and other immune cells, including monocytes and NK cells. We identified altered interaction signaling patterns (e.g., MIF, ICAM2, CCL5, CLEC2B) that contribute to immune dysfunction and viral persistence (page 9, Supplementary Fig. 5). In addition, we validated the expression of KLF2 and its target genes using a publicly available scRNA-seq dataset from HIV-1-infected PBMCs [1], which includes both healthy donors and individuals with chronic HIV-1 infection. The upregulation of key KLF2 targets in HIV-1-infected CD4+ T cells from this dataset supports the reproducibility of our findings. We have incorporated these results into the revised Results, Discussion, and Supplementary Materials (page 8, page 12 and Supplementary Fig. 4A).

      Reviewer #2 (Public review):

      Summary:

      The authors observed gene ontologies associated with upregulated KLF2 target genes in HIV-1 RNA+ CD4 T Cells using scRNA-seq and scATAC-seq datasets from the PBMCs of early HIV-1-infected patients, showing immune responses contributing to HIV pathogenesis and novel targets for viral elimination.

      Strengths:

      The authors carried out detailed transcriptomics profiling with scRNA-seq and scATAC-seq datasets to conclude upregulated KLF2 target genes in HIV-1 RNA+ CD4 T Cells.

      We thank the reviewer for highlighting the strengths of our work.

      Weaknesses:

      This key observation of upregulation of the KLF2-associated gene family might be important in the HIV field for early diagnosis and viral clearance. However, with the limited sample size and the in vivo study model, it is hard to draw firm conclusions. I highly recommend increasing the sample size of early HIV-1-infected patients.

      Thank you to Reviewer #2 for this important comment. We acknowledge the limitations of our modest sample size, which reflects the challenges of recruiting well-characterized individuals in early HIV-1 infection (<6 months) and obtaining high-quality PBMCs for single-cell multi-omic profiling. To strengthen our findings, we validated the upregulation of KLF2 target genes using a publicly available scRNA-seq dataset from HIV-1-infected PBMCs [1], which showed similar expression patterns in HIV-1 RNA+ CD4+ T cells (page 8 and Supplementary Fig. 4A).

      Reviewer #3 (Public review):

      Summary:

      This manuscript studies intracellular changes and immune processes during early HIV-1 infection with an additional focus on the small CD4+ T cell subsets. The authors used single-cell omics to achieve high resolution of transcriptomic and epigenomic data on the infected cells which were verified by viral RNA expression. The results add to understanding of transcriptional regulation which may allow progression or HIV latency later in infected cells. The biosamples were derived from early HIV infection cases, providing particularly valuable data for the HIV research field.

      Strengths:

      The authors examined the heterogeneity of infected cells within CD4 T cell populations, identified a significant and unexpected difference between naive and effector CD4 T cells, and highlighted the differences in Th2 and Th17 cells. Multiple methods were used to show the role of the increased KLF2 factor in infected cells. This is a valuable finding of a new role for the major transcription factor in further disease progression and/or persistence.

      The methods employed by the authors are robust. Single-cell RNA-Seq from PBMC samples was followed by a comprehensive annotation of immune cell subsets, 16 in total. This manuscript presents to the scientific community a valuable multi-omics dataset of good quality, which could be further analyzed in the context of larger studies.

      We sincerely thank the reviewer for the insightful and concise summary of our work.

      Weaknesses:

      Methods and Supplementary materials

      Some technical aspects could be described in more detail. For example, it is unclear how the authors filtered out cells that did not pass quality control, such as doublets and cells with low transcript/UMI content. Next, in cell annotation, what is the variability in cell types between donors? This information is important to include in the supplementary materials, especially with such a small sample size. Without this, it is difficult to determine, whether the differences between subsets on transcriptomic level, viral RNA expression level, and chromatin assessment are observed due to cell type variations or individual patient-specific variations. For the DEG analysis, did the authors exclude the most variable genes?

      Thank you to Reviewer #3 for these detailed comments and observations. In the revised Methods section (page 16), we have added information on our quality control filtering process. Specifically, we excluded cells with fewer than 200 detected genes, high mitochondrial content (>30%), or low UMI counts. Doublets were identified and removed using DoubletFinder.
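
      The per-cell filters described above can be sketched as a simple predicate. This is an illustration only, not the authors' pipeline: the function name, the field names, and the example cells are hypothetical, the UMI cutoff is a placeholder because the response does not state one, and doublet removal with DoubletFinder is not reproduced here.

```python
def passes_qc(cell, min_genes=200, max_pct_mito=30.0, min_umis=500):
    """Return True if a cell passes the three per-cell filters.

    min_genes and max_pct_mito follow the thresholds stated in the response;
    min_umis is an assumed placeholder, since no UMI cutoff is given there.
    """
    return (
        cell["n_genes"] >= min_genes
        and cell["pct_mito"] <= max_pct_mito
        and cell["n_umis"] >= min_umis
    )

# Hypothetical per-cell summary metrics.
cells = [
    {"id": "c1", "n_genes": 1500, "pct_mito": 5.0, "n_umis": 4000},
    {"id": "c2", "n_genes": 150, "pct_mito": 4.0, "n_umis": 600},    # too few genes
    {"id": "c3", "n_genes": 900, "pct_mito": 45.0, "n_umis": 2500},  # high mitochondrial content
    {"id": "c4", "n_genes": 250, "pct_mito": 10.0, "n_umis": 400},   # low UMI count
]
kept = [c["id"] for c in cells if passes_qc(c)]
```

      Only the first cell survives all three filters in this example; reporting the number of cells removed by each criterion per donor would make the effect of the thresholds easier to judge.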

      To address inter-donor variability, we included a new supplementary figure (Supplementary Fig. 1B) showing the distribution of major immune cell types across individual donors. While we observed some variation in cell-type composition between individuals, this likely reflects natural biological heterogeneity in early HIV-1 infection. Additionally, we applied fastMNN batch correction to mitigate donor-specific technical variation. After correction, the overall patterns of gene expression within each major CD4+ T cell subset were consistent across individuals (Supplementary Fig. 1C).

      Regarding the DEG analysis, we used the ‘FindMarkers’ function in Seurat (v.3.2.1), which does not exclude highly variable genes. These details have been clarified in the updated Methods section (page 18).

      The annotation of 16 cell types from PBMC samples is impressive and of good quality, however, not all cell types get attention for further analysis. It’s natural to focus primarily on the CD4 T cells according to the research objectives. The authors also study potential interactions between CD4 and CD8 T cells by cell communication inference. It would be interesting to ask additional questions for other underexplored immune cell subsets, such as: 1) Could viral RNA be detected in monocytes or macrophages during early infection? 2) What are the inferred interactions between NK cells and infected CD4 T cells, are interactions similar to CD4-CD8 results? 3) What are the inferred interactions between monocytes or macrophages and infected CD4 T cells?

      In line with our study objectives, we initially focused on CD4+ T cells as primary HIV-1 targets. However, in response to the reviewer’s comment, we examined the inferred communications between HIV-1-infected CD4+ T cells and other immune cells.

      (1) With regard to the presence of viral RNA in monocytes or macrophages, we observed negligible HIV-1 RNA signal in these cell types in our dataset, consistent with their low permissiveness in early-stage infection [2]. However, we acknowledge the limitations of detecting rare infected cells at the single-cell level.

      (2) We identified increased MIF and ICAM2 signaling between NK cells and HIV-1-infected CD4+ T cells, which are associated with KLF2-mediated immune modulation. These patterns are consistent with the CD4–CD8 interaction results observed in our dataset. (Supplementary Fig. 5A)

      (3) Through the cell-cell interaction analysis with differential expression analysis, we inferred reduced CCL5 and CD55 signaling between monocytes and HIV-1-infected CD4+ T cells (Supplementary Fig. 5B). These reductions may potentially impair immune responses and antiviral defense.

      We appreciate the reviewer’s suggestions and believe that the analysis of underexplored immune subsets strengthens the relevance of our findings. These results have been incorporated into the revised Results (page 9).

      Discussion

      It would be interesting to see more discussion of the observation of how naïve T cells produce more viral RNA compared to effector T cells. It seems counterintuitive according to general levels of transcriptional and translational activity in subsets.

      Another discussion block could be added regarding the results and conclusion comparison with Ashokkumar et al. paper published earlier in 2024 (10.1093/gpbjnl/qzae003). This earlier publication used both a cell line-based HIV infection model and primary infected CD4 T cells and identified certain transcription factors correlated with viral RNA expression.

      Thank you to Reviewer #3 for the insightful suggestions. We observed that the proportion of HIV-1-infected naïve CD4 T cells is higher than that of effector T cells. Although effector CD4 T cells are generally more active, previous studies have suggested that naïve CD4 T cells are susceptible to HIV-1 infection during early infection, which may be associated with initial expansion and rapid progression [3, 4]. This may be due to less restriction by antiviral signaling or more accessible chromatin states in resting cells. We have added this context and cited relevant papers to address this observation (page 11).

      In addition, we have incorporated a comparative discussion with the recent study [5], which identified FOXP1 and GATA3 as transcriptional regulators associated with HIV-1 RNA expression. While these TFs were not significantly differentially expressed in our dataset, we discuss potential reasons for this discrepancy—including differences in infection model (in vitro vs. ex vivo), infection stage (latency vs. acute), and T cell subset composition—and emphasize that both studies highlight the importance of transcriptional regulation in HIV-1 persistence (page 12 and Supplementary Fig. 4B).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The study has several notable limitations.

      First, it was restricted to early-stage HIV-1 infection (<6 months) without longitudinal data, preventing the authors from capturing temporal changes in immune cell populations, gene expression profiles, and epigenetic landscapes throughout disease progression.

      We thank Reviewer #1 for raising this important limitation. As noted, our study focused exclusively on early-stage HIV-1 infection (<6 months) to capture the initial immune dysregulation and epigenetic alterations. We agree that longitudinal analysis would provide valuable insights into disease progression. However, due to the limited availability of early-infection patient samples suitable for multi-omics profiling, we prioritized capturing a detailed snapshot at this early stage. Future studies incorporating longitudinal sampling, including chronic infection and long-term non-progressors, will be essential to fully elucidate the temporal dynamics of HIV-1 pathogenesis.

      Second, while the bioinformatic analysis compared "Uninfected" and "HIV-1-infected" cells from patients, the authors could have strengthened their findings by incorporating publicly available single-cell data from healthy donors and chronically infected HIV-1 patients to validate their arguments across all figures.

      To support the robustness of our findings, we incorporated a publicly available single-cell RNA-seq dataset [1], which includes both healthy donors and individuals with chronic HIV-1 infection. In this dataset, we validated the upregulation of KLF2 and its target genes in HIV-1-infected CD4+ T cells and observed expression patterns generally consistent with those in our early-infection cohort (page 8; page 12 and Supplementary Fig. S4). While not all gene-level trends were identical, reflecting differences in infection stage and immune activation status, this external comparison reinforces the reproducibility of key observations and highlights the unique transcriptional features associated with early HIV-1 infection.

      Third, although the study focused on CD4+ T cells as primary HIV-1 targets, it overlooked other important immune cells such as CD8+ T cells, monocytes, and NK cells, which may contribute to viral persistence and immune dysfunction through cell-cell interactions.

      In the revised manuscript, we expanded our analysis to include predicted ligand–receptor interactions between HIV-1-infected and uninfected CD4+ T cells with innate and cytotoxic immune cells using CellChat v.2.1.1. Specifically, we evaluated interactions with NK cells and monocytes and identified altered signaling pathways such as MIF, ICAM2, CCL5, and CLEC2B, which are associated with immune modulation (Supplementary Fig. 5A). We have added these results to the revised Results (page 9).

      Lastly, comparing these findings with other chronic viral infections (e.g., HBV, HCV) would have positioned this work more effectively within the broader field of viral immunology and enhanced its impact.

      We agree that broader comparisons with other chronic viral infections could enhance the impact of our findings. In the current discussion, we noted similarities in interferon signaling disruption with viruses such as HCV and HSV (page 11). Our observation that HIV-1-infected CD4+ T cells exhibit impaired interferon responses is consistent with immune evasion mechanisms reported in HCV and HSV infections. These results underscore both the shared and specific features of immune modulation and persistence during early HIV-1 infection.

      Reviewer #3 (Recommendations for the authors):

      Supplementary Table S1 should indicate which technique was used for sequencing. However, the current version of the table marks no protocol applied to the majority of the samples, which is confusing and needs to be corrected.

      Thank you to Reviewer #3 for pointing out this important oversight. We have revised Supplementary Table S1 to clearly indicate the sequencing method used for each sample. Separate columns for scRNA-seq, scATAC-seq, and sc-Multiome now specify whether each technique was applied (“Yes” or “No”) to improve clarity and transparency.

      (1) Wang, S., et al., An atlas of immune cell exhaustion in HIV-infected individuals revealed by single-cell transcriptomics. Emerg Microbes Infect, 2020. 9(1): p. 2333-2347.

      (2) Arfi, V., et al., Characterization of the early steps of infection of primary blood monocytes by human immunodeficiency virus type 1. J Virol, 2008. 82(13): p. 6557-65.

      (3) Douek, D.C., et al., HIV preferentially infects HIV-specific CD4+ T cells. Nature, 2002. 417(6884): p. 95-8.

      (4) Jiao, Y., et al., Higher HIV DNA in CD4+ naive T-cells during acute HIV-1 infection in rapid progressors. Viral Immunol, 2014. 27(6): p. 316-8.

      (5) Ashokkumar, M., et al., Integrated Single-cell Multiomic Analysis of HIV Latency Reversal Reveals Novel Regulators of Viral Reactivation. Genomics Proteomics Bioinformatics, 2024. 22(1).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Summary:

      The authors sought to elucidate the mechanism by which infections increase sleep in Drosophila. Their work is important because it further supports the idea that the blood-brain barrier is involved in brain-body communication, and because it advances the field of sleep research. Using knock-down and knock-out of cytokines and cytokine receptors specifically in the endocrine cells of the gut (cytokines) as well as in the glia forming the blood-brain barrier (BBB) (cytokine receptors), the authors show that cytokines, upd2 and upd3, secreted by entero-endocrine cells in response to infections increase sleep through the Dome receptor in the BBB. They also show that gut-derived Allatostatin (Alst) A promotes wakefulness by inhibiting Alst A signaling that is mediated by Alst receptors expressed in BBB glia. Their results suggest there may be additional mechanisms that promote elevated sleep during gut inflammation.

      The authors suggest that upd3 is more critical than upd2, which is not sufficiently addressed or explained. In addition, the study uses the gut's response to reactive oxygen molecules as a proxy for infection, which is not sufficiently justified. Finally, further verification of some fundamental tools used in this paper would further solidify these findings making them more convincing.

      Strengths:

      (1) The work addresses an important topic and proposes an intriguing mechanism that involves several interconnected tissues. The authors place their research in the appropriate context and reference related work, such as literature about sickness-induced sleep, ROS, the effect of nutritional deprivation on sleep, sleep deprivation and sleep rebound, upregulated receptor expression as a compensatory mechanism in response to low levels of a ligand, and information about Alst A.

      (2) The work is, in general, supported by well-performed experiments that use a variety of different tools, including multiple RNAi lines, CRISPR, and mutants, to dissect both signal-sending and receiving sides of the signaling pathway.

      (3) The authors provide compelling evidence that shows that endocrine cells from the gut are the source of the upd cytokines that increase daytime sleep, that the glial cells of the BBB are the targets of these upds, and that upd action causes the downregulation of Alst receptors in the BBB via the Jak/Stat pathways.

      We are pleased that the reviewers recognized the strength and significance of our findings describing a gut-to-brain cytokine signaling mechanism involving the blood-brain barrier (BBB) and its role in regulating sleep, and we thank them for their comments.

      Weaknesses:

      (1) There is a limited characterization of cell types in the midgut which are classically associated with upd cytokine production.

      We thank the reviewer for raising this point. Although several midgut cell types (including the absorptive enterocytes) may indeed produce Unpaired (Upd) cytokines, our study specifically focused on enteroendocrine cells (EECs), which are well-characterized as secretory endocrine cells capable of exerting systemic effects. As detailed in our response to Results point #2 (please see below), we show that EEC-specific manipulation of Upd signaling is both necessary and sufficient to regulate sleep in response to intestinal oxidative stress. These findings support the role of EECs as a primary source of gut-derived cytokine signaling to the brain. To acknowledge the possible involvement of other sources, we have also added a statement to the Discussion in the revised manuscript noting that other, non-endocrine gut cell types may contribute to systemic Unpaired signaling that modulates sleep.

      (2) Some of the main tools used in this manuscript to manipulate the gut while not influencing the brain (e.g., Voilà and Voilà + R57C10-GAL80), are not directly shown to not affect gene expression in the brain. This is critical for a manuscript delving into inter-organ communication, as even limited expression in the brain may lead to wrong conclusions.

      We agree with the reviewer that this is an important point. To address it, we performed additional validation experiments to assess whether the voilà-GAL4 driver in combination with R57C10-GAL80 (EEC>) influences upd2 or upd3 expression in the brain. Our results show that manipulation using EEC> alters upd2 and upd3 expression in the gut (Fig. 1a,b), with new data showing that this does not affect their expression levels in neuronal tissues (Fig. S1a), supporting the specificity of our approach. These new data are now included in the revised manuscript and described in the Results section. This additional validation strengthens our conclusion that the observed sleep phenotypes result from gut-specific cytokine signaling, rather than from effects on Unpaired cytokines produced in the brain.

      (3) The model of gut inflammation used by the authors is based on the increase in reactive oxygen species (ROS) obtained by feeding flies food containing 1% H2O2. The use of this model is supported by the authors rather weakly in two papers (refs. 26 and 27): The paper by Jiang et al. (ref. 26) shows that the infection by Pseudomonas entomophila induces cytokine responses upd2 and 3, which are also induced by the Jnk pathway. In addition, no mention of ROS could be found in Buchon et al. (ref 27); this is a review that refers to results showing that ROS are produced by the NADPH oxidase DUOX as part of the immune response to pathogens in the gut. Thus, there is no strong support for the use of this model.

      We thank the reviewer for raising this point. We agree that the references originally cited did not sufficiently justify the use of H<sub>2</sub>O<sub>2</sub> feeding as a model of gut inflammation. To address this, we have revised the Results section to clarify that we use H<sub>2</sub>O<sub>2</sub> feeding as a controlled method to elevate intestinal ROS levels, rather than as a general model of inflammation. This approach allows us to investigate the specific effects of ROS-induced cytokine signaling in the gut. We have also added citations to support the physiological relevance of this model. For instance, Tamamouna et al. (2021) demonstrated that H<sub>2</sub>O<sub>2</sub> feeding induces intestinal stem-cell proliferation – a response also observed during bacterial infection – and Jiang et al. (2009) showed that enteric infections increase upd2 and upd3 expression, which we similarly observe following H<sub>2</sub>O<sub>2</sub> feeding (Fig. 3a). These findings support the use of H<sub>2</sub>O<sub>2</sub> as a tool to mimic specific ROS-linked responses in the gut. We believe this targeted and tractable model is a strength of our study, enabling us to dissect how intestinal ROS modulates systemic physiology through cytokine signaling.

      Additionally, we have included a statement in the Discussion acknowledging that ROS generated during infection may activate signaling mechanisms distinct from those triggered by chemically induced oxidative stress, and that exploring these differences in future studies may yield important insights into gut–brain communication. These revisions provide a stronger justification for our model while more accurately conveying both its relevance and its limitations.

      (4) Likewise, there is no support for the use of ROS in the food instead of a direct infection by pathogenic bacteria. Furthermore, it is known that ROS damages the gut epithelium, which in turn induces the expression of the cytokines studied. Thus the effects observed may not reflect the response to infection. In addition, Majcin Dorcikova et al. (2023; Circadian clock disruption promotes the degeneration of dopaminergic neurons in male Drosophila. Nat Commun. 14(1):5908. doi: 10.1038/s41467-023-41540-y) report that the feeding of adult flies with H2O2 results in neurodegeneration if associated with circadian clock defects. Thus, it would be important to discuss or present controls that show that the feeding of H2O2 does not cause neuronal damage.

      We thank the reviewer for this thoughtful follow-up point. We would like to clarify that we do not claim that the effects observed in our study directly reflect the full response to enteric infection. As outlined in our revised response to comment 3, we have updated the manuscript to more precisely describe the H<sub>2</sub>O<sub>2</sub>-feeding paradigm as a model that induces local intestinal ROS responses comparable to, but not equivalent to, those observed during pathogenic challenges. This revised framing highlights both the potential similarities and differences between chemically induced oxidative stress and infection-induced responses. Indeed, in the revised Discussion, we now explicitly acknowledge that ROS generated during infection may engage distinct signaling mechanisms compared to exogenous H<sub>2</sub>O<sub>2</sub> and emphasize the value of future studies in delineating these pathways. We are currently pursuing this direction in an independent ongoing study investigating the effects of enteric infections. However, for the present work, we chose to focus on the effects of ROS-induced responses in isolation, as this provides a clean and well-controlled context to dissect the specific contribution of oxidative stress to cytokine signaling and sleep regulation.

      To further address the reviewer’s concern, we have also included new data (a TUNEL stain for apoptotic DNA fragmentation) in the revised manuscript showing that H<sub>2</sub>O<sub>2</sub> feeding does not damage neuronal tissues under our experimental conditions (Fig. S3f,g). This addresses the point raised regarding the potential neurotoxicity of H<sub>2</sub>O<sub>2</sub>, as described by Majcin Dorcikova et al. (2023), and supports the specificity of the sleep phenotypes observed in our study. We believe these revisions and clarifications strengthen the manuscript and make our interpretation more precise.

      (5) The novelty of the work is difficult to evaluate because of the numerous publications on sleep in Drosophila. Thus, it would be very helpful to read from the authors how this work is different and novel from other closely related works such as: Li et al. (2023) Gut AstA mediates sleep deprivation-induced energy wasting in Drosophila. Cell Discov. 23;9(1):49. doi: 10.1038/s41421-023-00541-3.

      Our work highlights a distinct role for gut-derived AstA in sleep regulation compared to findings by Lin et al. (Cell Discovery, 2023)[1], who showed that gut AstA mediates energy wasting during sleep deprivation. Their study focused on the metabolic consequences of sleep loss, proposing that sleep deprivation increases ROS in the gut, which then promotes the release of the glucagon-like hormone adipokinetic hormone (AKH) through gut AstA signaling, thereby triggering energy expenditure.

      In contrast, our study addresses the inverse question – how ROS in the gut influences sleep. In our model, intestinal ROS promotes sleep, raising the intriguing possibility – cleverly pointed out by the reviewers – that ROS generated during sleep deprivation might promote sleep by inducing Unpaired cytokine signaling in the gut. According to our findings, this suppresses wake-promoting AstA signaling in the BBB, providing a mechanism to promote sleep as a restorative response to gut-derived oxidative stress and potentially limiting further ROS accumulation. Importantly, our findings support a wake-promoting role for EEC-derived AstA, demonstrated by several lines of evidence. First, EEC-specific knockdown of AstA increases sleep. Second, activation of AstA<sup>+</sup> EECs using the heat-sensitive cation channel Transient Receptor Potential A1 (TrpA1) reduces sleep, and this effect is abolished by simultaneous knockdown of AstA, indicating that the sleep-suppressing effect is mediated by AstA and not by other peptides or secreted factors released by these cells. Third, downregulation of AstA receptor expression in BBB glial cells increases sleep, further supporting the existence of a functional gut AstA–glia arousal pathway. We have now included new data in the revised manuscript showing that AstA release from EECs is downregulated during intestinal oxidative stress (Fig. 7k,l,m). This suggests that this wake-promoting signal is suppressed both at its source (the gut endocrine cells), by unknown means, and at its target, the BBB, via Unpaired cytokine signaling that downregulates AstA receptor expression. This coordinated downregulation may serve to efficiently silence this arousal-promoting pathway and facilitate sleep during intestinal stress. These new data, along with an expanded discussion, provide further mechanistic insight into gut-derived AstA signaling and strengthen our proposed model.

      This contrasts with the interpretation by Lin et al., who observed increased AstA peptide levels in EECs after antioxidant treatment and interpreted this as peptide retention. However, peptide accumulation may result from either increased production or decreased release, and peptide levels alone are insufficient to distinguish between these possibilities. To resolve this, we examined AstA transcript levels, which can serve as a proxy for production. Following oxidative stress (24 h of 1% H<sub>2</sub>O<sub>2</sub> feeding and the following day), when animals show increased sleep (Fig. 7e), we observed a decrease in AstA transcript levels followed by an increase in peptide levels (Fig. 7k,l,m), suggesting that oxidative stress leads to reduced gut AstA production and release. Furthermore, we recently found that a class of EECs that produce the hormone Tachykinin (Tk) and are distinct from the AstA<sup>+</sup> EECs express the ROS-sensitive cation channel TrpA1 (Ahrentløv et al., 2025, Nature Metabolism[2]). In these Tk<sup>+</sup> EECs, TrpA1 mediates ROS-induced Tk hormone release. In contrast, single-cell RNA-seq data[3] do not support TrpA1 expression in AstA<sup>+</sup> EECs, consistent with our findings that ROS does not promote AstA release – an effect that would be expected if TrpA1 were functionally expressed in AstA<sup>+</sup> EECs. This contradicts the findings of Lin et al., who reported TrpA1 expression in AstA<sup>+</sup> EECs. We have now included relevant single-cell data in the revised manuscript (Fig. S6f) showing that TrpA1 is specifically expressed in Tk<sup>+</sup> EECs, but not in AstA<sup>+</sup> EECs, and we have expanded the discussion to address discrepancies in TrpA1 expression and AstA regulation.

      Taken together, our results reveal a dual-site regulatory mechanism in which Unpaired cytokines released from the gut act at the BBB to downregulate AstA receptor expression, while AstA release from EECs is simultaneously suppressed. We thank the reviewers for raising this important point. We have also included a discussion of the other point raised by the reviewers – the possibility that ROS generated during sleep deprivation may engage the same signaling pathways described here, providing a mechanistic link between sleep deprivation, intestinal stress, and sleep regulation.

      Recommendations for the authors:

      A- Material and Methods:

      (1) Feeding Assay: The cited publication (doi.org:10.1371/journal.pone.0006063) states: "For the amount of label in the fly to reflect feeding, measurements must therefore be confined to the time period before label egestion commences, about 40 minutes in Drosophila, a time period during which disturbance of the flies affects their feeding behavior. There is thus a requirement for a method of measuring feeding in undisturbed conditions." Was blue fecal matter already present on the tube when flies were homogenized at 1 hour? If so, the assay may reflect gut capacity rather than food passage (as a proxy for food intake). In addition, was the variability of food intake among flies in the same tube tested (to make sure that 1-2 flies are a good proxy for the whole population)?

      We agree that this is an important point for feeding experiments. We are aware of the methodological considerations highlighted in the cited study and have extensive experience using a range of feeding assays in Drosophila, including both short- and long-term consumption assays (e.g., dye-based and CAFE assays), as well as automated platforms such as FLIC and FlyPAD (Nature Communications, 2022; Nature Metabolism, 2022; and Nature Metabolism, 2025)[2,4,5].

      For the dye-based assay, we carefully selected a 1-hour feeding window based on prior optimization. Since animals were not starved prior to the assay, shorter time points (e.g., 30 minutes) typically result in insufficient ingestion for reliable quantification. A 1-hour period provides a robust readout while remaining within the timeframe before significant label excretion occurs under our experimental conditions. To support the robustness of our findings, we complemented the dye-based assay with data from FLIC, which enables automated, high-resolution monitoring of feeding behavior in undisturbed animals over extended periods. The FLIC results were consistent with the dye-based data, strengthening our confidence in the conclusions. To minimize variability and ensure consistency across experiments, all feeding assays were performed at the same circadian time – Zeitgeber Time 0 (ZT0), corresponding to 10:00 AM when lights are turned on in our incubators. This time point coincides with the animals' natural morning feeding peak, allowing for reproducible comparisons across conditions. Regarding variability among flies within tubes, each biological replicate in the dye assay consisted of 1–2 flies, and results were averaged across multiple replicates. We observed good consistency across samples, suggesting that these small groups reliably reflect group-level feeding behavior under our conditions.

      (2) Biological replicates: whereas the number of samples is clearly reported in each figure, the number of biological replicates is not indicated. Please include this information either in Material and methods or in the relevant figure legends. Please also include a description of what was considered a biological replicate.

      We have now clarified in the Materials and Methods section under Statistics that all replicates represent independent biological samples, as suggested by the reviewers.

      (3) Control Lines: please indicate which control lines were used instead of citing another publication. If preferred, this information could be supplied as a supplementary table.

      We now provide a clear description of the control lines used in the Materials and Methods section. Specifically, all GAL4 and GAL80 lines used in this study were backcrossed for several generations into a shared w<sup>1118</sup> background and then crossed to the same w<sup>1118</sup> strain used as the genetic background for the UAS-RNAi, CRISPR, or overexpression lines. This approach ensures, to a strong approximation, that the only difference between control and experimental animals is the presence or absence of the UAS transgene.

      (4) Statistical analyses: for some results (e.g., those shown in Figure 3d), it could be useful to test the interaction between genotype and treatment.

      We thank the reviewer for this helpful suggestion. In response, we have now performed two-way ANOVA analyses to assess genotype × treatment (diet) interaction effects for the relevant data, including those shown in Figure 3d as well as additional panels where animals were exposed to oxidative stress and sleep phenotypes were measured. We have added the corresponding interaction p-values in the updated figure legends for Figures 3d, 3k, 5a–c, 5f, 5h, 5i, 6c, 6e, and 7e. All of these tests revealed significant interaction effects, supporting the conclusion that the observed differences in sleep phenotypes are specifically dependent on the interaction between genetic manipulation (e.g., cytokine or receptor knockdown) and oxidative stress. These additions reinforce the interpretation that Unpaired cytokine signaling, glial JAK-STAT pathway activity, and AstA receptor regulation functionally interact with intestinal ROS exposure to modulate sleep. We thank the reviewer for suggesting this improvement.
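      For readers unfamiliar with the genotype × treatment interaction term, the following is a minimal sketch of how the interaction F statistic of a balanced two-way ANOVA is computed. It is not the analysis code used for the manuscript (the response states that statistics were run in Prism); the function name `interaction_f`, the group labels, and the sleep values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def interaction_f(cells):
    """Interaction F statistic for a balanced two-way ANOVA.

    cells maps (genotype, diet) -> list of measurements; every cell must
    hold the same number of replicates (balanced design).
    Returns (F, df_interaction, df_within).
    """
    genotypes = sorted({g for g, _ in cells})
    diets = sorted({d for _, d in cells})
    n = len(next(iter(cells.values())))
    if any(len(v) != n for v in cells.values()):
        raise ValueError("this sketch only handles balanced designs")

    grand = mean([x for v in cells.values() for x in v])
    g_mean = {g: mean([x for d in diets for x in cells[(g, d)]]) for g in genotypes}
    d_mean = {d: mean([x for g in genotypes for x in cells[(g, d)]]) for d in diets}

    # Interaction sum of squares: what the cell means retain after the two
    # main effects have been removed.
    ss_inter = n * sum(
        (mean(cells[(g, d)]) - g_mean[g] - d_mean[d] + grand) ** 2
        for g in genotypes
        for d in diets
    )
    # Within-cell (residual) sum of squares.
    ss_within = sum((x - mean(v)) ** 2 for v in cells.values() for x in v)

    df_inter = (len(genotypes) - 1) * (len(diets) - 1)
    df_within = len(cells) * (n - 1)
    f_stat = (ss_inter / df_inter) / (ss_within / df_within)
    return f_stat, df_inter, df_within


# Hypothetical sleep values: the control responds to H2O2, the knockdown barely does.
example = {
    ("control", "normal"): [10, 12, 11],
    ("control", "H2O2"): [20, 22, 21],
    ("knockdown", "normal"): [11, 13, 12],
    ("knockdown", "H2O2"): [12, 14, 13],
}
f_stat, df1, df2 = interaction_f(example)
```

      A large F relative to the F distribution with (df1, df2) degrees of freedom indicates that the effect of diet depends on genotype; the p-value itself would come from a statistics package rather than from this sketch.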

      (5) Reporting of p values. Some are reported as specific values whereas others are reported as less than a specific value. Please make this reporting consistent across different figures.

      All p-values reported in the manuscript are exact, except where they fall below 0.0001. In those instances, we report p < 0.0001 because the Prism software package (GraphPad, version 10), which was used for all statistical analyses, does not report more precise values. We believe this reporting approach reflects standard practice in the field.
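      The reporting rule described in this response can be written down as a small helper. This is only an illustrative sketch: the function name `format_p`, the four-decimal formatting, and the example values are assumptions, not part of the manuscript or of Prism.

```python
def format_p(p, floor=1e-4):
    """Format a p-value exactly, or as an inequality below the reporting floor."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < floor:
        # Below the floor only the inequality is reported.
        return f"p < {floor:.4f}"
    return f"p = {p:.4f}"


labels = [format_p(0.0312), format_p(0.00003)]
```

      Applying one rule to every figure legend is what keeps the reporting consistent across panels.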

      (6) Please include the color code used in each figure, either in the figure itself or in the legend.

      We have now clarified the color coding in all relevant figures. In particular, we acknowledge that the meaning of the half-colored circles used to indicate H<sub>2</sub>O<sub>2</sub> treatment was not previously explained. These have now been clearly labeled in each figure to indicate treatment conditions.

      (7) The scheme describing the experimental conditions and the associated chart is confusing. Please improve.

      We have improved the schematic by replacing “ROS” with “H<sub>2</sub>O<sub>2</sub>” to more clearly indicate the experimental condition used. Additionally, we have added the corresponding circle annotations so that they now also appear consistently above the relevant charts. This revised layout enhances clarity and helps readers more easily interpret the experimental conditions. We believe these changes address the reviewer’s concern and make the figure significantly more intuitive.

      (8) Please indicate which line was used for upd-Gal4 and the evidence that it faithfully reflects upd3 expression.

      We have now clarified in the Materials and Methods section that the upd3-GAL4 line used in our study is Bloomington stock #98420, which drives GAL4 expression under the control of approximately 2 kb of sequence upstream of the upd3 start codon. This line has previously been used as a transcriptional reporter for upd3 activity. The only use of this line was to illustrate reporter expression in the EECs. To support this aspect of Upd3 expression, we now include new data in the revised manuscript using fluorescent in situ hybridization (FISH) against upd3, which confirms the presence of upd3 transcripts in prospero-positive EECs of the adult midgut (Fig. S1b). Additionally, we show that upd3 transcript levels are significantly reduced in dissected midguts following EEC-specific knockdown using multiple independent RNAi lines driven by voilà-GAL4, both alone and in combination with R57C10-GAL80, consistent with endogenous expression in these cells (Fig. 1a,b).

      To further address the reviewer’s concern and provide additional support for the endogenous expression of upd3 in EECs, we performed targeted knockdown experiments focusing on molecularly defined EEC subpopulations. The adult Drosophila midgut contains two major EEC subtypes characterized by their expression of Allatostatin C (AstC) or Tachykinin (Tk), which together encompass the vast majority of EECs. To selectively manipulate these populations, we used AstC-GAL4 and Tk-GAL4 drivers – both knock-in lines in which GAL4 is inserted at the respective endogenous hormone loci. This design enables precise GAL4 expression in AstC- or Tk-expressing EECs based on their native transcriptional profile. To eliminate confounding neuronal expression, we combined these drivers with R57C10-GAL80, restricting GAL4 activity to the gut and generating AstC<sup>Gut</sup>> and Tk<sup>Gut</sup>> drivers. Using these tools, we knocked down upd2 and upd3 selectively in the AstC- or Tk-positive EECs. Knockdown of either cytokine in AstC-positive EECs significantly increased sleep under homeostatic conditions, recapitulating the phenotype observed with knockdown in all EECs (Fig. 1m-o). In contrast, knockdown of upd2 or upd3 in Tk-positive EECs had no effect on sleep (Fig. 1p-r). Furthermore, we show in the revised manuscript that selective knockdown of upd2 or upd3 in AstC-positive EECs abolishes the H<sub>2</sub>O<sub>2</sub>-induced increase in sleep (Fig. 3f–h). These findings demonstrate that Unpaired cytokine signaling from AstC-positive EECs is essential for mediating the sleep response to intestinal oxidative stress, highlighting this specific EEC subtype as a key source of cytokine-driven regulation in this context. These new results indicate that AstC-positive EECs are a primary source of the Unpaired cytokines that regulate sleep, while Tk-positive EECs do not appear to contribute to this function.

      Importantly, upd3 transcript levels were significantly reduced in dissected midguts following AstC<sup>Gut</sup>-driven knockdown (Fig. S1r), further confirming that upd3 is endogenously expressed in AstC-positive EECs. Thus, we have bolstered our confidence that upd3 is indeed expressed in EECs, as illustrated by the reporter line, through several means.

      (9) Please indicate which GFP line was used with upd-Gal4 (CD8, NLS, un-tagged, etc). The Material and Methods section states that it was "UAS-mCD8::GFP (#5137);", however, the stain does not seem to match a cell membrane pattern but rather a nuclear or cytoplasmic pattern. This information would help the interpretation of Figure 1C.

      We confirm that the GFP reporter line used with upd3-GAL4 was obtained from Bloomington stock #98420. As noted by the Bloomington Drosophila Stock Center, “the identity of the UAS-GFP transgene is a guess,” and the subcellular localization of the GFP fusion is therefore uncertain. We agree with the reviewer that the signal observed in Figure 1c does not display clear membrane localization and instead appears diffuse, consistent with cytoplasmic or partially nuclear localization. In any case, what we find most salient is the reporter’s labeling of Prospero-positive EECs in the adult midgut, consistent with upd3 expression in these cells. This conclusion is further supported by multiple lines of evidence presented in the revised manuscript, as mentioned above in response to question #8: (1) fluorescent in situ hybridization (FISH) for upd3 confirms expression in EECs (Fig. S1b), (2) EEC-specific RNAi knockdown of upd3 reduces transcript levels in dissected midguts, and (3) publicly available single-cell RNA sequencing datasets[3] also indicate that upd3 is expressed at low levels in a subset of adult midgut EECs under normal conditions. We have also clarified in the revised Materials and Methods section that GFP localization is undefined in the upd3-GAL4 line, to guide interpretation of the reporter signal.

      B- Results

      (1) Figure 1: According to previous work (10.1016/j.celrep.2015.06.009, http://flygutseq.buchonlab.com/data?gene=upd3), in basal conditions upd3 is expressed as follows: ISC (35 RPKM), EB (98 RPKM), EC (57 RPKM), and EEC (8 RPKM). Accordingly, even complete KO in EECs should eliminate only a small fraction of upd3 from whole guts, even less considering the greater abundance of other cell types such as ECs compared to EECs. It would be useful to understand where this discrepancy comes from, in case it is affecting the conclusion of the manuscript. While this point per se does not affect the main conclusions of the manuscript, it makes the interpretation of the results more difficult.

      We acknowledge the previously reported low expression of upd3 in EECs. However, the FlyGut-seq site appears to be no longer available, so we could not directly compare other related genes. Nonetheless, our data – based on in situ hybridization, reporter expression, and multiple RNAi knockdowns – consistently support upd3 expression in EECs. These complementary approaches strengthen the conclusion that EECs are an important source of systemic upd3 under the conditions tested.
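The reviewer's arithmetic in point (1) can be sketched as follows. This is a minimal illustration only, assuming the four RPKM values quoted in the comment can be compared directly and ignoring the relative abundance of each cell type (which, as the reviewer notes, would lower the EEC share further). The names `rpkm` and `eec_share` are hypothetical.

```python
# RPKM values for upd3 quoted in the reviewer's comment (basal conditions).
rpkm = {"ISC": 35, "EB": 98, "EC": 57, "EEC": 8}

# Unweighted share of the summed signal attributable to EECs.
# Assumption: all cell types are weighted equally, which is not true of a real gut.
eec_share = rpkm["EEC"] / sum(rpkm.values())

print(f"EEC share of summed upd3 RPKM: {eec_share:.1%}")
```

Under these assumptions the EEC share is roughly 4%, which is why the reviewer expects an EEC-specific knockout to remove only a small fraction of whole-gut upd3.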

      (2) Figure 1: The upd2-3 mutants show sleep defects very similar to those of EEC>RNAi and >Cas9. It would thus be helpful to try to KO upd3 with other midgut drivers (An EC driver like Myo1A or 5966GS and a progenitor driver like Esg or 5961GS) to validate these results. Such experiments might identify precisely which cells are involved in the gut-brain signaling reported here.

      We appreciate the reviewer’s suggestion and agree that exploring other potential sources of Upd3 in the gut is an interesting direction. In this study, we have focused on EECs, which are the primary hormone-secreting cells in the intestine and thus the most likely candidates for mediating systemic effects such as gut-to-brain signaling. While it is possible that other gut cell types – such as enterocytes (e.g., Myo1A<sup>+</sup>) or intestinal progenitors (e.g., Esg<sup>+</sup>) – also contribute to Upd3 production, these cells are not typically endocrine in nature. Demonstrating their involvement in gut-to-brain communication would therefore require additional, extensive validation beyond the scope of the current study. Importantly, our data show that manipulating Upd3 specifically in EECs is both necessary and sufficient to modulate sleep in response to intestinal ROS, strongly supporting the conclusion that EEC-derived cytokine signaling underlies the observed phenotype. In contrast, manipulating cytokines in other gut cells could produce indirect effects – such as altered proliferation, epithelial integrity, or immune responses – that complicate the interpretation of behavioral outcomes like sleep. For these reasons, we chose to focus on EECs as the source of endocrine signals mediating gut-to-brain communication. However, to address this point raised by the reviewer, we have now included a statement in the Discussion acknowledging that other non-endocrine gut cell types may also contribute to the systemic Unpaired signaling that modulates sleep in response to intestinal oxidative stress.

      (3) Figure 3: "This effect mirrored the upregulation observed with EEC-specific overexpression of upd3, indicating that it reflects physiologically relevant production of upd3 by the gut in response to oxidative stress." Please add (Figure 3a) at the end of this sentence.

      We have now added “(Figure 3a)” at the end of the sentence to clearly reference the relevant data.

      (4) For Figure 3b, do you have data showing that the increased amount of sleep was due to the addition of H2O2 per se, rather than the procedure of adding it?

      We have added new data to address this point. To ensure that the observed sleep increase was specifically due to the presence of H<sub>2</sub>O<sub>2</sub> and not an effect of the food replacement procedure, we performed a control experiment in which animals were fed standard food prepared using the same protocol and replaced daily, but without H<sub>2</sub>O<sub>2</sub>. These animals did not exhibit increased sleep, confirming that the sleep effect is attributable to intestinal ROS rather than the supplementation procedure itself (Fig. S3a). Thanks for the suggestion.

      (5) In the text it is stated that "Since 1% H2O2 feeding induced robust responses both in upd3 expression and in sleep behavior, we asked whether gut-derived Unpaired signaling might be essential for the observed ROS-induced sleep modulation. Indeed, EEC-specific RNAi targeting upd2 or upd3 abolished the sleep response to 1% H2O2 feeding." While it is indeed true that there is no additional increase in sleep time due to EEC>upd3 RNAi, it is also true that EEC>upd3 RNAi flies, without any treatment, have already increased their sleep in the first place. It is then possible that rather than unpaired signaling being essential, an upper threshold for maximum sleep allowed by manipulation of these processes was reached. It would be useful to discuss this point.

      Several findings argue against a ceiling effect and instead support a requirement for Unpaired signaling in mediating ROS-induced sleep. Animals with EEC-specific upd2 or upd3 knockdown or null mutation not only fail to increase sleep following H<sub>2</sub>O<sub>2</sub> treatment but actually exhibit reduced sleep during oxidative stress (Fig. 3e, k, l; Fig. 5e, f), suggesting that Unpaired signaling is required to sustain sleep under these conditions. Similarly, animals with glial dome knockdown also show reduced sleep under oxidative stress, closely mirroring the phenotype of EEC-specific upd3 RNAi animals (Fig. 5a–c, g–i). These results support the conclusion that gut-to-glia Unpaired cytokine signaling is necessary for maintaining elevated sleep during oxidative stress. In the absence of this signaling, animals exhibit increased wakefulness. We identify AstA as one such wake-promoting signal that is suppressed during intestinal stress. We present new data showing that this pathway is downregulated not only via Unpaired-JAK/STAT signaling in glial cells but also through reduced AstA release from the gut in the revised manuscript. This model, in which Unpaired cytokines promote sleep during intestinal stress by suppressing arousal pathways, is discussed throughout the manuscript to address the reviewer’s point.

      (6) In Figure 3k, the dots highlighting the experiment show an empty profile, a full one, and a half one. Please define what the half dots represent.

      We have now clarified the color coding in all relevant figures. Specifically, we acknowledge that the meaning of the half-colored circles indicating H<sub>2</sub>O<sub>2</sub> treatment was not previously defined – it indicates washout or recovery time. In the revised version, these symbols are now clearly labeled in each figure to indicate the treatment condition, ensuring consistent and intuitive interpretation across all panels.

      (7) The authors used appropriate GAL4 and RNAi lines to knock down dome, a upd2/3 JAK-STAT-linked receptor, specifically in neurons and glia, respectively, in order to identify the CNS targets of upd2/3 cytokines produced by enteroendocrine cells (EECs). Pan-neuronal dome knockdown did not alter daytime sleep in adult females, yet pan-glial dome knockdown phenocopied effects of upd2/3 knockdown in EECs. They also observed that EEC-specific knockdown of upd2 and upd3 led to a decrease in JAK-STAT reporter activity in repo-positive glial cells. This supports the authors' conclusion that glial cells, not neurons, are the targets by which unpaired cytokines regulate sleep via JAK-STAT signaling. However, they do not show nighttime sleep data of pan-neuronal and pan-glial dome knockdowns. It would strengthen their conclusion if the nighttime sleep of pan-glial dome knockdown phenocopied the upd2/3 knockdowns as well, provided the pan-neuronal dome knockdown did not alter nighttime sleep.

      We have now added nighttime sleep data for both pan-glial and pan-neuronal domeless knockdowns in the revised manuscript (Fig. 2a). Glial knockdown increased nighttime sleep, similar to EEC-specific upd2/3 knockdown, while neuronal knockdown had no effect. These results further support glial cells as the relevant target of gut-derived Unpaired signaling.

      (8) The authors only used one method to induce oxidative stress (hydrogen peroxide feeding). It would strengthen their argument to test multiple methods of inducing oxidative stress, such as lipopolysaccharide (LPS) feeding. In addition, it would be useful to use a direct bacterial infection to confirm that in flies, the infection promotes sleep. Additionally, flies deficient in Dome in the BBB and infected should not be affected in their sleep by the infection. These experiments would provide direct support for the mechanism proposed. Finally, the authors should add a primary reference for using ROS as a model of bacterial infection and justify their choice better.

      We agree that directly comparing different models of intestinal stress, such as bacterial infection or LPS feeding, would provide valuable insight into how gut-derived signals influence sleep in response to infection. As noted in our detailed responses above, we now include an expanded rationale for our use of H<sub>2</sub>O<sub>2</sub> feeding as a controlled and well-established method for inducing intestinal ROS – one of the key physiological responses to enteric infection and inflammation. In the revised Discussion, we explicitly acknowledge that pathogenic infections – which trigger both intestinal ROS and additional immune pathways – may engage distinct or complementary mechanisms compared to chemically induced oxidative stress. We emphasize the importance of future studies aimed at dissecting these differences. In fact, we are actively pursuing this direction in ongoing work examining sleep responses to enteric infection. For the purposes of the present study, however, we chose to focus on a tractable and specific model of ROS-induced stress to define the contribution of Unpaired cytokine signaling to gut-brain communication and sleep regulation. This approach allowed us to isolate the effect of oxidative stress from other confounding immune stimuli and identify a glia-mediated signaling mechanism linking gut epithelial stress to changes in sleep behavior.

      (9) To confirm that animals lacking EEC Unpaired signaling are not more susceptible to ROS-induced damage, the authors assessed the survival of upd2 and upd3 knockdowns on 1% H2O2 and concluded they display no additional sensitivity to oxidative stress compared to controls. It may be useful to include other tests of sensitivity to oxidative stress, in addition to survival.

      We appreciate the reviewer’s suggestion. In our view, survival is a highly informative and stringent readout, as it reflects the overall physiological capacity of the animal to withstand oxidative stress. Importantly, our data show that animals lacking EEC-derived Unpaired signaling do not exhibit reduced survival following H<sub>2</sub>O<sub>2</sub> exposure, indicating that their oxidative stress resistance is not compromised. Furthermore, we previously confirmed that feeding behavior is unaffected in these animals, suggesting that their ability to ingest food (and thus the stressor) is not impaired. As a molecular complement to these assays in response to this point and others, we have also performed an assessment of neuronal apoptosis (a TUNEL assay, Fig. S3f,g). This assay did not identify an increase in cell death in the brains of animals fed peroxide-containing medium. Thus, gross neurological health, behavior, and overall survival appear to be resilient to the environmental treatment regime we apply here, suggesting that the outcomes we observe arise from signaling per se.

      (10) The authors confirmed that animals lacking EEC-derived upd3 displayed sleep suppression similar to controls in response to starvation. These results led the authors to conclude that there is a specific requirement for EEC-derived Unpaired signaling in responding to intestinal oxidative stress. However, they previously showed that EEC-specific knockdown of upd3 and upd2 led to increased daytime sleep under normal feeding conditions. Their interpretations of their data are inconsistent.

      We appreciate the reviewer’s comment. While animals lacking EEC-derived Unpaired signaling show increased baseline sleep under normal feeding conditions, they still exhibit a robust reduction in sleep when subjected to starvation – comparable to that of control animals (Fig. S3h–j). This demonstrates that they retain the capacity to appropriately modulate sleep in response to metabolic stress. Thus, the sleep-promoting phenotype under normal conditions does not reflect a generalized inability to adjust sleep behavior. Rather, it highlights a specific role for Unpaired signaling in mediating sleep responses to intestinal oxidative stress, not in broadly regulating all sleep-modulating stimuli.

      (11) The authors report a significant increase in JAK-STAT activity in surface glial cells at ZT0 in animals fed 1% H2O2-containing food for 20 hours. This response was abolished in animals with EEC-specific knockdown of upd2 or upd3. The authors confirmed there were no unintended neuronal effects on upd2 or upd3 expression in the heads. They also observed an upregulation of dome transcript levels in the heads of animals with EEC-specific knockdown of upd3 fed 1% H2O2-containing food for 15 hours, which they interpret to be a compensatory mechanism in response to low levels of the ligand. This assay is inconsistent with previous experiments in which animals were fed hydrogen peroxide for 20 hours.

      We thank the reviewer for identifying this discrepancy. The inconsistency arose from a labeling error in the manuscript. Both the JAK-STAT reporter assays in glial cells and the dome expression measurements were performed following 15 hours of H<sub>2</sub>O<sub>2</sub> feeding, not 20 hours as previously stated. We have now corrected this in the revised manuscript.

      (12) The authors show that animals with glia-specific dome knockdown did not have decreased survival on H2O2-containing food, and displayed normal rebound sleep in the morning following sleep deprivation. These results potentially undermine the significance of the paper. If the normal sleep response to oxidative stress is an important protective mechanism, why would oxidative stress not decrease survival in dome knockdown flies (that don't have the normal sleep response to oxidative stress)? This suggests that the proposed mechanism is not important for survival. The authors conclude that Dome-mediated JAK-STAT signaling in the glial cells specifically regulates ROS-induced sleep responses, which their results support.

      We agree that our survival data show that glial dome knockdown does not reduce survival under continuous oxidative stress. However, we believe this does not undermine the importance of the sleep response as an adaptive mechanism. In our survival assay, animals were continuously exposed to 1% H<sub>2</sub>O<sub>2</sub> without the opportunity to recover. In contrast, under natural conditions, oxidative stress is likely to be intermittent, and the ability to mount a sleep response may be particularly important for promoting recovery and maintaining homeostasis during or after transient stress episodes. Thus, while the JAK-STAT-mediated sleep response may not directly enhance survival under constant oxidative challenge, it likely plays a critical role in adaptive recovery under natural conditions.

      (13) Altogether, the authors conclude that enteric oxidative stress induces the release of Unpaired cytokines which activate the JAK-STAT pathway in subperineurial glia of the BBB, which leads to the glial downregulation of receptors for AstA, which is a wake-promoting factor also released by EECs. This mechanism is supported by their results, however, this research raises some intriguing questions, such as the role of upd2 versus upd3, the role of AstA-R1 versus AstA-R2, the importance of this mechanism in terms of survival, the sex-specific nature of this mechanism, and the role that nutritional availability plays in the dual functionality of Unpaired cytokine signaling in regards to sleep.

      We thank the reviewer for highlighting these important questions. Our data suggest that Upd2 and Upd3, while often considered partially redundant, both contribute to sleep regulation, with stronger effects observed for Upd3. This is consistent with prior studies indicating overlapping but non-identical roles for these cytokines. Similarly, although AstA-R1 and AstA-R2 can both be activated by AstA, knockdown of AstA-R2 consistently produces more robust sleep phenotypes, suggesting a predominant role in mediating this effect. The possibility of sex-specific regulation is indeed compelling. While our study focused on females, many gut hormones show sex-dependent activity, and we recognize this as an important avenue for future research. Finally, we have included new data in the revised manuscript showing that gut-derived AstA is downregulated under oxidative stress, further supporting our model in which Unpaired signaling suppresses arousal pathways during intestinal stress.

      (14) Data Availability: It is indicated that: "Reasonable data requests will be fulfilled by the lead author". However, eLife's guidelines for data sharing require that all data associated with an article be made freely and widely available.

      We thank the reviewer for pointing this out. We have revised the Data Availability section of the manuscript to clarify that all data will be made freely available from the lead contact without restriction, in accordance with eLife’s open data policy.

      References

      (1) Li, Y., Zhou, X., Cheng, C., Ding, G., Zhao, P., Tan, K., Chen, L., Perrimon, N., Veenstra, J.A., Zhang, L., and Song, W. (2023). Gut AstA mediates sleep deprivation-induced energy wasting in Drosophila. Cell Discov 9, 49. 10.1038/s41421-023-00541-3.

      (2) Ahrentlov, N., Kubrak, O., Lassen, M., Malita, A., Koyama, T., Frederiksen, A.S., Sigvardsen, C.M., John, A., Madsen, P., Halberg, K.A., et al. (2025). Protein-responsive gut hormone Tachykinin directs food choice and impacts lifespan. Nature Metabolism. 10.1038/s42255-025-01267-0.

      (3) Li, H., Janssens, J., De Waegeneer, M., Kolluru, S.S., Davie, K., Gardeux, V., Saelens, W., David, F.P.A., Brbic, M., Spanier, K., et al. (2022). Fly Cell Atlas: A single-nucleus transcriptomic atlas of the adult fruit fly. Science 375, eabk2432. 10.1126/science.abk2432.

      (4) Kubrak, O., Koyama, T., Ahrentlov, N., Jensen, L., Malita, A., Naseem, M.T., Lassen, M., Nagy, S., Texada, M.J., Halberg, K.V., and Rewitz, K. (2022). The gut hormone Allatostatin C/Somatostatin regulates food intake and metabolic homeostasis under nutrient stress. Nature Communications 13, 692. 10.1038/s41467-022-28268-x.

      (5) Malita, A., Kubrak, O., Koyama, T., Ahrentlov, N., Texada, M.J., Nagy, S., Halberg, K.V., and Rewitz, K. (2022). A gut-derived hormone suppresses sugar appetite and regulates food choice in Drosophila. Nature Metabolism 4, 1532-1550. 10.1038/s42255-022-00672-z.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This research group has consistently performed cutting-edge research aiming to understand the role of hormones in the control of social behaviors, specifically by utilizing the genetically tractable teleost fish, medaka, and the current work is no exception. The overall claim they make, that estrogens modulate social behaviors in males and females is supported, with important caveats. For one, there is no evidence these estrogens are generated by "neurons" as would be assumed by their main claim that it is NEUROestrogens that drive this effect. While indeed the aromatase they have investigated is expressed solely in the brain, in most teleosts, brain aromatase is only present in glial cells (astrocytes, radial glia). The authors should change this description so as not to mislead the reader. Below I detail more specific strengths and weaknesses of this manuscript.

      We thank the reviewer for this very positive evaluation of our work and greatly appreciate their helpful comments and suggestions for improving the manuscript. We agree with the comment that the term “neuroestrogens” is misleading. Therefore, we have replaced “neuroestrogens” with “brain-derived estrogens” or “brain estrogens” throughout the manuscript, including the title.

      In the following sections, “neuroestrogens” has been revised to align with the surrounding context.

      Line 21: “in the brain, also known as neuroestrogens,” → “in the brain.”

      Line 28: “neuroestrogens” → “these estrogens.”

      Line 30: “mechanism of action of neuroestrogens” → “mode of action of brain-derived estrogens.”

      Line 43: “brain-derived estrogens, also called neuroestrogens,” → “estrogens.”

      Line 74: “neuroestrogen synthesis is selectively impaired while gonadal estrogen synthesis remains intact” → “estrogen synthesis in the brain is selectively impaired while that in the gonads remains intact.”

      Line 77: “neuroestrogens” → “these estrogens.”

      Line 335: “levels of neuroestrogens” → “brain estrogen levels.”

      Line 338: “neuroestrogens” → “these estrogens.”

      Line 351: “neuroestrogens” → “these estrogens.”

      Line 357: “neuroestrogen action” → “the action of brain-derived estrogens.”

      Line 359: “neuroestrogens” → “estrogen synthesis in the brain.”

      Line 390: “active synthesis of neuroestrogens” → “active estrogen synthesis in the brain.”

      Line 431: “neuroestrogens” → “estrogens in the brain.”

      Line 431: “neuroestrogen action” → “the action of brain-derived estrogens.”

      Line 433: “neuroestrogen action” → “their action.”

      Strengths:

      Excellent use of the medaka model to disentangle the control of social behavior by sex steroid hormones.

      The findings are strong for the most part because deficits in the mutants are restored by the molecule (estrogens) that was no longer present due to the mutation.

      Presentation of the approach and findings are clear, allowing the reader to make their own inferences and compare them with the authors'.

      Includes multiple follow-up experiments, which lead to tests of internal replication and an impactful mechanistic proposal.

      Findings are provocative not just for teleost researchers, but for other species since, as the authors point out, the data suggest mechanisms of estrogenic control of social behaviors may be evolutionarily ancient.

      We again thank the reviewer for their positive evaluation of our work.

      Weaknesses:

      (1) As stated in the summary, the authors attribute the estrogen source to neurons and there isn't evidence this is the case. The impact of the findings doesn't rest on this either.

      As noted in Response to reviewer #1’s summary comment, we have replaced “neuroestrogens” with “brain-derived estrogens” or “brain estrogens” throughout the manuscript.

      Line 63: We have also added the text “In teleost brains, including those of medaka, aromatase is exclusively localized in radial glial cells, in contrast to its neuronal localization in rodent brains (18–20).” Following this addition, “This observation suggests” in the subsequent sentence has been replaced with “These observations suggest.”

      The following references (#18–20), cited in the newly added text above, have been included in the reference list, with other references renumbered accordingly:

      P. M. Forlano, D. L. Deitcher, D. A. Myers, A. H. Bass, Anatomical distribution and cellular basis for high levels of aromatase activity in the brain of teleost fish: aromatase enzyme and mRNA expression identify glia as source. J. Neurosci. 21, 8943–8955 (2001).

      N. Diotel, Y. Le Page, K. Mouriec, S. K. Tong, E. Pellegrini, C. Vaillant, I. Anglade, F. Brion, F. Pakdel, B. C. Chung, O. Kah, Aromatase in the brain of teleost fish: expression, regulation and putative functions. Front. Neuroendocrinol. 31, 172–192 (2010).

      A. Takeuchi, K. Okubo, Post-proliferative immature radial glial cells female-specifically express aromatase in the medaka optic tectum. PLoS One 8, e73663 (2013).

      (2) The d4 versus d8 esr2a mutants showed different results for aggression. The meaning and implications of this finding are not discussed, leaving the reader wondering.

      Line 282: As the reviewer correctly noted, circles were significantly reduced in mutant males of the Δ8 line, whereas no significant reduction was observed in those of the Δ4 line. However, a tendency toward reduction was evident in the Δ4 line (P = 0.1512), and both lines showed significant differences in fin displays. Based on these findings, we believe our conclusion that esr2a<sup>−/−</sup> males exhibit reduced aggression remains valid. To clarify this point and address potential reader concerns, we have revised the text as follows: “esr2a<sup>−/−</sup> males from both the Δ8 and Δ4 lines exhibited significantly fewer fin displays than their wildtype siblings (P = 0.0461 and 0.0293, respectively). Circles followed a similar pattern, with a significant reduction in the Δ8 line (P = 0.0446) and a comparable but non-significant decrease in the Δ4 line (P = 0.1512) (Fig. 5L; Fig. S8E), showing less aggression.”

      (3) Lack of attribution of previously published work from other research groups that would provide the proper context of the present study.

      In response to this and other comments from this reviewer, we have revised the Introduction and Discussion sections as follows.

      Line 56: “solely responsible” in the Introduction has been modified to “largely responsible”.

      Line 57: “This is consistent with the recent finding in medaka fish (Oryzias latipes) that estrogens act through the ESR subtype Esr2b to prevent females from engaging in male-typical courtship (10)” has been revised to “This is consistent with recent observations in a few teleost species that genetic ablation of AR severely impairs male-typical behaviors (13–16) and with findings in medaka fish (Oryzias latipes) that estrogens act through the ESR subtype Esr2b to prevent females from engaging in male-typical courtship (12)” to include previous studies on the behavior of AR mutant fish (Yong et al., 2017; Alward et al., 2020; Ogino et al., 2023; Nishiike and Okubo, 2024) in the Introduction.

      Line 65: “It is worth mentioning that systemic administration of estrogens and an aromatase inhibitor increased and decreased male aggression, respectively, in several teleost species, potentially reflecting the behavioral effects of brain-derived estrogens (21–24)” has been added to the Introduction. This addition provides an overview of previous studies on the effects of estrogens and aromatase on male fish aggression (Hallgren et al., 2006; O’Connell and Hofmann, 2012; Huffman et al., 2013; Jalabert et al., 2015).

      Line 367: “treatment of males with an aromatase inhibitor reduces their male-typical behaviors (31–33)” has been edited to read “treatment of males with an aromatase inhibitor reduces their male-typical behaviors, while estrogens exert the opposite effect (21–24).”

      After the revisions described above, the following references (#13, 14, and 22) have been added to the reference list, with other references renumbered accordingly:

      L. Yong, Z. Thet, Y. Zhu, Genetic editing of the androgen receptor contributes to impaired male courtship behavior in zebrafish. J. Exp. Biol. 220, 3017–3021 (2017).

      B. A. Alward, V. A. Laud, C. J. Skalnik, R. A. York, S. A. Juntti, R. D. Fernald, Modular genetic control of social status in a cichlid fish. Proc. Natl. Acad. Sci. U.S.A. 117, 28167–28174 (2020).

      L. A. O’Connell, H. A. Hofmann, Social status predicts how sex steroid receptors regulate complex behavior across levels of biological organization. Endocrinology 153, 1341–1351 (2012).

      (4) There are a surprising number of citations not included; some of the ones not included argue against the authors' claims that their findings were "contrary to expectation".

      Line 68: As detailed in Response to reviewer #1’s comment 3 on weaknesses, we have cited previous studies on the effects of estrogens and aromatase on male fish aggression (Hallgren et al., 2006; O’Connell and Hofmann, 2012; Huffman et al., 2013; Jalabert et al., 2015) in the Introduction.

      The following revisions have also been made to avoid phrases such as “contrary to expectation” and “unexpected.”

      Line 76: “Contrary to our expectations” → “Remarkably.”

      Line 109: “Contrary to this expectation, however” → “Nevertheless.”

      Line 135: “Again, contrary to our expectation, cyp19a1b<sup>−/−</sup> males” → “cyp19a1b<sup>−/−</sup> males.”

      Line 333: “unexpected” → “noteworthy.”

      Line 337: “unexpected” → “notable.”

      (5) The experimental design for studying aggression in males has flaws. A standard test like a resident intruder test should be used.

      We agree that the resident-intruder test is the most commonly used method for assessing aggression. However, medaka form shoals and lack strong territoriality, and even slight dominance differences between the resident and the intruder can increase variability in the results, compromising data consistency. Therefore, in this study, we adopted an alternative approach: placing four unfamiliar males together in a tank and quantifying aggressive interactions in total. This method allows for the assessment of aggression regardless of territorial tendencies, making it more appropriate for our investigation.

      (6) While they investigate males and females, there are fewer experiments and explanations for the female results, making it feel like a small addition or an aside.

      We agree that the data and discussion for females are less extensive than for males. However, we have previously elucidated the mechanism by which estrogen/Esr2b signaling promotes female mating behavior (Nishiike et al., 2021, Curr Biol, 1699–1710). Accordingly, it follows that the new insights into female behavior gained from the cyp19a1b knockout model are more limited than those for males. Nevertheless, when combined with our prior findings, the female data in this study offer valuable insights, and the overall mechanism through which estrogens promote female mating behavior is becoming clearer. Therefore, we do not consider the female data in this study to be incomplete or merely supplementary.

      (7) The statistics comparing "experimental to experimental" and "control to experimental" aren't appropriate.

      The reviewer raises concerns about the statistical analysis used for Figures 4C and 4E, suggesting that Bonferroni’s test should be used instead of Dunnett’s test. However, Dunnett’s test is commonly used to compare treatment groups to a reference group that receives no treatment, as in our study. Since we do not compare the treated groups with each other, we believe Dunnett’s test is the most appropriate choice.

      Line 619: The reviewer’s concern may have arisen from the phrase “comparisons between control and experimental groups” in the Materials and Methods. We have revised it to “comparisons between untreated and E2-treated groups in Fig. 4, C and D” for clarity.
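The distinction drawn above between many-to-one and all-pairwise comparisons can be illustrated by counting the comparisons in each family. This is a sketch only: it does not implement Dunnett's test, the number of treated groups (`n_treated = 3`) is a hypothetical value not taken from the manuscript, and the Bonferroni-style threshold is shown purely to indicate how the size of the comparison family affects stringency.

```python
from math import comb


def comparison_counts(n_treated: int) -> tuple[int, int]:
    """Return (many-to-one, all-pairwise) comparison counts.

    many-to-one: each treated group vs. the single untreated reference group
    all-pairwise: every group (reference included) vs. every other group
    """
    many_to_one = n_treated
    all_pairwise = comb(n_treated + 1, 2)
    return many_to_one, all_pairwise


n_treated = 3  # hypothetical number of E2-treated groups
alpha = 0.05   # conventional family-wise error rate

many_to_one, all_pairwise = comparison_counts(n_treated)
print(f"treated vs. untreated only: {many_to_one} comparisons")
print(f"all pairwise:               {all_pairwise} comparisons")
# Bonferroni divides alpha by the number of comparisons in the family.
print(f"Bonferroni threshold, all pairwise: {alpha / all_pairwise:.4f}")
```

Restricting the family to treated-versus-untreated contrasts keeps the number of comparisons equal to the number of treated groups, which is the many-to-one setting that Dunnett's test is designed for.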

      Reviewer #2 (Public Review):

      Summary:

      The novelty of this study stems from the observations that neuro-estrogens appear to interact with brain androgen receptors to support male-typical behaviors. The study provides a step forward in clarifying the somewhat contradictory findings that, in teleosts and unlike other vertebrates, androgens regulate male-typical behaviors without requiring aromatization, but at the same time estrogens appear to also be involved in regulating male-typical behaviors. They manipulate the expression of one aromatase isoform, cyp19a1b, that is purported to be brain-specific in teleosts. Their findings are important in that brain estrogen content is sensitive to the brain-specific cyp19a1b deficiency, leading to alterations in both sexual behavior and aggressive behavior. Interestingly, these males have relatively intact fertility rates, despite the effects on the brain.

      We thank this reviewer for their positive evaluation of our work and constructive comments, which we found very helpful in improving the manuscript.

      That said, the framing of the study, the relevant context, and several aspects of the methods and results raise concerns. Two interpretations need to be addressed/tempered:

      (1) that the rescue of cyp19a1b deficiency by tank-applied estradiol is not necessarily a brain/neuroestrogen mode of action, and

      Line 155: cyp19a1b-deficient males exhibited a severe reduction in brain E2 levels, yet their peripheral E2 levels remained comparable to those in wild-type males. Given this hormonal milieu and the lack of behavioral change in wild-type males following E2 treatment, the observed recovery of mating behavior in cyp19a1b-deficient males following E2 treatment can be best explained by the restoration of brain E2 levels. However, as the reviewer pointed out, we cannot rule out the possibility that bath-immersed E2 influenced behavior through an indirect peripheral mechanism. To address this concern, we have modified the text as follows: “These results suggest that reduced E2 in the brain is the primary cause of the mating defects, highlighting a pivotal role of brain-derived estrogens in male mating behavior. However, caution is warranted, as an indirect peripheral effect of bath-immersed E2 on behavior cannot be ruled out, although this is unlikely given the comparable peripheral E2 levels in cyp19a1b-deficient and wild-type males. In contrast to mating.”

      (2) the large increases in peripheral and brain androgen levels in the cyp19a1b deficient animals imply some indirect/compensatory effects of lifelong cyp19a1b deficiency.

      As stated in line 151, androgen/AR signaling has a strong facilitative effect on male-typical behaviors in teleosts. If increased androgen levels in the periphery and brain affected behavior, the expected effect would be facilitative. However, cyp19a1b-deficient males exhibited impaired male-typical behaviors, suggesting that elevated androgen levels were unlikely to be responsible. Although chronic androgen elevation could cause androgen receptor desensitization, which could lead to behavioral suppression, our long-term androgen treatments have consistently promoted, rather than inhibited, male-typical behaviors (e.g., Nishiike et al., Proc Natl Acad Sci USA 121:e2316459121). Hence, this possibility is also highly unlikely.

      Reviewer #3 (Public Review):

      Summary:

      Taking advantage of the existence in fish of two genes coding for estrogen synthase, the enzyme aromatase, one mostly expressed in the brain (Cyp19a1b) and the other mostly found in the gonads (Cyp19a1a), this study investigates the role of neuro-estrogens in the control of sexual and aggressive behavior in teleost fish. The constitutive deletion of Cyp19a1b reduced brain estrogen content by 87% in males and about 50% in females. It led to reduced sexual and aggressive behavior in males and reduced sexual behavior in females. These effects are reversed by adult treatment with estradiol thus indicating that they are activational in nature. The deletion of Cyp19a1b is associated with a reduced expression of the genes coding for the two androgen receptors, ara, and arb, in brain regions involved in the regulation of social behavior. The analysis of the gene expression and behavior of mutants of estrogen receptors indicates that these effects are likely mediated by the activation of the esr1 and esr2a isoforms. These results provide valuable insight into the role of neuro-estrogens in social behavior in the most abundant vertebrate taxa. While estrogens are involved in the organization of the brain and behavior of some birds and rodents, neuro-estrogens appear to play an activational role in fish through a facilitatory action of androgen signaling.

      We thank this reviewer for their positive evaluation of our work and comments that have improved the manuscript.

      Strengths:

      Evaluation of the role of brain "specific" Cyp19a1 in male teleost fish, which as a taxa are more abundant and yet proportionally less studied than the most common birds and rodents. Therefore, evaluating the generalizability of results from higher vertebrates is important. This approach also offers great potential to study the role of brain estrogen production in females, an understudied question in all taxa.

      Results obtained from multiple mutant lines converge to show that estrogen signaling drives aspects of male sexual behavior.

      The comparative discussion of the age-dependent abundance of brain aromatase in fish vs mammals and its role in organization vs activation is important beyond the study of the targeted species.

      We again thank the reviewer for their positive evaluation of our work.

      Weaknesses:

      (1) The new transgenic lines are under-characterized. There is no evaluation of the mRNA and protein products of Cyp19a1b and ESR2a.

      We did not directly assess the function of cyp19a1b and esr2a in our mutant fish. However, the observed reduction in brain E2 levels, with no change in peripheral E2 levels, in cyp19a1b-deficient fish strongly supports the loss of cyp19a1b function. This is stated in the Results section (line 97) as follows: “These results show that cyp19a1b-deficient fish have reduced estrogen levels coupled with increased androgen levels in the brain, confirming the loss of cyp19a1b function.”

      Line 473: A previous study reported that female medaka lacking esr2a fail to release eggs due to oviduct atresia (Kayo et al., 2019, Sci Rep 9:8868). Similarly, in this study, some esr2a-deficient females exhibited spawning behavior but were unable to release eggs, although the sample size was limited (Δ8 line: 2/3; Δ4 line: 1/1). In contrast, this was not observed in wild-type females (Δ8 line: 0/12; Δ4 line: 0/11). These results support the effective loss of esr2a function. To incorporate this information into the manuscript, the following text has been added to the Materials and Methods: “A previous study reported that esr2a-deficient female medaka cannot release eggs due to oviduct atresia (59). Likewise, some esr2a-deficient females generated in this study, despite the limited sample size, exhibited spawning behavior but were unable to release eggs (Δ8 line: 2/3; Δ4 line: 1/1), while such failure was not observed in wild-type females (Δ8 line: 0/12; Δ4 line: 0/11). These results support the effective loss of esr2a function.”

      The following reference (#59), cited in the newly added text above, has been included in the reference list:

      D. Kayo, B. Zempo, S. Tomihara, Y. Oka, S. Kanda, Gene knockout analysis reveals essentiality of estrogen receptor β1 (Esr2a) for female reproduction in medaka. Sci. Rep. 9, 8868 (2019).

      (2) The stereotypic sequence of sexual behavior is poorly described, in particular, the part played by the two sexual partners, such that the conclusions are not easily understandable, notably with regards to the distinction between motivation and performance.

      Line 103: To provide a more detailed description of medaka mating behavior, we have revised the text from “The mating behavior of medaka follows a stereotypical pattern, wherein a series of followings, courtship displays, and wrappings by the male leads to spawning” to “The mating behavior of medaka follows a stereotypical sequence. It begins with the male approaching and closely following the female (following). The male then performs a courtship display, rapidly swimming in a circular pattern in front of the female. If the female is receptive, the male grasps her with his fins (wrapping), culminating in the simultaneous release of eggs and sperm (spawning).”

      (3) The behavior of females is only assessed from the perspective of the male, which raises questions about the interpretation of the reduced behavior of the males.

      In medaka, female mating behavior is largely passive, except for rejecting courtship attempts and releasing eggs. Therefore, its analysis relies on measuring the latency to receive following, courtship displays, or wrappings from the male and the frequency of courtship rejection or wrapping refusal. We understand the reviewer’s perspective that cyp19a1b-deficient females might not be less receptive but instead less attractive to males, potentially leading to reduced male mating efforts. However, since these females are approached and followed by males at levels comparable to wild-type females, this possibility appears unlikely. Moreover, cyp19a1b-deficient females tend to avoid males and exhibit a slightly female-oriented sexual preference. While these traits are closely associated with reduced sexual receptivity, they do not readily align with reduced sexual attractiveness. Therefore, it is more plausible to conclude that these females have decreased receptivity rather than being less attractive to males.

      (4) At no point do the authors seem to consider that a reduced behavior of one sex could result from a reduced sensory perception from this sex or a reduced attractivity or sensory communication from the other sex.

      Line 112: As noted above, the impaired mating behavior of cyp19a1b-deficient females is unlikely to be due to reduced attractiveness to males. Similarly, mating behavior tests using esr2b-deficient females as stimulus females suggest that the impaired mating behavior of cyp19a1b-deficient males cannot be attributed to reduced attractiveness to females. However, the possibility that their impaired mating behavior could be attributed to altered cognition or sexual preference cannot be ruled out. To reflect this in the manuscript, we have revised the text “, suggesting that they are less motivated to mate” to “. These results suggest that they are less motivated to mate, though an alternative interpretation that their cognition or sexual preference may be altered cannot be dismissed.”

      (5) Aspects of the methods are not detailed enough to allow proper evaluation of their quality or replication of the data.

      In response to this and other specific comments from this reviewer, we have revised the Materials and Methods section to include more detailed descriptions of the methods.

      Line 469: The following text has been added to describe the method for domain identification in medaka Esr2a: “The DNA- and ligand-binding domains of medaka Esr2a were identified by sequence alignment with yellow perch (Perca flavescens) Esr2a, for which these domain locations have been reported (58).”

      The following reference (#58), cited in the newly added text above, has been included in the reference list:

      S. G. Lynn, W. J. Birge, B. S. Shepherd, Molecular characterization and sex-specific tissue expression of estrogen receptor α (esr1), estrogen receptor βa (esr2a) and ovarian aromatase (cyp19a1a) in yellow perch (Perca flavescens). Comp. Biochem. Physiol. B Biochem. Mol. Biol. 149, 126–147 (2008).

      Line 540: The text “, and the total area of signal in each brain nucleus was calculated using Olyvia software (Olympus)” has been revised to include additional details on the single ISH method as follows: “. The total area of signal across all relevant sections, including both hemispheres, was calculated for each brain nucleus using Olyvia software (Olympus). Images were converted to a 256-level intensity scale, and pixels with intensities from 161 to 256 were considered signals. All sections used for comparison were processed in the same batch, without corrections between samples.”
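      The thresholding rule quoted above can be sketched as follows. This is only an illustration in plain Python, not the Olyvia workflow used in the study: it assumes intensities are numbered 1 to 256, and the function names, the `pixel_area` parameter, and the tiny example sections are hypothetical.

```python
# Minimal sketch of the quoted rule: pixels with intensities from 161 to 256
# (on an assumed 1-256 scale) count as signal, and the signal area is summed
# over all sections of a brain nucleus. Names and data are hypothetical.

def signal_area(image, lower=161, upper=256, pixel_area=1.0):
    """Return the area of pixels whose intensity lies in [lower, upper]."""
    count = sum(1 for row in image for value in row if lower <= value <= upper)
    return count * pixel_area


def total_signal_area(sections, **kwargs):
    """Sum the signal area across all sections (both hemispheres included)."""
    return sum(signal_area(section, **kwargs) for section in sections)


# Two made-up sections, each a small grid of pixel intensities.
section_a = [[10, 161, 200],
             [160, 256, 90]]
section_b = [[255, 1],
             [161, 100]]

total = total_signal_area([section_a, section_b])
```

      Because the same fixed threshold is applied to every section with no per-sample correction, values are only comparable between samples processed in the same batch, as the revised text states.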

      Line 596: The following text has been added to include additional details on the double ISH method: “Cells were identified as coexpressing the two genes when Alexa Fluor 555 and fluorescein signals were clearly observed in the cytoplasm surrounding DAPI-stained nuclei, with intensities markedly stronger than the background noise.”

      (6) It seems very dangerous to use the response to a mutant abnormal behavior (ESR2-KO females) as a test, given that it is not clear what is the cause of the disrupted behavior.

      esr2b-deficient females have fully developed ovaries, a normal sex steroid milieu, and sexual attractiveness to males comparable to wild-type females, yet they are completely unreceptive to male courtship (Nishiike et al., 2021, Curr Biol, 1699–1710). Although, as the reviewer noted, the detailed mechanisms underlying this phenotype remain unclear, it is evident that the loss of estrogen/Esr2b signaling in the brain severely impairs sexual receptivity. Therefore, using esr2b-deficient females as stimulus females in the mating behavior test eliminates the influence of female sexual receptivity and male attractiveness to females, enabling the exclusive assessment of male mating motivation. This rationale is already presented in the Results section (lines 116–120), and we believe this experimental design offers a robust framework for assessing male mating motivation.

      Additionally, the mating behavior test with esr2b-deficient females complemented the test with wild-type females, and its results were not the sole basis for our discussion of the male mating behavior phenotype. The results of both tests were largely concordant, and we believe that the conclusions drawn from them are highly reliable.

      Meanwhile, in the test with esr2b-deficient females, cyp19a1b-deficient males were courted more frequently by these females than wild-type males. As the reviewer noted, this may suggest an anomaly in the test. Accordingly, we have confined our discussion to the possibility that “Perhaps cyp19a1b<sup>−/−</sup> males are misidentified as females by esr2b-deficient females because they are reluctant to court or they exhibit some female-like behavior” (line 131).

      (7) Most experiments are weakly powered (low sample size) and analyzed by multiple T-tests while 2 way ANOVA could have been used in several instances. No mention of T or F values, or degrees of freedom.

      Histological analysis was conducted with a relatively small sample size, as our previous experience suggested that interindividual variability in the results would not be substantial. As significant differences were detected in many analyses, further increasing the sample size is unnecessary.

      Although two-way ANOVA could be used instead of multiple T-tests for analyzing the data in Figures 4D, 4F, 6D, S4A, and S4B, we applied the Bonferroni–Dunn correction to control for multiple pairwise comparisons in multiple T-tests. As this comparison method is equivalent to the post hoc test following two-way ANOVA, the statistical results are identical regardless of whether T-tests or two-way ANOVA are used.
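      For readers unfamiliar with the correction mentioned above, the sketch below shows a Bonferroni-type adjustment. It assumes the correction amounts to multiplying each raw p-value by the number of comparisons and capping the result at 1; it does not reproduce the underlying T-tests, and the construct names and p-values are hypothetical, not data from the study.

```python
# Minimal sketch of a Bonferroni-type adjustment for multiple pairwise
# comparisons. Assumption: each raw p-value is multiplied by the number of
# comparisons and capped at 1. All names and p-values are hypothetical.

def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted p-values for a list of raw p-values."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values from E2-treated vs. untreated comparisons,
# one comparison per construct.
raw = {"construct_1": 0.004, "construct_2": 0.03, "construct_3": 0.2}

adjusted = dict(zip(raw, bonferroni_adjust(list(raw.values()))))
significant = [name for name, p in adjusted.items() if p < 0.05]
```

      In this made-up example only the first comparison remains below 0.05 after adjustment, showing how the correction becomes stricter as the number of comparisons grows.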

      For the data in Figures 4D, 4F, S4A, and S4B, the primary focus is on whether relative luciferase activity differs between E2-treated and untreated conditions for each mutant construct. Therefore, two-way ANOVA is not particularly relevant, as assessing the main effect of construct type or its interaction with E2 treatment does not provide meaningful insights. Similarly, in Figure 6D, the focus is solely on whether wild-type and mutant females differ in time spent at each distance. Given this, two-way ANOVA is unnecessary, as analyzing the main effect of distance is not meaningful.

      Accordingly, two-way ANOVA was not employed in this study, and therefore, its corresponding F values were not included. As the figure legends specify the sample sizes for all analyses, specifying degrees of freedom separately was deemed unnecessary.

      (8) The variability of the mRNA content for the same target gene between experiments (genotype comparison vs E2 treatment comparison) raises questions about the reproducibility of the data (apparent disappearance of genotype effect).

      As the reviewer pointed out, the overall area of ara expression is larger in Figure 2J than in Figure 2F. However, the relative area ratios of ara expression among brain nuclei are consistent between the two figures, indicating the reproducibility of the results. Thus, this difference is unlikely to affect the conclusions of this study.

      Additionally, the differences in ara expression in pPPp and arb expression in aPPp between wild-type and cyp19a1b-deficient males appear less pronounced in Figures 2J and 2K than in Figures 2F and 2H. This is likely attributable to the smaller sample size used in the experiments for Figures 2J and 2K, resulting in less distinct differences. However, as the same genotype-dependent trends are observed in both sets of figures, the conclusion that ara and arb expression is reduced in cyp19a1b-deficient male brains remains valid.

      (9) The discussion confuses the effects of estrogens on sexual differentiation (developmental programming = permanent) and activation (= reversible activation of brain circuits in adulthood) of the brain and behavior. Whether sex differences in the circuits underlying social behaviors exist is not clear.

      We recognize that the effects of adult steroids are sometimes not considered to be sexual differentiation, as they do not differentiate the neural substrate, but rather transiently activate the already masculinized or feminized substrate. Arnold (2017, J Neurosci Res 95:291–300) contends that all factors that cause sex differences, including the transient effects of adult steroids, should be incorporated into a theory of sexual differentiation, and indeed, these effects may be the most potent proximate factors that make males and females different. We concur with this perspective and have adopted it as a foundation for our manuscript.

      In teleosts, early developmental exposure to steroids has minimal impact, and sexual differentiation relies primarily on steroid action in adulthood (Okubo et al., 2022, Spectrum of Sex, pp. 111–133). This is evidenced by the effective reversal of sex-typical behaviors through experimental hormonal manipulation in adult teleosts and the absence of transient early-life steroid surges observed in mammals and birds. Accordingly, our discussion on brain sexual differentiation, including the statement in line 347, “This variation among species may represent the activation of neuroestrogen synthesis at life stages critical for sexual differentiation of behavior that are unique to each species”, remains well-supported. Additionally, given these considerations, while sex differences in neural circuit activation are evident in teleosts, substantial structural differences in these circuits are unlikely.

      (10) Overall, the claims regarding the activational role of neuro-estrogens on male sexual behavior are supported by converging evidence from multiple mutant lines. The role of neuroestrogens on gene expression in the brain is mostly solid too. The data for females are comparatively weaker. Conclusions regarding sexual differentiation should be considered carefully.

      We agree that the data for females are less extensive than for males. However, we have previously elucidated the mechanism by which estrogen/Esr2b signaling promotes female mating behavior (Nishiike et al., 2021). Accordingly, it follows that the new insights into female behavior gained from the cyp19a1b knockout model are more limited than those for males. Nevertheless, when integrated with our prior findings, the data on females in this study provide significant insights, and the overall mechanism through which estrogens promote female mating behavior is becoming clearer. Therefore, we do not consider the female data in this study to be incomplete or merely supplementary.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors set out to answer an intriguing question regarding the hormonal control of innate social behaviors in medaka. Specifically, they wanted to test the effects of cyp19a1b mutation on mating and aggression in males. They also test these effects in females. Their approach takes them down several distinct experimental pathways, including one investigating how cyp19a1a function is related to androgen receptor expression and how estrogens themselves may act on the androgen receptor to modulate its expression, as well as how different esr genes may be involved. The study and its results are valuable and a clear, general conclusion of a pathway from brain aromatase>estrogens>esr genes> androgen receptor can be made. This is important, novel, and impactful. However, there are issues with how the study logic is set up, the approach for assessing certain behaviors, the statistics used, the interpretation of findings, and placing the findings in the proper context based on previous work, which manifests as a general issue where previous work is not properly attributed to.

      Thank you for your thoughtful review. We have carefully addressed each specific comment, as detailed below.

      Major comments:

      (1) The background for the rationale of the current study is misleading and lacks proper context. The authors root the logic of their experiment in determining whether estrogens regulate male-typical behaviors because the current assumption is androgens are "solely responsible" for male-typical behaviors in teleosts. This is not the case. Previous studies have shown aromatase/estrogens are involved in male-typical aggression in teleosts. For example, to name a couple:

      Huffman, L. S., O'Connell, L. A., & Hofmann, H. A. (2013). Aromatase regulates aggression in the African cichlid fish Astatotilapia burtoni. Physiology & behavior, 112, 77-83.

      O'Connell, L. A., & Hofmann, H. A. (2012). Social status predicts how sex steroid receptors regulate complex behavior across levels of biological organization. Endocrinology, 153(3), 1341-1351.

      And even a recent paper sheds light on a possible AR>aromatase>estradiol hypothesis of male-typical behaviors:

      Lopez, M. S., & Alward, B. A. (2024). Androgen receptor deficiency is associated with reduced aromatase expression in the ventromedial hypothalamus of male cichlids. Annals of the New York Academy of Sciences.

      Interestingly, the authors cite Huffman et al. in the discussion, so I don't understand why they make the claims they do about estrogens and male-typical behavior.

      Related to this, is an issue of proper attribution to published work. Indeed, missing are key references from lab groups using AR mutant teleosts. Here are a couple:

      Yong, L., Thet, Z., & Zhu, Y. (2017). Genetic editing of the androgen receptor contributes to impaired male courtship behavior in zebrafish. Journal of Experimental Biology, 220(17), 3017-3021.

      Alward, B. A., Laud, V. A., Skalnik, C. J., York, R. A., Juntti, S. A., & Fernald, R. D. (2020). Modular genetic control of social status in a cichlid fish. Proceedings of the National Academy of Sciences, 117(45), 28167-28174.

      Ogino, Y., Ansai, S., Watanabe, E., Yasugi, M., Katayama, Y., Sakamoto, H., ... & Iguchi, T. (2023). Evolutionary differentiation of androgen receptor is responsible for sexual characteristic development in a teleost fish. Nature communications, 14(1), 1428.

      As noted in Response to reviewer #1’s comment 3 on weaknesses, we have revised the Introduction and Discussion sections as follows.

      Line 56: “solely responsible” in the Introduction has been modified to “largely responsible”.

      Line 57: The text “This is consistent with the recent finding in medaka fish (Oryzias latipes) that estrogens act through the ESR subtype Esr2b to prevent females from engaging in male-typical courtship (10)” has been revised to “This is consistent with recent observations in a few teleost species that genetic ablation of AR severely impairs male-typical behaviors (13–16) and with findings in medaka fish (Oryzias latipes) that estrogens act through the ESR subtype Esr2b to prevent females from engaging in male-typical courtship (12)” to include previous studies on the behavior of AR mutant fish (Yong et al., 2017; Alward et al., 2020; Ogino et al., 2023; Nishiike and Okubo, 2024) in the Introduction.

      Line 65: “It is worth mentioning that systemic administration of estrogens and an aromatase inhibitor increased and decreased male aggression, respectively, in several teleost species, potentially reflecting the behavioral effects of brain-derived estrogens (21–24)” has been added to the Introduction, providing an overview of previous studies on the effects of estrogens and aromatase on male fish aggression (Hallgren et al., 2006; O’Connell and Hofmann, 2012; Huffman et al., 2013; Jalabert et al., 2015).

      Line 367: “treatment of males with an aromatase inhibitor reduces their male-typical behaviors (31–33)” has been edited to read “treatment of males with an aromatase inhibitor reduces their male-typical behaviors, while estrogens exert the opposite effect (21–24).”

      After the revisions described above, the following references (#13, 14, and 22) have been added to the reference list:

      L. Yong, Z. Thet, Y. Zhu, Genetic editing of the androgen receptor contributes to impaired male courtship behavior in zebrafish. J. Exp. Biol. 220, 3017–3021 (2017).

      B. A. Alward, V. A. Laud, C. J. Skalnik, R. A. York, S. A. Juntti, R. D. Fernald, Modular genetic control of social status in a cichlid fish. Proc. Natl. Acad. Sci. U.S.A. 117, 28167–28174 (2020).

      L. A. O’Connell, H. A. Hofmann, Social status predicts how sex steroid receptors regulate complex behavior across levels of biological organization. Endocrinology 153, 1341–1351 (2012).

      While Lopez and Alward (2024) provide valuable insights into the regulation of cyp19a1b expression by androgens, our study focuses specifically on the functional aspects of cyp19a1b. Expanding the discussion to include expression regulation would divert from the primary focus of our manuscript. For this reason, we have opted not to cite this reference.

      (2) As it is now, the authors are only citing a book chapter/review from their own group. This is a serious issue as it does not provide the proper context for the work. The authors need to fix their issues of attribution to previously published work and the proper interpretation of the work that they are aware of as it pertains to ideas proposed on the roles of androgens and estrogens in the control of male-typical behaviors. This is also important to get the citations right because the common use of "contrary to expectations" when describing their results is actually not correct. Many of the observations are expected to a degree. However, this doesn't take away from a generally stellar experimental design and mostly clear results. The authors do not need to rely on enhancing the impact of their paper by making false claims of unexpected findings. The depth and clarity of your findings are where the impact of your work is.

      As detailed in Response to reviewer #1’s comment 3 on weaknesses, we have cited previous studies on the effects of estrogens and aromatase on male fish aggression (Hallgren et al., 2006; O’Connell and Hofmann, 2012; Huffman et al., 2013; Jalabert et al., 2015) in the Introduction.

      Additionally, as noted in Response to reviewer #1’s comment 4 on weaknesses, we have made the following revisions to avoid phrases such as “contrary to expectation” and “unexpected.”

      Line 76: “Contrary to our expectations” → “Remarkably.”

      Line 109: “Contrary to this expectation, however” → “Nevertheless.”

      Line 135: “Again, contrary to our expectation, cyp19a1b<sup>−/−</sup> males” → “cyp19a1b<sup>−/−</sup> males.”

      Line 333: “unexpected” → “noteworthy.”

      Line 337: “unexpected” → “notable.”

      (3) The experimental design for studying aggression in males has flaws. A standard test like a resident-intruder test should be used. An assay in which only male mutants are housed together? I do not understand the logic there and the logic for the approach isn't even explained. Too many confounds that are not controlled for. It makes it seem like an aspect of the study that was thrown in as an aside.

      As noted in Response to reviewer #1’s comment 5 on weaknesses, medaka form shoals and lack strong territoriality. As a result, even slight differences in dominance between the resident and intruder can substantially impact the outcomes of the resident-intruder test. Therefore, we adopted an alternative approach in this study.

      (4) Hormonal differences in the mutants seem to vary based on sex, and the meaning of these differences, or how they affect interpreting the findings, wasn't discussed. There was no acknowledgment of the fact that female central E2 was still at 50%, meaning the "rescue" experiments using peripheral injections are not given the proper context. For example, this is different than giving a fish with only 16% of their normal central E2 an E2 injection. Missing as well is a clear hypothesis for why E2 injections did not rescue aggression deficits in cyp19a1b mutant males.

      Line 385: As the reviewer pointed out, the degree of brain estrogen reduction in cyp19a1b-deficient fish differs greatly between males and females. This is likely because females receive a large supply of estrogens from the ovaries. Given that estrogen levels in cyp19a1b-deficient females were 50% of those in wild-type females, it can be inferred that half of their brain estrogens are synthesized locally, while the other half originates from the ovaries. This is an important finding, and we have already noted in the Discussion that “females have higher brain levels of estrogens, half of which are synthesized locally in the brain (i.e., neuroestrogens)” However, as this explanation was not sufficiently clear, we have revised it to “females have higher brain levels of estrogens, with half being synthesized locally and the other half supplied by the ovaries.”

      The reviewer raised a concern that conducting the estrogen rescue experiment in females, where 50% of brain estrogens remain, might be inappropriate. However, as this experiment was conducted exclusively in males, this concern is not applicable.

      Line 377: As noted in the reviewer’s subsequent comment, the failure of aggression recovery in E2-treated cyp19a1b-deficient males could be due to insufficient induction of ara/arb expression in aggression-relevant brain regions. To address this concern, we have inserted the following statement into the Discussion after “the development of male behaviors may require moderate neuroestrogen levels that are sufficient to induce the expression of ara and arb, but not esr2b, in the underlying neural circuitry”: “This may account for the lack of aggression recovery in E2-treated cyp19a1b-deficient males in this study.”

      (5) In relation to that, the "null" results may have some of the most interesting implications, but they are barely discussed. For example, what does it mean that E2 didn't restore aggression in male cyp19 mutants? Is this a brain region factor? Could this relate to findings from Lopez et al NYAS, where male and female Ara mutants show different effects on brain-region-specific aromatase expression? And maybe this relates to the different impact of estrogens on ar expression. Were the different effects impacted in aggression areas? Maybe this is why E2 injection didn't restore aggression in males. You could make the argument that: (1) E2 doesn't restore ar expression in aggression regions and that's why there was no rescue. Or (2) that the circuits in adulthood that regulate aggression are NOT dependent on aggression but in early development they are. Another null finding not expanded on is why the two esr2a mutant lines showed differences. There is no reason to trust one line over the other, meaning we still don't know whether esr2a is required for latency to follow.

      As stated in our response to the previous comment, we have added the following text to the Discussion (line 377): “This may account for the lack of aggression recovery in E2-treated cyp19a1b-deficient males in this study.” Meanwhile, as discussed in lines 341–342, it is highly unlikely that the neural circuits regulating aggression are primarily influenced by early-life estrogen exposure, because androgen administration in adulthood alone is sufficient to induce high levels of aggression in both sexes. This notion is further supported by previous observations that cyp19a1b expression in the brain is minimal during embryonic development (Okubo et al., 2011, J Neuroendocrinol, 23:412–423).

      The findings of Lopez and Alward (2024) pertain to the regulation of cyp19a1b expression by androgen receptors. While this represents an important aspect of neuroendocrine regulation, it does not appear to be directly relevant to our discussion on cyp19a1b-mediated regulation of androgen receptor expression.

      To ensure the reliability of behavioral analyses in mutant fish, we consider a phenotype valid only when it is consistently observed in two independent mutant lines. In the mating behavior test examining esr2a-deficient males using esr2b-deficient females as stimulus females, Δ8 line males exhibited a shorter latency to initiate following than wild-type males, whereas Δ4 line males did not. This discrepancy led us to refrain from drawing conclusions about the role of esr2a in mating behavior, even though the mating behavior test using wild-type females as stimulus females yielded consistent results in the Δ8 and Δ4 lines. Therefore, we do not consider the reviewer’s concern to be a significant issue.

      (6) Not sure what's going on with the statistics, but it is not appropriate here to treat a "control" group as special. All groups are "experimental" groups. There is nothing special about the control group in this context. all should be Bonferroni post-hoc tests.

      Line 619: As detailed in Response to reviewer #1’s comment 7 on weaknesses, we consider Dunnett’s test the most appropriate choice for the experiments presented in Figures 4C and 4E. We acknowledge that the reviewer’s concern may stem from the phrase “comparisons between control and experimental groups” in the Materials and Methods section. To clarify this point, we have revised it to “comparisons between untreated and E2-treated groups in Fig. 4, C and D.”

      Minor comments:

      Line 47: then how can you say the aromatization hypothesis is "correct"? it only applies to a few species so far. Need to change the framing, not state so strongly such a vague thing as a hypothesis being "correct".

      Line 45: To address this concern, we have modified “widely accepted as correct” to “widely acknowledged”, ensuring a more precise characterization.

      Figure 1: looks like a dosage effect in males but not females. this should be discussed at some point, even if just to mention a dosage effect exists and put it in context.

      Line 91: We have revised the sentence “In males, brain E2 in heterozygotes (cyp19a1b+/−) was also reduced to 45% of the level in wild-type siblings (P = 0.0284) (Fig. 1A)” by adding “, indicating a dosage effect of cyp19a1b mutation” to make this point explicit.

      Were male cyp19 KO aggressive towards females?

      We have not observed cyp19a1b-deficient males exhibiting aggressive behavior towards females in our experiments. Therefore, we do not consider them aggressive toward females.

      Please explain how infertility would lead to reduced mating.

      Line 142: As the reviewer has questioned, even if cyp19a1b-deficient males exhibit infertility due to efferent duct obstruction, it is difficult to imagine that this directly leads to reduced mating. However, the inability to release sperm could indirectly affect behavior. To address this, we have added “, possibly due to the perception of impaired sperm release” after “If this is also the case in medaka, the observed behavioral defects might be secondary to infertility.”

      Describe something about the timing of the treatment here. How can peripheral E2 injections restore it when peripheral levels are normal? Did these injections restore central levels? This needs to be shown experimentally.

      Line 517: As described in the Materials and Methods, E2 treatment was conducted by immersing fish in E2-containing water for 4 days. However, we had not explicitly stated that the water was changed daily to maintain the nominal concentration. To clarify this and address reviewer #2’s comment 9, we have revised “males were treated with 1 ng/ml of E2 (Fujifilm Wako Pure Chemical, Osaka, Japan) or vehicle (ethanol) alone by immersion in water for 4 days” to “males were treated with 1 ng/ml of E2 (Fujifilm Wako Pure Chemical, Osaka, Japan), which was first dissolved in 100% ethanol (vehicle), or with the vehicle alone by immersion in water for 4 days, with daily water changes to maintain the nominal concentration.”

      Line 522: The treatment effectively restored mating activity and ara/arb expression in the brain, suggesting a sufficient increase in brain E2 levels. However, we did not measure the actual increase, and its extent remains uncertain. To reflect this in the manuscript, we have now added the following sentence: “Although the exact increase in brain E2 levels following E2 treatment was not quantified, the observed positive effects on behavior and gene expression suggest that it was sufficient.”

      I know the nomenclature differs among those who study teleosts, but it's ARa and the gene is ar1 (as an example; arb would be ar2). You're recommended the following citation to remain consistent:

      Munley, K. M., Hoadley, A. P., & Alward, B. A. (2023). A phylogenetics-based nomenclature system for steroid receptors in teleost fishes. General and Comparative Endocrinology, 114436.

      Paralogous genes resulting from the third round of whole-genome duplication in teleosts are typically designated by adding the suffixes “a” and “b” to their gene symbols. This convention also applies to the two androgen receptor genes, commonly referred to as ara and arb. While the alternative names ar1 and ar2 may gain broader acceptance in the future, ara and arb remain more widely used at present. Therefore, we have chosen to retain ara and arb in this manuscript.

      Line 268: how is this "suggesting" less aggression? They literally showed fewer aggressive displays, so it doesn't suggest it - it literally shows it.

      Line 285: Following this thoughtful suggestion, we have changed “suggesting less aggression” to “showing less aggression.”

      Line 317: how can you still call it the primary driver?

      The stimulatory effects of aromatase/estrogens on male-typical behaviors are exerted through the potentiation of androgen/AR signaling. Thus, we still believe that androgens—specifically 11KT in teleosts—serve as the primary drivers of these behaviors.

      Line 318: not all deficits, like aggression, were rescued.

      Line 334: To address this comment, “These behavioral deficits were rescued by estrogen administration, indicating that reduced levels of neuroestrogens are the primary cause of the observed phenotypes: in other words, neuroestrogens are pivotal for male-typical behaviors in teleosts” has been modified and now reads “Deficits in mating were rescued by estrogen administration, indicating that reduced brain estrogen levels are the primary cause of the observed mating impairment; in other words, brain-derived estrogens are pivotal at least for male-typical mating behaviors in teleosts.”

      Line 324: what do you mean by "sufficient"? To show that, you'd have to castrate the male and only give estrogen back. the authors continue to overstate virtually every aspect of their study, seemingly in an unnecessary manner.

      Line 341: Our intention was to convey that brain-derived estrogens early in life are not essential for the expression of male-typical behaviors in teleosts. However, we recognize that the term “sufficient” could be misinterpreted as implying that estrogens alone are adequate, without contributions from other factors such as androgens. To clarify this, we have revised the text from “neuroestrogen activity in adulthood is sufficient for the execution of male-typical behaviors, while that in early in life is not requisite. Thus, while” to “brain-derived estrogens early in life is not essential for the execution of male-typical behaviors. While.”

      Line 329: so? in adult mice, amygdala aromatase neurons still regulate aggression. The amount in adulthood seems less important compared to site-specific functions.

      Line 346: We do not intend to suggest that brain aromatase activity in adulthood plays a negligible role in male behaviors in rodents, as we have already acknowledged its necessity in the Introduction (lines 42–43). To enhance clarity and prevent misinterpretation, we have added “, although it remains important for male behavior in adulthood” to the end of the sentence: “brain aromatase activity in rodents reaches its peak during the perinatal period and thereafter declines with age.”

      Line 351: This contradicts what you all have been saying.

      Line 65: As mentioned in Response to reviewer #1’s comment 3 on weaknesses, the following text has been added to the Introduction: “It is worth mentioning that systemic administration of estrogens and an aromatase inhibitor increased and decreased male aggression, respectively, in several teleost species, potentially reflecting the behavioral effects of brain-derived estrogens (21–24)”, providing an overview of previous studies on the effects of estrogens and aromatase on male fish aggression (Hallgren et al., 2006; O’Connell and Hofmann, 2012; Huffman et al., 2013; Jalabert et al., 2015). With this revision, we believe the inconsistency has been addressed.

      Line 367: Additionally, we have revised the sentence from “treatment of males with an aromatase inhibitor reduces their male-typical behaviors (31–33)” to “treatment of males with an aromatase inhibitor reduces their male-typical behaviors, while estrogens exert the opposite effect (21–24).”

      Line 360: change to "...possibility that is not mutually exclusive,"

      Line 378: We have revised the phrase as suggested from “Another possibility, not mutually exclusive,” to “Another possibility that is not mutually exclusive.”

      Line 363: but it didn't rescue aggression

      Line 381: In response, we have revised the sentence from “This possibility is supported by the present observation that estrogen treatment facilitated mating behavior in cyp19a1b-deficient males but not in their wild-type siblings” to “This possibility is at least likely for mating behavior, as estrogen treatment facilitated mating behavior in cyp19a1b-deficient males but not in their wild-type siblings.”

      Line 367: on average

      To explain the sex differences in the role of aromatase, what about the downstream molecular or neural targets? In mammals, hodology is related to sex differences. there could be convergent sex differences in regulating the same type of behaviors as well.

      Our findings demonstrate that brain-derived estrogens promote the expression of ara, arb, and their downstream target genes vt and gal in males, while enhancing the expression of npba, a downstream target of Esr2b signaling, in females. The identity of additional target genes and their roles in specific neural circuits remain to be elucidated, and we aim to address these in future research.

      Lines 378-382: this doesn't logically follow. pgf2a could be the target of estrogens which in the intact animal do regulate female sexual receptivity. And how can you say this given that your lab has shown in esr2b mutants females don't mate?

      We agree that PGF2α signaling may be activated by estrogen signaling, as stated in lines 404–407: “the present finding provides a likely explanation for this apparent contradiction, namely, that neuroestrogens, rather than or in addition to ovarian-derived circulating estrogens, may function upstream of PGF2α signaling to mediate female receptivity.” The observation that esr2b-deficient females do not accept male courtship is also stated in lines 401–403: “we recently challenged it by showing that female medaka deficient for esr2b are completely unreceptive to males, and thus estrogens play a critical role in female receptivity.”

      Line 396-397: or the remaining estrogens are enough to activate esr2b-dependent female-typical mating behaviors.

      We agree that cyp19a1b deficiency did not completely preclude female mating behavior, most likely because residual estrogens in the brains of cyp19a1b-deficient females enable weak activation of Esr2b signaling. However, the relevant section in the Discussion is not focused on examining why mating behavior persisted, but rather on considering the implications of this finding for the neural circuits regulating mating behavior. Therefore, incorporating the suggested explanation here would shift the focus and would not be appropriate.

      Line 420-421: this is a lot of variation. Was age controlled for?

      The time required for medaka to reach sexual maturity varies with rearing density and food availability. Due to space constraints, we adjust these parameters as needed, which led to variation in the ages of the experimental fish. However, since all experiments were conducted using sibling fish of the same age that had just reached sexual maturity, we believe this does not affect our conclusions.

      Line 457: have these kits been validated in medaka?

      Although we have not directly validated the applicability of these kits in medaka, their extensive use in this species suggests that they are unlikely to pose any issues (e.g., Ussery et al., 2018, Aquat Toxicol, 205:58–65; Lee et al., 2019, Ecotoxicol Environ Saf, 173:174–181; Kayo et al., 2020, Gen Comp Endocrinol, 285:113272; Fischer et al., 2021, Aquat Toxicol, 236:105873; Royan et al., 2023, Endocrinology, 164:bqad030).

      Line 589, re fish that spawned: how many times did this happen? Please note it is based on genotype and experiment. This could be important.

      Line 627: In response to this comment, we have added the following details: “Specifically, 7/18 cyp19a1b<sup>+/+</sup>, 11/18 cyp19a1b<sup>+/−</sup>, and 6/18 cyp19a1b<sup>−/−</sup> males were excluded in Fig. 1D; 6/10 cyp19a1b<sup>+/+</sup>, 3/10 cyp19a1b<sup>+/−</sup>, and 6/10 cyp19a1b<sup>−/−</sup> females were excluded in Fig. 6B; 2/23 esr1<sup>+/+</sup> and 5/24 esr1<sup>−/−</sup> males were excluded in Fig. S7; 2/24 esr2a<sup>+/+</sup> and 3/23 esr2a<sup>−/−</sup> males were excluded in Fig. S8A; 0/23 esr2a<sup>+/+</sup> and 0/23 esr2a<sup>−/−</sup> males were excluded in Fig. S8B.”

      Reviewer #2 (Recommendations For The Authors):

      Abstract:

      (A1) The framing of neuroestrogens being important for male-typical rodents, and not for other vertebrate lineages, does not account for other groups (birds) in which this is true (the authors can consult their cited work by Balthazart (Reference 6) for extensive accounting of this). This makes the novelty clause in the abstract "indicating that neuro-estrogens are pivotal for male-typical behaviors even in nonrodents" less surprising and should be acknowledged by the authors by amending or omitting this novelty clause. The findings regarding androgen receptor transcription (next sentence) are more important and pertinent.

      Line 27: We recognize that the aromatization hypothesis applies to some birds, including zebra finches, as stated in the Introduction (lines 48–49) and Discussion (lines 432–433). However, this was not reflected in the Abstract. Following the reviewer’s suggestion, we have changed “in non-rodents” to “in teleosts.”

      (A2) The medaka line that has been engineered to have aromatase absent in the brain is presented briefly in the abstract, but can be misinterpreted as naturally occurring. This should be amended, by including something like "engineered" or "directed mutant" before 'male medaka fish'.

      Line 24: We have added “mutagenesis-derived” before “male medaka fish” in response to this comment.

      Introduction:

      (I1) The paragraph on teleost brain aromatase should acknowledge that while the capacity for estrogen synthesis in the brain is 100-1000 fold higher in teleosts as compared to rodents and other vertebrates, the majority of this derives from glial and not neural sources. This can be confusing for readers since the term 'neuroestrogens' often refers to the neuronal origin and signalling. And this observation includes the exclusive radial glial expression of cyp19a1b in medaka (Diotel et al., 2010), and first discovered in midshipman (Forlano et al., 2001), each of which should also be cited here. In addition, the authors expend much text comparing teleosts and rodents, but it is worth expanding these kinds of comparisons, especially by pointing out that parts of the primate brain are found to densely express aromatase (see work by Ei Terasawa and others).

      In response to this comment and a similar comment from reviewer #1, we have replaced “neuroestrogens” with “brain-derived estrogens” or “brain estrogens” throughout the manuscript.

      Line 63: We have also added the text “In teleost brains, including those of medaka, aromatase is exclusively localized in radial glial cells, in contrast to its neuronal localization in rodent brains (18–20).” As a result of this addition, we have changed “This observation suggests” to “These observations suggest” in the subsequent sentence.

      Line 51: Additionally, to include information on aromatase in the primate brain, we have added the following text: “In primates, the hypothalamic aromatization of androgens to estrogens plays a central role in female gametogenesis (10) but is not essential for male behaviors (7, 8).”

      The following references (#10 and 18–20), cited in the newly added text above, have been included in the reference list, with other references renumbered accordingly:

      E. Terasawa, Neuroestradiol in regulation of GnRH release. Horm. Behav. 104, 138–145 (2018).

      P. M. Forlano, D. L. Deitcher, D. A. Myers, A. H. Bass, Anatomical distribution and cellular basis for high levels of aromatase activity in the brain of teleost fish: aromatase enzyme and mRNA expression identify glia as source. J. Neurosci. 21, 8943–8955 (2001).

      N. Diotel, Y. Le Page, K. Mouriec, S. K. Tong, E. Pellegrini, C. Vaillant, I. Anglade, F. Brion, F. Pakdel, B. C. Chung, O. Kah, Aromatase in the brain of teleost fish: expression, regulation and putative functions. Front. Neuroendocrinol. 31, 172–192 (2010).

      A. Takeuchi, K. Okubo, Post-proliferative immature radial glial cells female-specifically express aromatase in the medaka optic tectum. PLoS One 8, e73663 (2013).

      (I2) It is difficult to resolve from the introduction and work cited how restricted cyp19a1b is to the medaka brain. Important for the results of this study, it is not clear whether it is more of a bias in the brain vs other tissues, or if the cyp19a1b deficiency is restricted to the brain, and gonadal/peripheral cyp19 expression persists. The authors need to improve their consideration of the alternatives, i.e., that this manipulation is not somehow affecting: 1) peripheral aromatase expression (either cyp19a1a or cyp19a1b) in the gonad or elsewhere, 2) compensatory processes, such as other steroidogenic genes (are androgen synthesizing enzymes increasing?).

      Our previous study demonstrated that cyp19a1b is expressed in the gonads, but at levels tens to hundreds of times lower than those in the brain (Okubo et al., 2011, J Neuroendocrinol 23:412–423). Additionally, a separate study in medaka reported that cyp19a1b expression in the ovary is considerably lower than that of cyp19a1a (Nakamoto et al., 2018, Mol Cell Endocrinol 460:104–122). Given these observations, any potential effect of cyp19a1b knockout on peripheral estrogen synthesis is likely negligible. Indeed, Figures S1C and S1D confirm that cyp19a1b knockout does not alter peripheral E2 levels.

      Line 72: To incorporate this information into the Introduction and address the following comment, we have added the following text: “In medaka, cyp19a1b is also expressed in the gonads, but only at a level tens to hundreds of times lower than in the brain and substantially lower than that of cyp19a1a (26, 27).”

      The following references (#26 and 27), cited in the newly added text above, have been included in the reference list, with other references renumbered accordingly:

      K. Okubo, A. Takeuchi, R. Chaube, B. Paul-Prasanth, S. Kanda, Y. Oka, Y. Nagahama, Sex differences in aromatase gene expression in the medaka brain. J. Neuroendocrinol. 23, 412–423 (2011).

      M. Nakamoto, Y. Shibata, K. Ohno, T. Usami, Y. Kamei, Y. Taniguchi, T. Todo, T. Sakamoto, G. Young, P. Swanson, K. Naruse, Y. Nagahama, Ovarian aromatase loss-of-function mutant medaka undergo ovary degeneration and partial female-to-male sex reversal after puberty. Mol. Cell. Endocrinol. 460, 104–122 (2018).

      We have not assessed whether the expression of other steroidogenic enzymes is altered in cyp19a1b-deficient fish, and this may be investigated in future studies.

      (I3) Related, there are documented sex differences in the brain expression of cyp19a1b especially in adulthood (Okubo et al 2011) and this study should be cited here for context.

      Line 72: As stated in our previous response, we have cited Okubo et al. (2011) by adding the following sentence: “In medaka, cyp19a1b is also expressed in the gonads, but only at a level tens to hundreds of times lower than in the brain and substantially lower than that of cyp19a1a (26, 27).”

      Methods

      (M1) The rationale is unclear as presented for using mutagen screening for cyp19a1b while using CRISPR for esr2a. Are there methodological/biochemical reasons why the authors chose to not use the same method for both?

      At the time we generated the cyp19a1b knockouts, genome editing was not yet available, and the TILLING-based screening was the only method for obtaining mutants in medaka. In contrast, by the time we generated the esr2a knockouts, CRISPR/Cas9 had become available, enabling a more efficient and convenient generation of knockout lines. This is why the two knockout lines were generated using different methods.

      (M2) Measurement of steroids in biological matrices is not straightforward, and it is good that the authors use multiple extraction steps (organic followed by C18 columns) before loading samples on the ELISA plates, which are notoriously sensitive. Even though these methods have been published before by this group of authors, the quality control and ELISA performance values (recovery, parallelism, etc.) should be presented for readers to evaluate.

      Thank you for appreciating our sample purification method. Unfortunately, we have not evaluated the recovery rate or parallelism, but we recognize this as a subject for future studies.

      (M3) Mating behavior - E2 treated males were not co-housed with social partners for the full 24 hr before testing, but instead a few hours (?) prior to testing. The rationale for this should be spelled out explicitly.

      Line 494: In response to this comment, we have added “to ensure the efficacy of E2 treatment” to the end of the sentence “The set-up was modified for E2-treated males, which were kept on E2 treatment and not introduced to the test tanks until the day of testing.”

      (M4) The E2 treatment is listed as 1ng/ml vs. vehicle (ethanol). Is the E2 dissolved in 100% ethanol for administration to the tank water? Clarification is needed.

      Line 517: As the reviewer correctly assumed, E2 was first dissolved in 100% ethanol before being added to the tank water. To provide this information and address reviewer #1’s minor comment 5, we have revised “males were treated with 1 ng/ml of E2 (Fujifilm Wako Pure Chemical, Osaka, Japan) or vehicle (ethanol) alone by immersion in water for 4 days” to “males were treated with 1 ng/ml of E2 (Fujifilm Wako Pure Chemical, Osaka, Japan), which was first dissolved in 100% ethanol (vehicle), or with the vehicle alone by immersion in water for 4 days, with daily water changes to maintain the nominal concentration.”

      (M5) The authors exclude fish from the analysis of courtship display behavior for those individuals that spawned immediately at the start of the testing (and therefore it was impossible to register courtship display behaviors). How often did fish in the various treatment groups exhibit this "fast spawning" behavior? Was the occurrence rate different by treatment group? It is unlikely that these omissions from the data set drove large-scale patterns, but an indication of how often this occurred would be reassuring.

      Line 627: In response to this comment, we have included the following details: “Specifically, 7/18 cyp19a1b<sup>+/+</sup>, 11/18 cyp19a1b<sup>+/−</sup>, and 6/18 cyp19a1b<sup>−/−</sup> males were excluded in Fig. 1D; 6/10 cyp19a1b<sup>+/+</sup>, 3/10 cyp19a1b<sup>+/−</sup>, and 6/10 cyp19a1b<sup>−/−</sup> females were excluded in Fig. 6B; 2/23 esr1<sup>+/+</sup> and 5/24 esr1<sup>−/−</sup> males were excluded in Fig. S7; 2/24 esr2a<sup>+/+</sup> and 3/23 esr2a<sup>−/−</sup> males were excluded in Fig. S8A; 0/23 esr2a<sup>+/+</sup> and 0/23 esr2a<sup>−/−</sup> males were excluded in Fig. S8B.” These data indicate that the proportion of excluded males is nearly constant within each trial and is independent of the genotype of the focal fish.
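
      For readers who wish to compare the exclusion rates themselves, the counts quoted above can be turned into per-genotype proportions with a few lines of Python. This is only an illustrative sketch and not an analysis reported in the manuscript; the dictionary layout and the function name `exclusion_rates` are assumptions introduced here, and no formal test of independence is performed.

```python
# Excluded / total counts per genotype, copied from the response text above.
# The data structure and names are illustrative, not taken from the manuscript.
excluded = {
    "Fig. 1D": {"cyp19a1b+/+": (7, 18), "cyp19a1b+/-": (11, 18), "cyp19a1b-/-": (6, 18)},
    "Fig. 6B": {"cyp19a1b+/+": (6, 10), "cyp19a1b+/-": (3, 10), "cyp19a1b-/-": (6, 10)},
    "Fig. S7": {"esr1+/+": (2, 23), "esr1-/-": (5, 24)},
    "Fig. S8A": {"esr2a+/+": (2, 24), "esr2a-/-": (3, 23)},
    "Fig. S8B": {"esr2a+/+": (0, 23), "esr2a-/-": (0, 23)},
}


def exclusion_rates(counts):
    """Return the fraction of excluded fish for each genotype in one experiment."""
    return {genotype: k / n for genotype, (k, n) in counts.items()}


if __name__ == "__main__":
    # Print one line per figure with the exclusion rate of each genotype.
    for figure, counts in excluded.items():
        rates = exclusion_rates(counts)
        summary = ", ".join(f"{g}: {r:.0%}" for g, r in rates.items())
        print(f"{figure}: {summary}")
```

      Listing the rates side by side for each figure makes it straightforward to judge how similar the exclusion proportions are across genotypes within a trial.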

      Results

      (R1) It is striking to see the genetic-'dose' dependent suppression of brain E2 content by heterozygous and homozygous cyp19a1b deficiency, indicating that, as the authors point out, the majority of E2 in the male medaka brain (and 1/2 in the female brain) have a brain-derived origin. It is important also for the interpretation that there are large compensatory increases in brain levels of androgens, when E2 levels drop in the cyp19a1b mutant homozygotes. This latter point should receive more attention.

      Also, there are large increases in peripheral androgen levels in the homozygote mutants for cyp19a1b in both males and females. This indicates a peripheral effect in addition to the clear brain knockdown of E2 synthesis. These nuances need to be addressed.

      In response to this comment, we have revised the Results section as follows:

      Line 91: “, indicating a dosage effect of cyp19a1b mutation” has been added to the end of the sentence “In males, brain E2 in heterozygotes (cyp19a1b<sup>+/−</sup>) was also reduced to 45% of the level in wild-type siblings (P = 0.0284) (Fig. 1A).”

      Line 94: To draw more attention to the increase in brain androgen levels caused by cyp19a1b deficiency, “Brain levels of testosterone” has been modified to “Strikingly, brain levels of testosterone.”

      Line 100: “Their peripheral 11KT levels also increased 3.7- and 1.8-fold, respectively (P = 0.0789, males; P = 0.0118, females) (Fig. S1, C and D)” has been modified and now reads “In addition, peripheral 11KT levels in cyp19a1b<sup>−/−</sup> males and females increased 3.7- and 1.8-fold, respectively (P = 0.0789, males; P = 0.0118, females) (Fig. S1, C and D), indicating peripheral influence in addition to central effects.”

      (R2) The interpretation on page 4 that cyp19a1b deficient males are 'less motivated' to mate is premature, given the behavioral measures used in this study. There are several competing explanations for these findings (e.g., alterations in motivation, sensory discrimination, preference, etc.) that could be followed up in future work, but the current results are not able to distinguish among these possibilities.

      Line 112: We agree that the possibility of altered cognition or sexual preference cannot be dismissed. To incorporate this perspective, we have revised the text “, suggesting that they are less motivated to mate” to “These results suggest that they are less motivated to mate, though an alternative interpretation that their cognition or sexual preference may be altered cannot be dismissed.”

      (R3) On page 5, the authors present that peripheral E2 manipulation (delivery to the fish tank) restores courtship behavior in males, and then go on to erroneously conclude that this demonstrates "that reduced E2 in the brain was the primary cause of the mating defects, indicating a pivotal role of neuroestrogens in male mating behavior." Because this is a peripheral E2 treatment, there can be manifold effects on gonadal physiology or other endocrine events that can have indirect effects on the brain and behavior. Without manipulation of E2 directly to the brain to 'rescue' the cyp19a1b deficiency, the authors cannot conclude that these effects are directly on the central nervous system. Tellingly, the tank E2 treatment did not rescue aggressive behavior, suggestive of the potential for indirect effects.

      Line 155: As detailed in Response to reviewer #2’s specific comment 1, we have revised the text from “These results demonstrated that reduced E2 in the brain was the primary cause of the mating defects, indicating a pivotal role of neuroestrogens in male mating behavior. In contrast” to “These results suggest that reduced E2 in the brain is the primary cause of the mating defects, highlighting a pivotal role of brain-derived estrogens in male mating behavior. However, caution is warranted, as an indirect peripheral effect of bath-immersed E2 on behavior cannot be ruled out, although this is unlikely given the comparable peripheral E2 levels in cyp19a1b-deficient and wild-type males. In contrast to mating.”

      (R4) The downregulation of androgen-dependent gene expression (vasotocin in pNVT and galanin in pPMp) in the cyp19a1b deficient males (Figure 3) could be due to exceedingly high levels of brain androgens in the cyp19a1b deficient males. The best way to test the idea that estrogens can restore the expression to be more wild-type directly (like what is happening for ara and arb) is to look at these same markers (vasotocin and galanin) in these same brain areas in the brains of E2-treated males. The authors should have these brains from Figure 2. Unless I missed something, those experiments were not performed/reported here. It is clear that the ara and arb receptors have EREs and are 'rescued' by E2 treatment, but in principle, there could be indirect actions for reasons stated above for the behavior due to the peripheral E2 tank application.

      Thank you for your insightful comment. We agree that the current results cannot exclude the possibility that excessive androgen levels caused the downregulation of vt and gal. However, our previous studies showed that excessive 11KT administration to gonadectomized males and females increased the expression of these genes to levels comparable to wild-type males (Yamashita et al., 2020, eLife, 9:e59470; Kawabata-Sakata et al., 2024, Mol Cell Endocrinol 580:112101), making this scenario unlikely. That said, testing whether estrogen treatment restores vt and gal expression in cyp19a1b-deficient males would be informative, and we see this as an important direction for future research.

      Discussion

      (D1) The authors need to clarify whether EREs are found in other vertebrate AR introns, or is this unique to the teleost genome duplication?

      We have identified multiple ERE-like sequences within intron 1 of the mouse AR gene. However, sequence data alone do not provide sufficient evidence of their functionality, rendering this information of limited relevance. Therefore, we have chosen not to include this discussion in the current paper.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors are strongly encouraged to report information regarding the effect of Cyp19a1b deletion on the brain content of aromatase protein (ideally both isoforms investigated separately) as the two isoforms are mostly but not completely brain vs gonad specific. The analysis of other tissues would also strengthen the characterization of this model.

      We agree that measuring aromatase protein levels in the brain of our fish would be valuable for confirming the loss of cyp19a1b function. However, as no suitable method is currently available, this issue will need to be addressed in future studies. While this constitutes indirect evidence, the observed reduction in brain E2 levels, with no change in peripheral E2 levels, in cyp19a1b-deficient fish strongly suggests the loss of cyp19a1b function, as noted in Response to reviewer #3’s comment 1 on weaknesses.

      (2) As presented, this study reads as niche work. A better description of the behavior and reproductive significance of the different aspects of the behavioral sequence would allow a better understanding of the results and would thus allow the non-specialist to appreciate the significance of the observations.

      Line 103: In response to this comment and Reviewer #3’s comment 2 on weaknesses, we have revised the sentence from “The mating behavior of medaka follows a stereotypical pattern, wherein a series of followings, courtship displays, and wrappings by the male leads to spawning” to “The mating behavior of medaka follows a stereotypical sequence. It begins with the male approaching and closely following the female (following). The male then performs a courtship display, rapidly swimming in a circular pattern in front of the female. If the female is receptive, the male grasps her with his fins (wrapping), culminating in the simultaneous release of eggs and sperm (spawning)” in order to provide a more detailed description of medaka mating behavior.

      (3) The data regarding female behavior are limited and incomplete. It is suggested to keep this for another manuscript unless data on the behavior of the female herself is added. Indeed, analyzing female's behavior from the male's perspective complicates the interpretation of the results while a description of what the females do would provide valuable and interpretable information.

      We thank the reviewer for this thoughtful suggestion and agree that the data and discussion for females are less extensive than for males. However, we have previously elucidated the mechanism by which estrogen/Esr2b signaling promotes female mating behavior (Nishiike et al., 2021). Accordingly, it follows that the new insights into female behavior gained from the cyp19a1b knockout model are more limited than those for males. Nevertheless, when combined with our prior findings, the female data in this study offer valuable insights, and the overall mechanism through which estrogens promote female mating behavior is becoming clearer. Therefore, we do not consider the female data in this study to be incomplete or merely supplementary.

      (4) In Figure 2, the validity to run multiple T-tests rather than a two-way ANOVA comparing TRT and genotype is questionable. Moreover, why are the absolute values in CTL higher than in the initial experiment comparing genotypes for ara in PPa, pPPp, and NVT as well as for arb in aPPp. More importantly, these graphs do not seem to reproduce the genotype effects for ara in pPPp and NVT and for arb in aPPp.

      The data in Figures 2J and 2K were analyzed with an exclusive focus on the difference between vehicle-treated and E2-treated males, without considering genotype differences. Therefore, the use of t-tests for significance testing is appropriate.

      As the reviewer noted, the overall ara expression area is larger in Figure 2J than in Figure 2F. However, as detailed in Response to reviewer #3’s comment 8 on weaknesses, the relative area ratios of ara expression among brain nuclei are consistent between the two figures, indicating the reproducibility of the results. Thus, we consider this difference unlikely to affect the conclusions of this study.

      Additionally, the differences in ara expression in pPPp and arb expression in aPPp between wild-type and cyp19a1b-deficient males appear smaller in Figures 2J and 2K compared to Figures 2F and 2H. This is likely due to the smaller sample size used in the experiments for Figures 2J and 2K, which makes the differences less distinct. However, since the same genotype-dependent trends are observed in both sets of figures, the conclusion that ara and arb expression is reduced in cyp19a1b-deficient male brains remains valid.

      (5) More information is required regarding the analysis of single ISH - How was the positive signal selected from the background in the single ISH analyses? How was this measure standardized across animals? How many sections were imaged per region? Do the values represent unilateral or bilateral analysis?

      Line 540: Following this comment, we have provided additional details on the single ISH method in the manuscript. Specifically, “, and the total area of signal in each brain nucleus was calculated using Olyvia software (Olympus)” has been revised to “The total area of signal across all relevant sections, including both hemispheres, was calculated for each brain nucleus using Olyvia software (Olympus). Images were converted to a 256-level intensity scale, and pixels with intensities from 161 to 256 were considered signals. All sections used for comparison were processed in the same batch, without corrections between samples.”
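
      As an illustration of the thresholding rule quoted above, a minimal Python sketch follows. It is not the Olyvia procedure itself: the function names, the `pixel_area` parameter, and the toy arrays are hypothetical, and it assumes intensity levels are numbered 1 to 256 as in the quoted text (on a 0-based 8-bit scale the same window would be 160 to 255).

```python
def signal_area(image, pixel_area=1.0, low=161, high=256):
    """Area of pixels whose 1-based intensity level lies in [low, high].

    `image` is a list of rows of integer levels (1..256, an assumption).
    """
    count = 0
    for row in image:
        for level in row:
            if low <= level <= high:
                count += 1
    return count * pixel_area


def total_signal_area(sections, pixel_area=1.0):
    """Sum the signal area over all sections of one brain nucleus."""
    return sum(signal_area(section, pixel_area) for section in sections)


# Toy data (hypothetical): two small "sections" of one nucleus.
section_a = [[10, 161, 200], [160, 256, 90]]   # 3 pixels in the window
section_b = [[255, 12], [161, 30]]             # 2 pixels in the window
total = total_signal_area([section_a, section_b], pixel_area=0.5)
print(total)
```

      Because the same fixed window is applied to every section, values are comparable only when sections are processed in the same batch, as the revised text states.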

      (6) More information should be provided in the methods regarding the image analysis of double ISH. In particular, what were the criteria to consider a cell as labeled are not clear. This is not clear either from the representative images.

      Line 596: To provide additional details on the double ISH method in the manuscript, we have added the following sentence: “Cells were identified as coexpressing the two genes when Alexa Fluor 555 and fluorescein signals were clearly observed in the cytoplasm surrounding DAPI-stained nuclei, with intensities markedly stronger than the background noise.”

      (7) There is no description of the in silico analyses run on ESR2a in the methods.

      The method for identifying estrogen-responsive element-like sequences in the esr2a locus is described in line 549: “Each nucleotide sequence of the 5′-flanking region of ara and arb was retrieved from the Ensembl medaka genome assembly and analyzed for potential canonical ERE-like sequences using Jaspar (version 5.0_alpha) and Match (public version 1.0) with default settings.”

      However, the method for domain identification in Esr2a was not described. Therefore, we have added the following text in line 469: “The DNA- and ligand-binding domains of medaka Esr2a were identified by sequence alignment with yellow perch (Perca flavescens) Esr2a, for which these domain locations have been reported (58).”

      The following reference (#58), cited in the newly added text above, has been included in the reference list: S. G. Lynn, W. J. Birge, B. S. Shepherd, Molecular characterization and sex-specific tissue expression of estrogen receptor α (esr1), estrogen receptor βa (esr2a) and ovarian aromatase (cyp19a1a) in yellow perch (Perca flavescens). Comp. Biochem. Physiol. B Biochem. Mol. Biol. 149, 126–147 (2008).

      (8) Information about the validation steps of the EIA that were carried out as well as the specificity of the antibody the steroids and the extraction efficacy should be provided.

      We have not directly validated the applicability of the EIA kit, but its extensive use in medaka suggests that it is unlikely to pose any issues (e.g., Ussery et al., 2018, Aquat Toxicol, 205:58–65; Lee et al., 2019, Ecotoxicol Environ Saf, 173:174–181; Kayo et al., 2020, Gen Comp Endocrinol, 285:113272; Fischer et al., 2021, Aquat Toxicol, 236:105873; Royan et al., 2023, Endocrinology, 164:bqad030).

      The specificity (cross-reactivity) of the antibodies is detailed as follows.

      (1) Estradiol ELISA kits: estradiol, 100%; estrone, 1.38%; estriol, 1.0%; 5α-dihydrotestosterone, 0.04%; androstenediol, 0.03%; testosterone, 0.03%; aldosterone, <0.01%; cortisol, <0.01%; progesterone, <0.01%.

      (2) Testosterone ELISA kits: testosterone, 100%; 5α-dihydrotestosterone, 27.4%; androstenedione, 3.7%; 11-ketotestosterone, 2.2%; androstenediol, 0.51%; progesterone, 0.14%; androsterone, 0.05%; estradiol, <0.01%.

      (3) 11-Keto Testosterone ELISA kits: 11-ketotestosterone, 100%; adrenosterone, 2.9%; testosterone, <0.01%.

      As this information is publicly available on the manufacturer’s website, we deemed it unnecessary to include it in the manuscript.

      Unfortunately, we have not evaluated the extraction efficacy of the samples, but we recognize this as a subject for future studies.

      (9) I wonder whether the evaluation of the impact of the mutation by comparing the behavior of a group of wild-type males to a group of mutated males is the most appropriate. Justifying this approach against testing the behavior of one mutated male facing one or several wild-type males would be appreciated.

      We agree that the resident-intruder test, in which a single focal resident is confronted with one or more stimulus intruders, is the most commonly used method for assessing aggression. However, medaka form shoals and lack strong territoriality, and even slight dominance differences between the resident and the intruder can increase variability in the results, compromising data consistency. Therefore, in this study, we adopted an alternative approach: placing four unfamiliar males together in a tank and quantifying aggressive interactions in total. This method allows for the assessment of aggression regardless of territorial tendencies, making it more appropriate for our investigation.

      (10) Lines 329-331: this sentence should be rephrased as it contributes to the confusion between sexual differentiation and activation of circuits. The restoration of sexual behavior by adult estrogen treatment pleads in favor of an activational role of neuro-estrogens on behavior rather than an organizational role. Therefore, referring to sexual differentiation is misleading, even more so that the study never compares sexes.

      As detailed in Response to reviewer #3’s comment 9 on weaknesses, we consider that all factors that cause sex differences, including the transient effects of adult steroids, need to be incorporated into a theory of sexual differentiation. In teleosts, since steroids during early development have little effect and sexual differentiation primarily relies on steroid action in adulthood, our discussion on brain sexual differentiation remains valid, including the statement in line 347: “This variation among species may represent the activation of neuroestrogen synthesis at life stages critical for sexual differentiation of behavior that are unique to each species.”

      (11) Lines 384-386: I may have missed something but I do not see data supporting the notion that neuroestrogens may function upstream of PGF2a signaling to mediate female receptivity.

      Line 403: We acknowledge that our explanation was insufficient and apologize for any confusion. To clarify this point, “Given that estrogen/Esr2b signaling feminizes the neural substrates that mediate mating behavior, while PGF2α signaling triggers female sexual receptivity,” has been added before the sentence “The present finding provides a likely explanation for this apparent contradiction, namely, that neuroestrogens, rather than or in addition to ovarian-derived circulating estrogens, may function upstream of PGF2α signaling to mediate female receptivity.”

      Additional alteration

      Reference list (line 682): a preprint article has now been published in a peer-reviewed journal, and the information has been updated accordingly as follows: “bioRxiv doi: 10.1101/2024.01.10.574747 (2024)” to “Proc. Natl. Acad. Sci. U.S.A. 121, e2316459121 (2024).”

    1. Author response:

      eLife Assessment

      Alignment and sequencing errors are a major concern in molecular evolution, and this valuable study represents a welcome improvement for genome-wide scans of positive selection. This new method seems to perform well and is generally convincing, although the evidence could be made more direct and more complete through additional simulations to determine the extent to which alignment errors are being properly captured.

      We thank the editors for their positive assessment and for highlighting the core strength and a key area for improvement. The main request (also echoed by both reviewers) is for us to conduct additional simulation studies where true alignment errors are known and assess the performance of BUSTED-E. We plan to conduct several simulations (on the order of 100,000 individual alignments in total) in response to that request, with the caveat that we are not aware of any tools that simulate realistic alignment errors, so these simulations are likely only a pale reflection of biological reality.

      (1) Ad hoc small local edits of alignments similar to what was implemented in the HMMCleaner paper. These local edits would include operations like replacement of codons or small stretches of sequences with random data, local transposition, inversion.

      (a) Using parametrically simulated alignments (under BUSTED models).

      (b) Using empirical alignments.

      (2) Simulations under model misspecification, specifically to address the point of reviewer 2. For example, we would simulate under models that allow for multi-nucleotide substitutions, and then apply error filtering under models which do not.
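
      The local edits listed under (1) could be sketched roughly as follows. This is only an illustrative Python outline, not the planned simulation code and not the HMMCleaner implementation; all function names and the toy sequence are hypothetical, and edits are applied to whole codons of a single ungapped sequence for simplicity.

```python
import random

NUCLEOTIDES = "ACGT"


def to_codons(seq):
    """Split an in-frame nucleotide sequence into a list of codons."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def randomize_codons(seq, start, length, rng):
    """Replace `length` codons from codon index `start` with random data."""
    codons = to_codons(seq)
    for i in range(start, min(start + length, len(codons))):
        codons[i] = "".join(rng.choice(NUCLEOTIDES) for _ in range(3))
    return "".join(codons)


def invert_codons(seq, start, length):
    """Reverse the order of `length` codons starting at codon `start`."""
    codons = to_codons(seq)
    codons[start:start + length] = codons[start:start + length][::-1]
    return "".join(codons)


def transpose_codons(seq, start, length):
    """Swap two adjacent blocks of `length` codons starting at `start`."""
    codons = to_codons(seq)
    first = codons[start:start + length]
    second = codons[start + length:start + 2 * length]
    codons[start:start + 2 * length] = second + first
    return "".join(codons)


seq = "ATGGCCAAATTTGGGCCC"  # toy sequence: ATG GCC AAA TTT GGG CCC
print(invert_codons(seq, 1, 3))
print(transpose_codons(seq, 1, 2))
print(randomize_codons(seq, 2, 2, random.Random(0)))
```

      Note that random replacement can introduce stop codons, which a real simulation would need to handle (for example by redrawing); that step is omitted here.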

      We will also run several new large-scale screens of existing alignments, to directly and indirectly address the reviewers’ comments. These will include:

      (a) A Drosophila dataset (from https://academic.oup.com/mbe/article/42/4/msaf068/8092905)

      (b) Current Selectome data (https://selectome.org/), both filtered and unfiltered. Here the filtering procedure refers to what Selectome does to obtain what its authors think are high quality alignments.

      (c) Current OrthoMam data (https://orthomam.mbb.cnrs.fr/), both filtered and unfiltered. Here the filtering procedure refers to what OrthoMam does to obtain what its authors think are high quality alignments.

      Reviewer #1:

      We are grateful to Reviewer #1 for their positive and encouraging review. We are pleased they found our analyses convincing and recognized BUSTED-E as a "simple, efficient, and computationally fast" improvement for evolutionary scans.

      Strengths:

      As a side note, I found it particularly interesting how the authors tested the statistical support for the new method compared to the simpler version without the error class. In many cases, the simpler model could not be statistically rejected in favor of the more complex model, despite producing biologically incorrect results in terms of parameter inference. This highlights a broader issue in molecular evolution and phylogenomics, where model selection often relies too heavily on statistical tests, potentially at the expense of biological realism.

      We agree that this observation touches upon a critical issue in phylogenomics. A statistically "good" fit does not always equate to a biologically accurate model. We believe our work serves as a useful case study in this regard. We will add discussion of the importance of considering biological realism alongside statistical adequacy in model selection.

      Weaknesses:

      Regarding the structure of the manuscript, the text could be clearer and more precise.

      We appreciate this feedback. We will perform a thorough revision of the entire manuscript to improve its clarity, flow, and precision. We will focus on streamlining the language and ensuring that our methodological descriptions and results are as unambiguous as possible.

      Clear, practical recommendations for users could also be provided in the Results section.

      To make our method more accessible and its application more straightforward, we will add a new section that provides clear, practical recommendations for users. This includes guidance on when to apply BUSTED-E, how to interpret its output, and best practices for distinguishing potential errors from strong selection.

      Additionally, the simulation analyses could be further developed to include scenarios with both alignment errors and positive selection, in order to better assess the method's performance.

      Additional simulations will be conducted (see above).

      Finally, the model is evaluated only in the context of site models, whereas the widely used branch-site model is mentioned as possible but not assessed.

      BUSTED class models support branch-site variation in dN/dS, so technically all of our analyses are already branch-site. However, we interpret the reviewer’s comment as describing use cases when a method is used to test for selection on a subset of tree branches (as opposed to the entire tree). BUSTED-E already supports this ability, and we will add a section in the manuscript describing how this type of testing can be done, including examples. However, we do not plan to conduct additional extensive data analyses or simulations, as this would probably bloat the manuscript too much.

      Reviewer #2:

      We thank Reviewer #2 for their detailed and thought-provoking comments, and for their enthusiasm for modeling alignment issues directly within the codon modeling framework. The criticisms raised are challenging and we will work on improving the justification, testing, and contextualization of our method.

      Weaknesses:

      The definition of alignment error by a very large ω is not justified anywhere in the paper... I would suggest characterising a more specific error model. E.g., radical amino-acid "changes" clustered close together in the sequence, proximity to gaps in the alignment, correlation of apparent ω with genome quality... Also concerning this high ω, how sensitive is its detection to computational convergence issues?

      This is a fundamental point that we are grateful to have the opportunity to clarify. Our intention with the high ω category is not to provide a mechanistic or biological definition of an alignment error. Rather, its purpose is to serve as a statistical "sink" for codons exhibiting patterns of divergence so extreme that they are unlikely to have resulted from a typical selective process. It is phenomenological and ad hoc. The reviewer makes sensible suggestions for other ad hoc/empirical approaches to alignment quality filtering, but most of those have already been implemented in existing (excellent) alignment filtering tools. BUSTED-E was never meant to replace them, but rather to catch what is left over. Importantly, error detection is not even the primary goal of BUSTED-E; errors are treated as a statistical nuisance. With all due respect, all of the reviewer’s suggestions are similarly ad hoc; there is no rigorous quantitative justification for any of them, but they are all sensible and plausible, and usually work in practice.

      Computational convergence issues can never be fully dismissed, but we do not consider this to be a major issue. Our approach already pays careful attention to proper initialization, performs convergence checks, and considers multiple starting points. We also don’t need to estimate the large ω with any degree of precision; it just needs to be “large”.

      The authors should clarify the relation between the "primary filter for gross or large-scale errors" and the "secondary filter" (this method). Which sources of error are expected to be captured by the two scales of filters?

      We will add discussion and examples to explicitly define the distinct and complementary roles of these filtering stages.

      The benchmarking of the method could be improved both for real and simulated data... I suggest comparing results with e.g. Drosophila genomes... For simulations, the authors should present simulations with or without alignment errors... and with or without positive selection... I also recommend simulating under more complex models, such as multinucleotide mutations or strong GC bias...

      We will add more simulations as suggested (see above). We will also analyze a Drosophila gene alignment from previously published papers.

      It would be interesting to compare to results from the widely used filtering tool GUIDANCE, as well as to the Selectome database pipeline... Moreover, the inconsistency between BUSTED-E and HMMCleaner, and BMGE is worrying and should be better explained.

      Some of the alignments we have analyzed had already been filtered by GUIDANCE. We’ll also run the Selectome data through BUSTED-E: both filtered and unfiltered. We consider it beyond the scope of this manuscript to conduct detailed filtering pipeline instrumentation and side-by-side comparison.

      For a new method such as this, I would like to see p-value distributions and q-q plots, to verify how unbiased the method is, and how well the chi-2 distribution captures the statistical value.

      We will report these values for new null simulations.
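
      For readers unfamiliar with this diagnostic, a minimal Python sketch of how null p-values can be turned into q-q points is given below. It is a generic illustration, not BUSTED-E code; the function names and the toy p-values are hypothetical. Under a well-calibrated test, p-values from null simulations should be approximately uniform on [0, 1], so the sorted p-values should track the expected uniform quantiles.

```python
def qq_points(p_values):
    """Pair each sorted p-value with its expected uniform quantile.

    Uses the midpoint plotting positions (i + 0.5) / n for i = 0..n-1.
    """
    observed = sorted(p_values)
    n = len(observed)
    expected = [(i + 0.5) / n for i in range(n)]
    return list(zip(expected, observed))


def rejection_rate(p_values, alpha=0.05):
    """Fraction of p-values at or below alpha (false positive rate on null data)."""
    return sum(p <= alpha for p in p_values) / len(p_values)


# Toy p-values (hypothetical), standing in for results of null simulations.
null_p = [0.9, 0.1, 0.5, 0.3, 0.7]
points = qq_points(null_p)
rate = rejection_rate(null_p, alpha=0.1)
for expected, observed in points:
    print(f"{expected:.2f}\t{observed:.2f}")
```

      Points falling below the diagonal (observed smaller than expected) would indicate an anti-conservative test; points above it would indicate a conservative one.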

      I disagree with the motivation expressed at the beginning of the Discussion... Our goal should not be to find a few impressive results, but to measure accurately natural selection, whether it is frequent or rare.

      That’s a philosophical point; at some level, given enough time, every single gene likely experiences some positive selection at some point in the evolutionary past. The practically important question is how to improve the sensitivity of the methods while controlling for ubiquitous noise. We do agree with the sentiment that the ultimate goal is to “measure accurately natural selection, whether it is frequent or rare”. However, we also must be pragmatic about what is possible with dN/dS methods on available genomic data.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank all reviewers for the highly detailed review and the time and effort which has been invested in this review. It is clear from the reviews that we’ve had the privilege to have our work extensively and thoroughly checked by knowledgeable experts, for which we are very grateful. We have read their perspectives, questions and suggested improvements with great interest. We have reflected on the public review in detail and have included detailed responses below. First, we would like to respond to four main issues pointed out by the editor and reviewers:

      (1) Lack of yield data in the manuscript: Yield data has been collected in most of the sites and years of our study, and these have already been published and cited in our manuscript. In the appendix of our manuscript, we included a table with yield data for the sites and years in which the beetle diversity was studied. These data show that strip cropping does not cause a systematic yield reduction.

      (2) Sampling design clarification: Our paper combines data from trials conducted at different locations and years. On the one hand this allows an analysis of a comprehensive dataset, but on the other hand in some cases this resulted in variations in how data were collected or processed (e.g. taxonomic level of species identification). We have added more details to the sections on sampling design and data analysis to increase clarity and transparency.

      (3) Additional data analysis: In the revised manuscript we present an analysis on the responses of abundances of the 12 most common ground beetle genera to strip cropping. This gives better insight in the variation of responses among ground beetle taxa.

      (4) Restrict findings to our system: We nuanced our findings further and focused more on the implications of our data on ground beetle communities, rather than on agrobiodiversity in a broader sense.

      Below we also respond to the editor and reviewers in more detail.

      Reviewing Editor Comments:

      (1) You only have analyzed ground beetle diversity, it would be important to add data on crop yields, which certainly must be available (note that in normal intercropping these would likely be enhanced as well).

      Most yield data have been published in three previous papers, which we already cited or cite now (one was not yet published at the time of submission). Our argumentation is based on these studies. We had also already included a table in the appendix that showed the yield data that relates specifically to our locations and years of measurement. The finding that strip cropping does not majorly affect yield is based on these findings. We revised the title of our manuscript to remove the explicit focus on yield.

      (2) Considering the heterogeneous data involving different experiments it is particularly important to describe the sampling design in detail and explain how various hierarchical levels were accounted for in the analysis.

      We agree that some important details to our analysis were not described in sufficient detail. Especially reviewer 2 pointed out several relevant points that we did account for in our analyses, but which were not clear from the text in the methods section. We are convinced that our data analyses are robust and that our conclusions are supported by the data. We revised the methods section to make our approach clearer and more transparent.

      (3) In addition to relative changes in richness and density of ground beetles you should also present the data from which these have been derived. Furthermore, you could also analyze and interpret the response of the different individual taxa to strip cropping.

      With our heterogeneous dataset it was quite complicated to show overall patterns of absolute changes in ground beetle abundance and richness, especially for the field-level analyses. As the sampling design was not always the same and occasionally samples were missing, the number of year series that made up a datapoint differed among locations and years. However, we made sure that for the comparison of a paired monoculture and strip cropping field, the number of year series was made equal through rarefaction. That is, the numbers of ground beetle individuals and species are always expressed per 2 to 6 samples. Therefore, we prefer to stick to relative changes as we are convinced that this gives a fairer representation of our complex dataset.
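
      The idea of equalizing sampling effort before comparing paired fields can be sketched as follows. This is a simplified Python illustration of sample-based rarefaction under our own assumptions, not the exact procedure used in the manuscript; the function name, the number of random draws, and the toy genus lists are hypothetical.

```python
import random


def rarefied_richness(samples, k, n_draws=200, rng=None):
    """Mean number of taxa found in k samples drawn without replacement.

    `samples` is a list of sets, one set of taxon names per sample.
    """
    rng = rng or random.Random(0)
    total = 0
    for _ in range(n_draws):
        drawn = rng.sample(samples, k)
        total += len(set().union(*drawn))
    return total / n_draws


# Toy paired fields (hypothetical data) with unequal numbers of samples.
monoculture = [{"Pterostichus", "Harpalus"}, {"Pterostichus"}]
strip = [{"Pterostichus", "Amara"}, {"Harpalus"}, {"Bembidion", "Amara"}]

k = min(len(monoculture), len(strip))        # equalize effort at 2 samples
mono_richness = rarefied_richness(monoculture, k)
strip_richness = rarefied_richness(strip, k)
relative_change = (strip_richness - mono_richness) / mono_richness
print(mono_richness, strip_richness, relative_change)
```

      Expressing the result as a relative change within each pair keeps pairs rarefied to different sample counts comparable, which matches the reasoning given above for reporting relative rather than absolute values.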

      We agree with the second point that both the editor and several reviewers pointed out. The indicator species analyses that we used were biased by rare species, and we now omit this analysis. Instead, we included a GLM analysis on the responses of abundances of the 12 most common ground beetle genera to strip cropping. We chose genera here (and not species) as we could then include all locations and years within the analyses, and in most cases a genus was dominated by a single species (but notable exceptions were Amara and Harpalus, which were often made up of several species). We still illustrate these analyses in a similar fashion as we did for the indicator species analysis.

      (4) Keep to your findings and don't overstate them but try to better connect them to basic ecological hypotheses potentially explaining them.

      After careful consideration of the important points that the reviewers raised, we decided to nuance our reasoning about biodiversity conservation along two key lines: (1) the extent to which ground beetles can be indicators of wider biodiversity changes; and (2) our findings, which are not as straightforwardly positive as our narrative suggested. We still believe that strip cropping contributes positively to carabid communities, and have carefully checked the text to avoid overstatements.

      Reviewer #1 (Public review):

      Summary:

      This study demonstrates that strip cropping enhances the taxonomic diversity of ground beetles across organically-managed crop systems in the Netherlands. In particular, strip cropping supported 15% more ground beetle species and 30% more individuals compared to monocultures.

      Strengths:

      A well-written study with well-analyzed data of a complex design. The data could have been analyzed differently e.g. by not pooling samples, but there are pros and cons for each type of analysis and I am convinced this will not affect the main findings. A strong point is that data were collected for 4 years. This is especially strong as most data on biodiversity in cropping systems are only collected for one or two seasons. Another strong point is that several crops were included.

      We thank reviewer 1 for their kind words and agree with this strength of the paper. The paper combines data from trials conducted at different locations and years. On the one hand this allows an analysis of a comprehensive dataset, but on the other hand in some cases there were slight variations in how data were collected or processed (e.g. taxonomic level of species identification).

      Weaknesses:

      This study focused on the biodiversity of ground beetles and did not examine crop productivity. Therefore, I disagree with the claim that this study demonstrates biodiversity enhancement without compromising yield. The authors should present results on yield or, at the very least, provide a stronger justification for this statement.

      We acknowledge that we indeed did not formally analyze yield in our study, but we have good reason for this. The claim that strip cropping does not compromise yield comes from several extensive studies (Juventia & van Apeldoorn, 2024; Ditzler et al., 2023; Carillo-Reche et al., 2023) that were conducted in nearly all the sites and years that we included in our study. We chose not to include formal analyses of productivity for two key reasons: (1) a yield analysis would duplicate already published analyses, and (2) we prefer to focus more on the ecology of ground beetles and the effect of strip cropping on biodiversity, rather than diverging our focus also towards crop productivity. Nevertheless, we have shown the results on yield in Table S6 and refer extensively to the studies that have previously analyzed this data (line 203-207, 217-221).

      Reviewer #1 (Recommendations for the authors):

      This is a well-written study on the effects of strip cropping on ground-beetle diversity. As stated above the study is well analyzed, presented, and written but you should not pretend that you analyzed yield, e.g. lines 25-27 "We show that strip cropping...enhance ground beetle biodiversity without incurring major yield loss."

      We understand the confusion caused by this sentence, and it was never our intention to give the impression that we analyzed yield losses. These findings were based on previous research by ourselves and colleagues, and we have now changed the sentence to reflect this (line 25-27).

      I think you assume that yield does not differ between strip cropping and monoculture. I am not sure this is correct as one crop might attract pests or predators spilling over to the other crop. I am also not sure if the sowing and harvest of the crop will come with the same costs. So if you assume this, you should only do it in the main manuscript and not the abstract, to justify this better.

      With three peer-reviewed papers on the same fields as we studied, we can convincingly state that strip cropping in organic agriculture generally does not result in major yield loss, although exceptions exist, which we refer to in the discussion.

      In the introduction lines 28-43, you refer to insect biomass decline. I wonder if you would like to add the study of Loboda et al. 2017 in Ecography. It may not seem fitting as it is from the Arctic, but the other studies you cite are also not only from agricultural landscapes, and this study is from the same time as the Hallmann et al. 2017 study and shows a decline in flies of 80%.

      We have removed the sentence that this comment refers to, to streamline the introduction more.

      Lines 50-51. You only have one citation for biodiversity strategies in agricultural systems. I suggest citing Mupepele et al. 2021 in TREE. This study refers to management but also the policies and societal pressures behind it.

      We have added this citation and a recent paper by Cozim-Melges et al. (2024) here (line 49-52).

      In the methods, I am missing a section on species identifications. This would help to understand why you used "taxonomic richness".

      Thanks for pointing this out. We have now included a new section on ground beetle identification (line 304-309 in methods).

      Figure 1 is great and I like that you separated the field and crop-level data, although there is no statistical power for the crop-specific data. I personally would move k to the supplements. It is very detailed and small and therefore hard to read

      We chose to keep figure 1k, as in our view it gives a good impression of the scale of the experiment, the number of crops included and the absolute numbers of caught species.

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to investigate the effects of organic strip cropping on carabid richness and density as well as on crop yields. They find on average higher carabid richness and density in strip cropping and organic farming, but not in all cases.

      We did not intend to investigate the effect of strip cropping on crop yields, but rather place our work in the framework of earlier studies that already studied yield. All the monocultures and strip cropping fields were organic farms. Our findings thus compare crop diversity effects within the context of organic farming.

      Strengths:

      Based on highly resolved species-level carabid data, the authors present estimates for many different crop types, some of them rarely studied, at the same time. The authors did a great job investigating different aspects of the assemblages (although some questions remain concerning the analyses) and they present their results in a visually pleasing and intuitive way.

      We appreciate the kind words of reviewer 2 and their acknowledgement of the extensiveness of our dataset. In our opinion, the inclusion of many different crops is indeed a strength, rarely seen in similar studies; and we are happy that the figures are appreciated.

      Weaknesses:

      The authors used data from four different strip cropping experiments and there is no real replication in space as all of these differed in many aspects (different crops, different areas between years, different combinations, design of the strip cropping (orientation and width), sampling effort and sample sizes of beetles (differing more than 35 fold between sites; L 100f); for more differences see L 237ff). The reader gets the impression that the authors stitched data from various places together that were not made to fit together. This may not be a problem per se but it surely limits the strength of the data as results for various crops may only be based on small samples from one or two sites (it is generally unclear how many samples were used for each crop/crop combination).

      The paper indeed combines data from trials conducted at different locations and years. On the one hand this allows an analysis of a comprehensive dataset, but on the other hand in some cases there were slight differences in the experimental design. At the time that we did our research, there were only a handful of farmers that were employing strip cropping within the Netherlands, which greatly reduced the number of fields for our study. Therefore, we worked in the sites that were available and studied as many crops as possible on these sites. Since there was variation in the crops grown in the sites, for some crops we have limited replication. In the revision we have explained this more clearly (line 297-300).

      One of my major concerns is that it is completely unclear where carabids were collected. As some strips were 3m wide, some others were 6m and the monoculture plots large, it can be expected that carabids were collected at different distances from the plot edge. This alone, however, was conclusively shown to affect carabid assemblages dramatically and could easily outweigh the differences shown here if not accounted for in the models (see e.g. Boetzl et al. (2024) or Knapp et al. (2019) among many other studies on within field-distributions of carabids).

      Point well taken. Samples were always taken at least 10 meters into the field, and always in the middle of the strip. This would indeed mean that there is a small difference between the 3 m and 6 m wide strips regarding distance from another strip, but this was then only a difference of 1.5 to 3 meters from the edge. Based on our own extensive experience with ground beetle communities, such a difference will not have a large impact on the ground beetles found. The distance from field/plot edges was similar between monocultures and strip cropped fields. We present a more detailed description of the sampling design in the methods of the revised manuscript (line 294-297).

      The authors hint at a related but somewhat different problem in L 137ff - carabid assemblages sampled in strips were sampled in closer proximity to each other than assemblages in monoculture fields which is very likely a problem. The authors did not check whether their results are spatially autocorrelated and this shortcoming is hard to account for as it would have required a much bigger, spatially replicated design in which distances are maintained from the beginning. This limitation needs to be stated more clearly in the manuscript.

      To be clear, this limitation relates to the comparison that we did for the community compositions of ground beetles in two crops either in strip cropping or monocultures. In this case, it was impossible to avoid potential autocorrelation due to our field design. We also acknowledge this limitation in the results section (line 130-133). However, for our other analyses we corrected for spatial autocorrelation by including variables per location, year and crop. This grouped samples that were spatially autocorrelated. Therefore, we do not see this as a shortcoming of our other analyses.

      Similarly, we know that carabid richness and density depend strongly on crop type (see e.g. Toivonen et al. (2022)) which could have biased results if the design is not balanced (this information is missing but it seems to be the case, see e.g. Celeriac in Almere in 2022).

      We agree and acknowledge that crop type can influence carabid richness and density, which is why we have included variables to account for differences caused by crops. However, we did not observe consistent differences between crops in how strip cropping affected ground beetle richness and density. Therefore, we don’t think that crop types would have influenced our conclusions on the overall effect of strip cropping.

      A more basic problem is that the reader neither learns where traps were located, how missing traps were treated for analyses, how many samples there were per crop or crop combination (in a simple way, not through Table S7 - there has to have been a logic in each of these field trials) or why there are differences in the number of samples from the same location and year (see Table S7). This information needs to be added to the methods section.

      Point well taken. We have clarified this further in the revised manuscript (line 294-301, 318-322). As we combined data from several experimental designs that originally had slightly different research questions, this in part caused differences between numbers of rounds or samples per crop, location or year.

      As carabid assemblages undergo rapid phenological changes across the year, assemblages that are collected at different phenological points within and across years cannot easily be compared. The authors would need to standardize for this and make sure that the assemblages they analyze are comparable prior to analyses. Otherwise, I see the possibility that the reported differences might simply be biased by phenology.

      We agree and we dealt with this issue by using year series instead of using individual samples of different rounds. This approach allowed us to get a good impression of the entire ground beetle community across seasons. For our analyses we had the choice to only include data from sampling rounds that were conducted at the same time, or to include all available data. We chose to analyze all data, and made sure that the number of samples between strip cropping and monoculture fields per location, year and crop was always the same by pooling and rarefaction.

      Surrounding landscape structure is known to affect carabid richness and density and could thus also bias observed differences between treatments at the same locations (lower overall richness => lower differences between treatments). Landscape structure has not been taken into account in any way.

      We did not include landscape structure as there are only 4 sites, which does not allow a meaningful analysis of potential effects of landscape structure. Studying how landscape interacts with strip cropping to influence insect biodiversity would require at least, say, 15 to 20 sites, which was not feasible for this study. However, such an analysis may be possible in an ongoing project (CropMix) which includes many farms that work with strip cropping.

      In the statistical analyses, it is unclear whether the authors used estimated marginal means (as they should) - this needs to be clarified.

      In the revised manuscript we further clarified this point (line 365-366, 373-374).

      In addition, and as mentioned by Dr. Rasmann in the previous round (comment 1), the manuscript, in its current form, still suffers from simplified generalizations that 'oversell' the impact of the study and should be avoided. The authors restricted their analyses to ground beetles and based their conclusions on a design with many 'heterogeneities' - they should not draw conclusions for farmland biodiversity but stick to their system and report what they found. Although I understand the authors have previously stated that this is 'not practically feasible', the reason for this comment is simply to say that the authors should not oversell their findings.

      In the revised manuscript, we nuanced our findings by explaining that strip cropping is a potentially useful tool to support ground beetle biodiversity in agricultural fields (line 33-35).

      Reviewer #2 (Recommendations for the authors):

      In addition to the points stated under 'Weaknesses' above, I provide smaller comments and recommendations:

      Overall comments:

      (i) The carabid images used in the figures were created by Ortwin Bleich and are copyrighted. I could not find him accredited in the acknowledgements; the figure legends simply state that the images were taken from his webpage. Was his permission obtained? This should be stated.

      We have received written permission from Ortwin Bleich for using his pictures in our figures, and have accredited him for this in the acknowledgements (line 455-456).

      (ii) There is a great confusion in the field concerning terminology. The authors here use intercropping and strip cropping, a specific form of intercropping, interchangeably. I advise the authors to stick to strip cropping as it is more precise and avoids confusion with other forms of intercropping.

      We agree with the definitions given by reviewer 2 and had already used them as such in the text. We defined strip cropping in the first paragraph of the introduction and do not use the term “intercropping” after this definition to avoid confusion.

      Comments to specific lines:

      Line 19: While this is likely true, there is so far not enough compelling evidence for such a strong statement blaming agriculture. Please rephrase.

      Changed the sentence to indicate more clearly that it is one of the major drivers, but that the “blame” is not solely on agriculture (line 18-19).

      Line 22: Is this the case? I am aware of strip cropping being used in other countries, many of them in Europe. Why the focus on 'Dutch'?

      Indeed, strip cropping is now being pioneered by farmers throughout Europe. However, in the Netherlands some farmers have been pioneering strip cropping since 2014. We have added this information to indicate that our setting is in the Netherlands and because, in our opinion, it gives a bit more context to our manuscript.

      Line 24: I would argue that carabids are actually not good indicators for overall biodiversity in crop fields as they respond in a very specific way, contrasting with other taxa. It is commonly observed that carabids prefer more disturbed habitats and richness often increases with management intensity and in more agriculturally dominated landscapes - in stark contrast to other taxa like wild bees or butterflies.

      We have reworded this sentence to reflect that they are not necessarily indicators of wide agricultural biodiversity, but that they do hold keystone positions within food webs in agricultural systems (line 23-25).

      Line 31: This statement here is also too strong - carabids are not overall biodiversity and patterns found for carabids likely differ strongly from patterns that would be observed in other taxa. This study is on carabids and the conclusion should thus also refer to these in order to avoid such over-simplified generalizations.

      We agree and have nuanced this sentence to indicate that our findings are only on ground beetles (line 33-35). However, we would like to point out that the statement that “patterns found for carabids likely differ strongly from patterns that would be observed in other taxa” assumes a disassociation between carabids and other taxa.

      Line 41: I am sure the authors are aware of the various methodological shortcomings of the dataset used in Hallmann et al. (2017) which likely led to an overestimation of the actual decline. Analysing the same data, Müller et al. (2023) found that weather can explain fluctuations in biomass just as well as time. I thus advise not putting too much focus on these results here as they seem questionable.

      We have removed this sentence to streamline the introduction, thus no longer mentioning the percentages given by Hallmann et al. (2017).

      Line 46: Surely likely but to my knowledge this is actually remarkably hard to prove. Instead of using the IPBES report here that simply states this as a fact, it would be better to see some actual evidence referenced.

      We removed IPBES as a source and changed this for Dirzo et al. (2014), a review that shows the consequences of biodiversity decline on a range of different ecosystem services and ecological functions (line 45-47).

      Line 52ff: I am not sure whether this old land-sparing vs. land-sharing debate is necessary here. The authors could simply skip it and directly refer to the need of agricultural areas, the dominating land-use in many regions, to become more biodiversity-friendly. It can be linked directly to Line 61 in my opinion which would result in a more concise and arguably stronger introduction.

      After reconsidering, we agree with reviewer 2 that this section was redundant and we have removed the lines on land-sparing vs land-sharing.

      Line 59: Just a note here: this argument is not meaningful when talking about strip cropping in the Netherlands as there is virtually no land left that could be converted (if anything, agricultural land is lost to construction). The debate on land-use change towards agriculture is nowadays mostly focused on the tropics and the Global South.

      We argue that strip cropping could play an important role as a measure that does not necessarily follow the trade-off between biodiversity and agriculture for a context beyond the Netherlands (line 52-58).

      Line 69: Does this statement really need 8 references?

      Line 71: ... and this one 5 additional ones?

      We have removed excess references in these two lines (line 62-66).

      Line 74: But also likely provides the necessary crop continuity for many crop pests - the authors should keep in mind that when practitioners read agricultural biodiversity, they predominantly think of weeds and insect pests.

      We agree with reviewer 2 that agricultural biodiversity is still a controversial topic. However, as the focus in this manuscript is more on biodiversity conservation, rather than pest management, we prefer to keep this sentence as is. In other published papers and future work we focus more on the role of strip cropping for pest management.

      Line 83: Consider replacing 'moments' maybe - phenological stages or development stages?

      Although we understand the point of reviewer 2, we prefer to keep it at moments, as we did not focus on phenological stages and we only wanted to say that we set pitfall traps at several moments throughout the year. However, by placing the pitfall traps at several moments throughout the year, we did capture several phenological stages.

      Line 86: Not only farming practices - there are also massive fluctuations between years in the same crop with the same management due to effects of the weather in the previous reproductive season. Interpreting carabid assemblage changes is therefore not straightforward.

      We absolutely agree that interpreting carabid assemblage is not straightforward, but as we did not study year or crop legacy effects we chose to keep this sentence to maintain focus on our research goals.

      Line 88: 'ecolocal'?

      Typo, should have been ecological. Changed (line 81).

      Line 90: 'As such, they are often used as indicator group for wider insect diversity in agroecosystems' - this is the third repetition of this statement and the second one in this paragraph - please remove. Having worked on carabids extensively myself, I also think that this is not the true reason - they are simply easy to collect passively.

      We agree with the reviewer and have removed this sentence.

      Line 141: I have doubts about the value of the ISA looking at the results. Anchomenus dorsalis is a species extremely common in cereal monoculture fields in large parts of Europe, especially in warmer and drier conditions (H. griseus was likely only returned as it is generally rare and likely only occurred in few plots that, by chance, were strip-cropped). It can hardly be considered an indicator for diverse cropping systems but it was returned as one here (which I do not doubt). This often happens with ISA in my experience as they are very sensitive to the specific context of the data they are run on. The returned species are, however, often not really useable as indicators in other contexts. I thus believe they actually have very limited value. Apart from this, we see here that both monocultures and strip cropping have their indicators, as would likely all crop types. I wonder what message we would draw from this ...

      On close reconsideration, we agree with the reviewer that the ISAs might have been too sensitive to rare species that by chance occur in one of two crop configurations. To still get an idea on what happens with specific ground beetle groups, we chose to replace the ISAs with analyses on the 12 most common ground beetle genera. For this purpose we have added new sections to the methods (line 368-374) and results (line 135-143), replaced figure 2 and table S5, and updated the discussion (line 182-200).

      Line 165: Carabid activity is high when carabids are more active. Carabids can be more active either when (i) there are simply more carabid individuals or /and (ii) when they are starved and need to search more for prey. More carabid activity does thus not necessarily indicate more individuals, it can indicate that there is less prey. This aspect is missing here and should be discussed. It is also not true that crop diversification always increases prey biomass - especially strip cropping has previously been shown to decrease pest densities (Alarcón-Segura et al., 2022). Of course, this is a chicken-egg problem (less pests => less carabids or more carabids => less pests ?) ... this should at least be discussed.

      We have rewritten this paragraph to further discuss activity density in relation to food availability (line 175-185).

      Line 178: These species are not exclusively granivorous - this speculation may be too strong here.

      Line 185: true for all but C. melanocephalus - this species is usually more associated with hedgerows, forests etc.

      After removing the ISAs, we also chose to remove this paragraph and replace it with a paragraph that is linked to the analyses on the 12 most common genera (line 182-200).

      Line 202: These statements are too strong for my taste - the authors should add an 'on average' here. The data show that they likely do not always enhance richness by 15 % and as the authors state, some monocultures still had higher richness and densities.

      “on average” added (line 211)

      Line 203: 'can lead' - the authors cannot tell based on their results if this is always true for all taxa.

      Changed to “can lead” (line 213)

      Line 205: What is 'diversification' here?

      This concerns measures like hedgerows or flower strips. We altered the sentence to make this clearer (line 215-216).

      Line 208: Does this statement need 5 references? (as in the introduction, the reader gets the impression the authors aimed to increase the citation count of other articles here).

      We have removed excess references (line 219-221).

      Line 222: How many are 'a few'? Maybe state a proportion.

      We only found two species and have changed the sentence accordingly (line 232-233).

      Line 224: As stated above, I would not overstress the results of the ISAs - the authors stated themselves that the result for A. dorsalis is likely only based on one site ...

      We removed this sentence after removing the ISAs.

      Line 305: I think there is an additional nested random level missing - the transect or individual plot the traps were located in (or was there only one replicate for each crop/strip in each experiment)? Hard to tell as the authors provide no information on the actual sample sizes.

      Indeed, there was one field or plot per cropping system per crop per location per year from which all the samples were taken. Therefore the analysis does not miss a nested random level. We provided information on sample sizes in Table S7.

      Line 314ff: The authors describe that they basically followed a (slightly extended) Chao-Hill approach (species richness, Shannon entropy & inverse Simpson) without the sampling effort / sample completeness standardization implemented in this approach and as a reader I wonder why they did not simply just use the customary Chao-Hill approach.

      We were not aware of the Chao-Hill approach, and we see it as a compliment that we independently came up with an approach similar to a now accepted approach.
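
      For readers unfamiliar with the measures mentioned in this exchange, the three quantities the reviewer lists correspond to Hill numbers of order 0, 1 and 2, where order 1 is the exponential of Shannon entropy. The following is a minimal sketch of that correspondence only. It is not the code used in the study, it omits the sample-completeness standardization the reviewer refers to, and the function name and the example counts are hypothetical.

```python
import math

def hill_numbers(counts):
    """Return Hill numbers of order 0, 1 and 2 for one sample of species counts."""
    counts = [c for c in counts if c > 0]          # ignore species that were not caught
    total = sum(counts)
    props = [c / total for c in counts]            # relative abundances
    richness = len(props)                          # order 0: species richness
    shannon = -sum(p * math.log(p) for p in props) # Shannon entropy
    hill_1 = math.exp(shannon)                     # order 1: exponential of Shannon entropy
    hill_2 = 1.0 / sum(p * p for p in props)       # order 2: inverse Simpson
    return richness, hill_1, hill_2

# Hypothetical catch: one dominant species and three scarce ones.
example_counts = [90, 5, 3, 2]
q0, q1, q2 = hill_numbers(example_counts)
```

      Higher orders put more weight on abundant species, so in a catch dominated by one species the three values fall from order 0 to order 2, while in a perfectly even catch all three equal the number of species.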

      Line 329: Unclear what was nested in what here - location / year / crop or year / location / crop ?

      For the crop-level analyses, the nested structure was location > year > crop. This nested structure was chosen as every location was sampled across different years and (for some locations) the crops differed among years. However, as we pooled the samples from the same field in the field-level analyses, using the same random structure would have resulted in each individual sampling unit being distinguished as a group. Therefore, the random structure here was only location > year. We explain this now more clearly in lines 329 and 355-357.

      Line 334: I can see why the authors used these distributions but it is presented here without any justification. As a side note: Gamma (with log link) would likely be better for the Shannon model as well (I guess it cannot be 0 or negative ...).

      We explain this now better in lines 360-364.

      Line 341: Why Hellinger and not simply proportions?

      We used Hellinger transformation to give more weight to rarer species. Our pitfall traps were often dominated by large numbers of a few very abundant / active species. If we had used proportions, these species would have dominated the community analyses. We clarified this in the text (line 379-381).
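
      The effect described above can be illustrated with a small sketch. It assumes the standard definition of the Hellinger transformation (the square root of each species' relative abundance within a sample). The counts are hypothetical and the code is not taken from the study.

```python
import math

def proportions(counts):
    """Relative abundance of each species within one sample."""
    total = sum(counts)
    return [c / total for c in counts]

def hellinger(counts):
    """Hellinger transformation: square root of each relative abundance."""
    return [math.sqrt(p) for p in proportions(counts)]

# Hypothetical pitfall sample dominated by one very active species.
sample = [400, 16, 4]
prop = proportions(sample)
hell = hellinger(sample)

# Share of the transformed values carried by the rarest species,
# used here only as a simple indicator of its relative weight.
rare_share_prop = prop[-1] / sum(prop)
rare_share_hell = hell[-1] / sum(hell)
```

      In this example the rarest species carries a larger share of the transformed values after the Hellinger transformation than with plain proportions, which is the sense in which the transformation down-weights dominant species.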

      Line 348: An RDA is constrained by the assumptions / model the authors proposed and "forces" the data into a spatial ordination that resembles this model best. As the authors previously used an unconstrained PERMANOVA, it would be better to also use an NMDS that goes along with the PERMANOVA.

      The initial goal of the RDA was not to directly visualize the results of the PERMANOVA, but to show whether an overall crop configuration effect occurred, both for the whole dataset and per location. We have now added NMDS figures to link them to the PERMANOVA and added these to the supplementary figures (fig S6-S8). We also mention this approach in the methods section (line 387-390).

      Line 355f: This is also a clear indication of the strong annual fluctuations in carabid assemblages as mentioned above.

      Indeed.

      Line 361: 'pairwise'.

      Typo, we changed this.

      Line 362: reference missing.

      Reference added (line 405)

      References

      Alarcón-Segura, V., Grass, I., Breustedt, G., Rohlfs, M., Tscharntke, T., 2022. Strip intercropping of wheat and oilseed rape enhances biodiversity and biological pest control in a conventionally managed farm scenario. J. Appl. Ecol. 59, 1513-1523.

      Boetzl, F.A., Sponsler, D., Albrecht, M., Batáry, P., Birkhofer, K., Knapp, M., Krauss, J., Maas, B., Martin, E.A., Sirami, C., Sutter, L., Bertrand, C., Baillod, A.B., Bota, G., Bretagnolle, V., Brotons, L., Frank, T., Fusser, M., Giralt, D., González, E., Hof, A.R., Luka, H., Marrec, R., Nash, M.A., Ng, K., Plantegenest, M., Poulin, B., Siriwardena, G.M., Tscharntke, T., Tschumi, M., Vialatte, A., Van Vooren, L., Zubair-Anjum, M., Entling, M.H., Steffan-Dewenter, I., Schirmel, J., 2024. Distance functions of carabids in crop fields depend on functional traits, crop type and adjacent habitat: a synthesis. Proceedings of the Royal Society B: Biological Sciences 291, 20232383.

      Hallmann, C.A., Sorg, M., Jongejans, E., Siepel, H., Hofland, N., Schwan, H., Stenmans, W., Müller, A., Sumser, H., Hörren, T., Goulson, D., de Kroon, H., 2017. More than 75 percent decline over 27 years in total flying insect biomass in protected areas. PLoS One 12, e0185809.

      Knapp, M., Seidl, M., Knappová, J., Macek, M., Saska, P., 2019. Temporal changes in the spatial distribution of carabid beetles around arable field-woodlot boundaries. Scientific Reports 9, 8967.

      Müller, J., Hothorn, T., Yuan, Y., Seibold, S., Mitesser, O., Rothacher, J., Freund, J., Wild, C., Wolz, M., Menzel, A., 2023. Weather explains the decline and rise of insect biomass over 34 years. Nature.

      Toivonen, M., Huusela, E., Hyvönen, T., Marjamäki, P., Järvinen, A., Kuussaari, M., 2022. Effects of crop type and production method on arable biodiversity in boreal farmland. Agriculture, Ecosystems & Environment 337, 108061.

      Reviewer #3 (Public review):

      Summary:

      In this paper, the authors made a sincere effort to show the effects of strip cropping, a technique of alternating crops in small strips of several meters wide, on ground beetle diversity. They state that strip cropping can be a useful tool for bending the curve of biodiversity loss in agricultural systems as strip cropping shows a relative increase in species diversity (i.e. abundance and species richness) of the ground beetle communities compared to monocultures. Moreover, strip cropping has the added advantage of not having to compromise on agricultural yields.

      Strengths:

      The article is well written; it has an easily readable tone of voice without too much jargon or overly complicated sentence structure. Moreover, as far as reviewing the models in depth without raw data and R scripts allows, the statistical work done by the authors looks good. They have thought through how to handle heterogeneous, yet spatially and temporally correlated field data. The models applied and the model checks performed are appropriate for the data at hand. Combining RDA and PCA axes together is a nice touch.

      We thank reviewer 3 for their kind words and appreciation for the simple language and analysis that we used.

      Weaknesses:

      The evidence for strip cropping bringing added value for biodiversity is mixed at best. Yes, there is an increase in relative abundance and species richness at the field level, but it is not convincingly shown this difference is robust or can be linked to clear structural and hypothesised advantages of the strip cropping system. The same results could have been used to conclude that there are only very limited signs of real added value of strip cropping compared to monocultures.

      Point well taken. We agree that the effects of strip cropping on carabid beetle communities are subtle and we nuanced the text in the revised version to reflect this. See below for more details on how we revised the manuscript to reflect this point.

      There are a number of reasons for this:

      (1) Significant differences disappear at crop level, as the authors themselves clearly acknowledge, meaning that there are no differences between pairs of similar crops in the strip cropping fields and their respective monoculture. This would mean the strips effectively function as "mini-monocultures".

      This is indeed in line with our conclusions. Based on our data and results, the advantages of strip cropping seem mostly to occur because crops with different communities are now on the same field, rather than that within the strips you get mixtures of communities related to different crops. We discussed this in the first paragraph of the discussion in the original submission (line 161-164).

      The significant relative differences at the field level could be an artifact of aggregation instead of structural differences between strip cropping and monocultures; with enough data points things tend to get significant despite large variance. This should have been elaborated further upon by the authors with additional analyses, designed to find out where differences originate and what it tells about the functioning of the system. Or it should have provided ample reason for cautioning in drawing conclusions about the supposed effectiveness of strip cropping based on these findings.

      We believe that this is a misunderstanding of our approach. In the field-level analyses we pooled samples from the same field (i.e. pseudo-replicates were pooled), resulting in a relatively small sample size of 50 samples. We revised the methods section to better explain this (line 318-322). Therefore, the statement “with enough data points things tend to get significant” is not applicable here.

      (2) The authors report percentages calculated as relative change of species richness and abundance in strip cropping compared to monocultures after rarefaction. This is in itself correct, however, it can be rather tricky to interpret because the perspective on actual species richness and abundance in the fields and treatments is completely lost; the reported percentages are dimensionless. The authors could have provided the average cumulative number of species and abundance after rarefaction. Also, range and/or standard error would have been useful to provide information as to the scale of differences between treatments. This could provide a new perspective on the magnitude of differences between the two treatments which a dimensionless percentage cannot.

      We agree that this would be the preferred approach if we had a perfectly balanced dataset. However, this approach is not feasible with our unbalanced design and differences in sampling effort. While we acknowledge that percentages are harder to interpret, they do allow us to report relative changes for each combination of location, year and crop. The number of samples on which the percentages were based was always kept equal (through rarefaction) between the cropping systems (for each combination of location, year and crop), but not among crops, years and locations. This approach allowed us to make a better estimate whenever more samples were available, as we did not always have an equal number of samples from both cropping systems. For example, sometimes we had 2 samples from a strip cropped field and 6 from the monoculture; here we would use rarefaction to 2 samples (which simply gave a better estimate for the monoculture). In other cases, we had 4 samples in both strip cropped and monoculture fields, and we chose to use rarefaction to 4 samples to get a better estimate altogether. Adding a value for actual richness or abundance to the figures would have distorted these findings, as the variation would be huge (it would represent the number of ground beetle species per 2 to 6 pitfall samples). Furthermore, the dimension that reviewer 3 describes would thus be “The number of ground beetle species / individuals per 2 to 6 samples”, not a very informative unit either.
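      The equal-effort comparison described in this reply can be sketched as follows. This is a minimal illustration and not the analysis code used for the manuscript: the function names, the data layout (one set of species per pitfall sample) and the choice to average over every possible subset of samples are assumptions, and the species labels in the usage note are made up.

```python
from itertools import combinations
from statistics import mean


def rarefied_richness(samples, k):
    """Mean species count over all ways of drawing k samples.

    `samples` is a list of sets, one set of species per pitfall sample
    (a hypothetical layout chosen for this sketch).
    """
    if not 1 <= k <= len(samples):
        raise ValueError("k must be between 1 and the number of samples")
    return mean(len(set().union(*subset)) for subset in combinations(samples, k))


def relative_change(strip_samples, mono_samples):
    """Percent change in rarefied richness, strip cropping vs. monoculture.

    Both systems are rarefied to the smaller of the two sample counts,
    so the comparison is always made at equal sampling effort.
    """
    k = min(len(strip_samples), len(mono_samples))
    strip = rarefied_richness(strip_samples, k)
    mono = rarefied_richness(mono_samples, k)
    return 100.0 * (strip - mono) / mono
```

      With two strip cropped samples `[{"a", "b", "c"}, {"b", "d"}]` and three monoculture samples that each contain `{"a", "b"}`, both systems are rarefied to 2 samples, giving 4 versus 2 species and a relative change of 100%. The percentage is comparable across location, year and crop combinations even though the underlying number of samples differs.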

      (3) The authors appear to not have modelled the abundance of any of the dominant ground beetle species themselves. Therefore it becomes impossible to assess which important species are responsible (if any) for the differences found in activity density between strip cropping and monocultures and the possible life history traits related reasons for the differences, or lack thereof, that are found. A big advantage of using ground beetles is that many life history traits are well studied and these should be used whenever there is reason, as there clearly is in this case. Moreover, it is unclear which species are responsible for the difference in species richness found at the field level. Are these dominant species or singletons? Do the strip cropping fields contain species that are absent in the monoculture fields and are not the cause of random variation or sampling? Unfortunately, the authors do not report on any of these details of the communities that were found, which makes the results much less robust.

      Thank you for raising this point. We have reconsidered our indicator species analysis and found that it is rather sensitive to rare species and insensitive to changes in common species. Therefore, we have replaced the indicator species analyses with a GLM analysis for the 12 most common genera of ground beetles in the revised manuscript. This allows us to go into more depth on specific traits of the genera whose abundances change depending on the cropping system. In the revised manuscript, we also discuss these common genera in more depth, rather than focusing on rarer species (line 135-143, 182-200 in discussion). Furthermore, we have added information on rarity and habitat preference to the table that shows species abundances per location (Table S2), and mention these aspects briefly in the results (line 145-153).

      (4) In the discussion they conclude that there is only a limited amount of interstrip movement by ground beetles. Otherwise, the results of the crop-level statistical tests would have shown significant deviation from corresponding monocultures. This is a clear indication that the strips function more like mini-monocultures instead of being more than the sum of its parts.

      This is in line with our point in the first paragraph of the discussion and an important message of our manuscript.

      (5) The RDA results show a modelled variable of differences in community composition between strip cropping and monoculture. Percentages of explained variation of the first RDA axis are extremely low, and even then, the effect of location and/or year appear to peek through (Figure S3), even though these are not part of the modelling. Moreover, there is no indication of clustering of strip cropping on the RDA axis, or in fact on the first principal component axis in the larger RDA models. This means the explanatory power of different treatments is also extremely low. The crop level RDAs show some clustering, but hardly any consistent pattern in either communities of crops or species correlations, indicating that differences between strip cropping and monocultures are very small.

      We agree and we make a similar point in the first paragraph of the discussion (line 160-162).

      Furthermore, there are a number of additional weaknesses in the paper that should be addressed:

      The introduction lacks focus on the issues at hand. Too much space is taken up by facts on insect decline and land sharing vs. land sparing and not enough attention is spent on the scientific discussion underlying the statements made about crop diversification as a restoration strategy. They are simply stated as facts or as hypotheses with many references that are not mentioned or linked to in the text. An explicit link to the results found in the large number of references should be provided.

      We revised the introduction by omitting the land sharing vs. land sparing topic and better linking references to our research findings.

      The mechanistic understanding of strip cropping is what is at stake here. Does strip cropping behave similarly to intercropping, a technique that has been proven to be beneficial to biodiversity because of added effects due to increased resource efficiency and greater plant species richness? This should be the main testing point and agenda of strip cropping. Do the biodiversity benefits that have been shown for intercropping also work in strip cropping fields? The ground beetles are one way to test this. Hypotheses should originate from this and should be stated clearly and mechanistically.

      We agree with the reviewer and have clarified this research direction in the introduction of the revised manuscript (line 66-72).

      One could question how useful indicator species analysis (ISA) is for a study in which predominantly highly eurytopic species are found. These are by definition uncritical of their habitat. Is there any mechanistic hypothesis underlying a suspected difference to be found in preferences for either strip cropping or monocultures of the species that were expected to be caught? In other words, did the authors have any a priori reasons to suspect differences, or has this been an exploratory exercise from which unexplained significant results should be used with great caution?

      Point well taken. We agree that the indicator species analysis has limitations and therefore now replaced this with GLM analysis for the 12 most common ground beetle genera.

      However, setting these objections aside there are in fact significant results with strong species associations both with monocultures and strip cropping. Unfortunately, the authors do not dig deeper into the patterns found a posteriori either. Why would some species associate so strongly with strip cropping? Do these species show a pattern of pitfall catches that deviate from other species, in that they are found in a wide range of strips with different crops in one strip cropping field and therefore may benefit from an increased abundance of food or shelter? Also, why would so many species associate with monocultures? Is this in any way logical? Could it be an artifact of the data instead of a meaningful pattern? Unfortunately, the authors do not progress along these lines in the methods and discussion at all.

      We thank reviewer 3 for these valuable perspectives. In the revised manuscript, we further explored the species/genera that respond to cropping systems and discuss these findings in more detail (line 182-200 in discussion).

      A second question raised in the introduction is whether the arable fields that form part of this study contain rare species. Unfortunately, the authors do not elaborate further on this. Do they expect rare species to be more prevalent in the strip cropping fields? Why? Has it been shown elsewhere that intercropping provides room for additional rare species?

      The answer is simply no: we did not find more rare species in strip cropping. In the revised manuscript, we added a column for rarity (according to waarneming.nl) in the table showing abundances of species per location (Table S2). We found only two rare species, one represented by a single individual and one that was more related to the open habitat created by a failed wheat field. We discuss this in more depth in the revised results (line 145-153).

      Considering the implications the results of this research can have on the wider discussion of bending the curve and the effects of agroecological measures, bold claims should be made with extreme restraint and be based on extensive proof and robust findings. I am not convinced by the evidence provided in this article that the claim made by the authors that strip cropping is a useful tool for bending the curve of biodiversity loss is warranted.

      We believe that strip cropping can be a useful tool because farmers readily adopt it and it can result in modest biodiversity gains without yield loss. However, strip cropping is indeed not a silver bullet (which we also don’t claim). We nuanced the implications of our study in the revised manuscript (line 30-35, 232-237).

      Reviewer #3 (Recommendations for the authors):

      General comments:

      (1) I am missing the R script and data files in the manuscript. This is a serious drawback in assessing the quality of the work.

      Datasets and R scripts will be made available upon completion of the manuscript.

      (2) I have doubts about the clarity of the title. It more or less states that strip cropping is designed in order to maintain productivity. However, the main objective of strip cropping is to achieve ecological goals without losing productivity. I suggest a rethink of the title and what it is the authors want to convey.

      As the title led multiple reviewers to expect analyses on yield, we chose to alter the title and removed any mention of yield from it.

      (3) Line 22: I would add something along the lines of: "As an alternative to intercropping, strip cropping is pioneered by Dutch farmers... " This makes the distinction and the connection between the two more clear.

      In our opinion, strip cropping is a form of intercropping. We have changed this sentence to reflect this point better. (line 21-22)

      (4) Line 24: "these" should read "they"

      After changing this sentence, this typo is no longer there (line 24).

      (5) Line 34-48. I think this introduction is too long. The paper is not directly about insect decline, so the authors could consider starting with line 43 and summarising 34-42 in one or two sentences.

      Removed a sentence on insect declines here to make the introduction more streamlined.

      (6) Line 51-59. I am not convinced the land sparing - land sharing idea adds anything to the paper. It is not used in the discussion and solicits much discussion in and of itself unnecessary in this paper. The point the authors want to make is not arable fields compared to natural biodiversity, but with increases in biodiversity in an already heavily degraded ecosystem; intensive agriculture. I think the introduction should focus on that narrative, instead of the land sparing-sharing dichotomy, especially because too little attention is spent on this narrative.

      We removed the section on land-sparing vs land-sharing as it was indeed off-topic.

      (7) Line 85. Dynamics is not correctly used here. It should read Ground beetle communities are sensitive.

      Changed accordingly (line 78-79).

      (8) Line 90-91. Here, it should be added that ground beetles are used as indicators for ground-dwelling insect diversity, not wider insect diversity in agricultural systems. In fact, Gerlach et al., the reference included, clearly warn against using indicator groups in a context that is too wide for a single indicator group to cover and Van Klink (2022) has recently shown in a meta-analysis that the correlation between trends in insect groups is often rather poor.

      We removed the sentence that claimed ground beetles to be indicators of general biodiversity, and have focused the text in general more on ground beetle biodiversity, rather than general biodiversity.

      (9) Line 178: was there a high weed abundance measured in the stripcropping fields? Or has there been reports on higher weed abundance in general? The references provided do not appear to support this claim.

      To our knowledge, there is only one paper on the effect of strip cropping on weeds (Ditzler et al., 2023). This paper shows that strip cropping (and more diverse cropping systems) reduces weed cover, but increases weed richness and diversity. We mistakenly mentioned that crop diversification increases weed seed biomass, and have changed this to weed seed richness. The paper from Carbonne et al. (2022) indeed doesn’t show an effect of crop diversification on weeds. However, it does show a positive relation between weed seed richness and ground beetle activity density. We have moved this citation to the right place in the sentence (line 172-175).

      (10) Line 279-288. The description of sampling with pitfalls is inadequate. Please follow the guidelines for properly incorporating sufficient detail on pitfall sampling protocols as described in Brown & Matthews 2016,

      We were sadly not aware of this paper prior to the experiments, but have at least added information on all characteristics of the pitfall traps as mentioned in the paper (line 290-294).

      (11) Lines 307-310. What reasoning lies behind the choice to focus on the most beetle-rich monocultures? Do the authors have references for this way of comparing treatments? Is there much variation in the monocultures that solicits this approach? It would be preferable if the authors could elaborate on why this method is used, provide references that it is a generally accepted statistical technique and provide additional assessments of the variation in the data so it can be properly related to more familiar exploratory data analysis techniques.

      We ran two analyses for the field-level richness and abundance. First, we used all combinations of monocultures and strip cropping. However, as strip cropping is made up of (at least) 2 crops, we had 2 constituent monocultures. Because including both monocultures would count the comparison with the same strip cropped field twice, we also ran the analyses again with only those monocultures that had the highest richness and abundance. This choice was made to get a conservative estimate of ground beetle richness increases through strip cropping. We explained this methodology further in the statistical analysis section (line 329-335).
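      The difference between the two field-level comparisons can be illustrated with a short sketch. The function names, crop labels and numbers below are hypothetical and serve only to show why using the richest constituent monoculture as the reference gives the more conservative estimate; this is not the statistical model from the manuscript.

```python
def pairwise_changes(strip_value, mono_values):
    """Percent change of a strip cropped field relative to each
    constituent monoculture (first analysis: all combinations)."""
    return {
        crop: 100.0 * (strip_value - value) / value
        for crop, value in mono_values.items()
    }


def conservative_change(strip_value, mono_values):
    """Percent change relative to the richest constituent monoculture
    only (second analysis: one comparison per strip cropped field)."""
    best = max(mono_values.values())
    return 100.0 * (strip_value - best) / best


# Hypothetical richness values for one strip cropped field and its
# two constituent monocultures.
monocultures = {"wheat": 8, "cabbage": 10}
all_pairs = pairwise_changes(12, monocultures)
conservative = conservative_change(12, monocultures)
```

      In this made-up case the first analysis yields two comparisons for the same strip cropped field (increases of 50% and 20%), while the second keeps only the smaller of the two (20%), so the strip cropped field is counted once and against its strongest reference.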

      In Figure S6 the order of crop combinations is altered between 2021 on the left and 2022 on the right. This is not helpful to discover any possible patterns.

      We originally chose this order as it represented also the crop rotations, but it is indeed not helpful without that context. Therefore, we chose to change the order to have the same crop combinations within the rows.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Recent work has demonstrated that the hummingbird hawkmoth, Macroglossum stellatarum, like many other flying insects, uses ventrolateral optic flow cues for flight control. However, unlike other flying insects, the same stimulus presented in the dorsal visual field elicits a directional response. Bigge et al. use behavioral flight experiments to set these two pathways in conflict in order to understand whether these two pathways (ventrolateral and dorsal) work together to direct flight and if so, how. The authors characterize the visual environment (the amount of contrast and translational optic flow) of the hawkmoth and find that different regions of the visual field are matched to relevant visual cues in their natural environment and that the integration of the two pathways reflects a prioritization for generating behavior that supports hawkmoth safety rather than the prevalence of a particular visual cue in the environment.

      Strengths:

      This study creatively utilizes previous findings that the hawkmoth partitions their visual field as a way to examine parallel processing. The behavioral assay is well-established and the authors take the extra steps to characterize the visual ecology of the hawkmoth habitat to draw exciting conclusions about the hierarchy of each pathway as it contributes to flight control.

      Reviewer #2 (Public review):

      Summary

      Bigge and colleagues use a sophisticated free-flight setup to study visuo-motor responses elicited in different parts of the visual field in the hummingbird hawkmoth. Hawkmoths have been previously shown to rely on translational optic flow information for flight control exclusively in the ventral and lateral parts of their visual field. Dorsally presented patterns elicit a formerly completely unknown response - instead of using dorsal patterns to maintain straight flight paths, hawkmoths fly, more often, in a direction aligned with the main axis of the pattern presented (Bigge et al, 2021). Here, the authors go further and put ventral/lateral and dorsal visual cues into conflict. They found that the different visuomotor pathways act in parallel, and they identified a 'hierarchy': the avoidance of dorsal patterns had the strongest weight and optic flow-based speed regulation the lowest weight. The authors linked their behavioral results to visual scene statistics in the hawkmoths' natural environment. The partition of ventral and dorsal visuomotor pathways is well in line with differences in visual cue frequencies. The response hierarchy, however, seems to be dominated by dorsal features, which are less frequent but presumably highly relevant for the animals' flight safety.

      Strengths

      The data are very interesting and unique. The manuscript provides a thorough analysis of free-flight behavior in a non-model organism that is extremely interesting for comparative reasons (and on its own). These data are both difficult to obtain and very valuable to the field.

      Weaknesses

      While the present manuscript clearly goes beyond Bigge et al, 2021, the advance could have perhaps been even stronger with a more fine-grained investigation of the visual responses in the dorsal visual field. Do hawkmoths, for example, show optomotor responses to rotational optic flow in the dorsal visual field?

      I find the majority of the data, which are also the data supporting the main claims of the paper, compelling. However, the measurements of flight height are less solid than the rest and I think these data should be interpreted more carefully.

      Reviewer #3 (Public review):

      The authors have significantly improved the paper in revising to make its contributions distinct from their prior paper. They have also responded to my concerns about quantification and parameter dependency of the integration conclusion. While I think there is still more that could be done in this capacity, especially in terms of the temporal statistics and quantification of the conflict responses, they have made a case for the conclusions as stated. The paper still stands as an important paper with solid evidence a bit limited by these concerns.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The edits have significantly improved the clarity of the manuscript. A few small notes:

      Figure 2B legend - describe what the orange dashed line represents

      We added a description.

      Figure 2B legend - references Table 1 but I believe this should reference Table S1. There are other places in the manuscript where Table 1 is referenced and it should reference S1

      We changed this for all instances in the main paper and supplement, where the reference was wrong.

      Figure S1 legend - some figure panel letters are in parentheses while others are not

      We unified the notation to not use parentheses for any of the panel letters.

      Reviewer #2 (Recommendations for the authors):

      I couldn't find the l, r, d, v indications in Fig. 1a. This was just a suggestion, but since you wrote you added them, I was wondering if this is the old figure version.

      We added them to what is now Fig. 2, which was originally part of Fig. 1. After restructuring, we had indeed not added an additional set to Fig. 1, which we have now done.

      Fig. 2: Adding 'optic flow' and 'edges' to the y-axis in panels E and F, would make it faster for me to parse the figure. Maybe also add the units for the magnitudes? Same for Figure 6B

      We added 'optic flow' and 'edges' to the panels E and F in Fig. 2 and Fig. 6.

      Fig. 2: Very minor - could you use the same pictograms in D and E&F (i.e. all circles for example, instead of switching to "tunnels" in EF)?

      We used the tunnel pictograms, because we associated those with the short notations for the different conditions summarised in Table S1. Because we wanted to keep this consistent across the paper, we used the “tunnel” pictograms here too.

      In the manuscript, you still draw lots of conclusions based on these area measurements (L132-142, L204-209 etc). This does not fully reflect what you wrote in your reply to the reviewers. If you think of these measurements as qualitative rather than quantitative, I would say so in the manuscript and not use quantitative statistics etc. My suggestion would be to be more specific about potential issues that can influence the measurement (you mentioned body size, image contrast, motion blur, pitch across conditions etc) and give that data not the same weight as the rest of the measurements.

      We do express explicit caution with this measure in the methods section (l. 657-659) and the results section (l. 135-137). Nevertheless, as the trends in the data are consistent with optic flow responses in the other planes, and with responses reported in the literature, we felt that it is valuable to report the data, as well as the statistics, for all readers, who can – given our cautionary statement – assess the data accordingly.

      The area measurements suggest that moths fly lower with unilateral vertical gratings (Fig. S1, G1 and G2 versus the rest). If you leave the data in can you speculate why that would be? (Sorry if I missed that)

      We agree, this seems quite consistent, but we do not have a good explanation for this observation. It would certainly require some additional experiments and variable conditions to understand what causes this phenomenon.

      Fig.4 - is panel B somehow flipped? Shouldn't the flight paths start out further away from the grating and then be moved closer to midline (as in A). That plot shows the opposite.

      Absolutely right, thank you for spotting this, it was indeed an intermediate and not the final figure which was uploaded to the manuscript. It also had outdated letter-number identifiers, which we now updated.

      L198 - should be "they avoided"

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Why was V1 separated from the rest of the visual cortex, and why the rest of the areas were simply lumped into an EVC ROI? It would be helpful to understand the separation into ROIs.

      We thank the reviewer for raising the concern regarding the definition of ROIs. Our approach to analyze V1 separately was based on two key considerations. First, previous studies consistently identify V1 as the main locus of sensory-like templates during feature-specific preparatory attention (Kok et al., 2014; Aitken et al., 2020). Second, V1 shows the strongest orientation selectivity within the visual hierarchy (Priebe, 2016). In contrast, the extrastriate visual cortex (EVC; comprising V2, V3, V3AB and V4) demonstrates broader selectivity, such as for complex features like contour and texture (Grill-Spector & Malach, 2004). Thus, we think it would be particularly informative to analyze V1 data separately as our experiment examines orientation-based attention. We should also note that we conducted MVPA separately for each visual ROI (V2, V3, V3AB and V4). After observing similar patterns of results across these regions, we averaged the decoding accuracies into a single value and labeled it as EVC. This approach allowed us to simplify data presentation while preserving the overall data pattern in decoding performance. We now added the related explanations on the ROI definition in the revised texts (Page 26; Line 576-581).

      (2) It would have been helpful to have a behavioral measure of the "attended" orientation to show that participants in fact attended to a particular orientation and were faster in the cued condition. The cue here was 100% valid, so no such behavioral measure of attention is available here.

      We thank the reviewer for the comments. We agree that including valid and neutral cue trials would have provided valuable behavioral measures of attention; yet, our current design was aimed at maximizing the number of trials for decoding analysis due to fMRI time constraints. Thus, we could not fit additional conditions to measure the behavioral effects of attention. However, we note that in our previous studies using a similar feature cueing paradigm, we observed benefits of attentional cueing on behavioral performance when comparing valid and neutral conditions (Liu et al., 2007; Jigo et al., 2018). Furthermore, our neural data indeed demonstrated attention-related modulation (as indicated by MVPA results, Fig. 2 in the main texts) so we are confident that on average participants followed the instruction and deployed their attention accordingly. We now added the related explanations on this point in the revised texts (Page 23; Line 492-498).

      (3) As I was reading the manuscript I kept thinking that the word attention in this manuscript can be easily replaced with visual working memory. Have the authors considered what it is about their task or cognitive demand that makes this investigation about attention or working memory?

      We thank the reviewer for this comment. We added the following extensive discussion on this point in the revised texts (Page 18; Line 363-381).

      “It could be argued that preparatory attention relies on the same mechanisms as working memory maintenance. While these functions are intuitively similar and likely overlap, there is also evidence indicating that they can be dissociated (Battistoni et al., 2017). In particular, we note that in our task, attention is guided by symbolic cues (color-orientation associations), while working memory tasks typically present the actual visual stimulus as the memorandum. A central finding in working memory studies is that neural signals during WM maintenance are sensory in nature, as demonstrated by generalizable neural activity patterns from stimulus encoding to maintenance in visual cortex (Harrison & Tong, 2009; Serences et al., 2009; Rademaker et al., 2019). However, in our task, neural signals during preparation were non-sensory, as demonstrated by a lack of such generalization in the No-Ping session (see also Gong et al., 2022). We believe that the differences in cue format and task demand in these studies may account for such differences. In addition to the difference in the sensory nature of the preparatory versus delay-period activity, our ping-related results also exhibited divergence from working memory studies (Wolff et al., 2017; 2020). While these studies used the visual impulse to differentiate active and latent representations of different items (e.g., attended vs. unattended memory item), our study demonstrated the active and latent representations of a single item in different formats (i.e., non-sensory vs. sensory-like). Moreover, unlike our study, the impulse did not evoke sensory-like neural patterns during memory retention (Wolff et al., 2017). These observations suggest that the cognitive and neural processes underlying preparatory attention and working memory maintenance could very well diverge. Future studies are necessary to delineate the relationship between these functions both at the behavioral and neural level.”

      (4) If I understand correctly, the only ROI that showed a significant difference for the cross-task generalization is V1. Was it predicted that only V1 would have two functional states? It should also be made clear that the only difference where the two states differ is V1.

      We thank the reviewer for this comment. We would like to clarify that our analyses revealed similar patterns of preparatory attentional representations in V1 and EVC. During the Ping session, the cross-task generalization analyses revealed decodable information in both V1 and EVC (ps < 0.001), significantly higher than that in the No-Ping session for V1 (independent t-test: t(38) = 3.145, p = 0.003; Cohen’s d = 0.995) and EVC (independent t-test: t(38) = 2.153, p = 0.038, Cohen’s d = 0.681) (Page 10; Line 194-196). While both areas maintained similar representations, additional measures (Mahalanobis distance, neural-behavior relationship and connectivity changes) showed more robust ping-evoked changes in V1 compared to EVC. This differential pattern likely reflects the primary role of V1 in orientation processing, with EVC showing a similar but weaker response profile. We have revised the text to clarify this point (Page 16; Line 327-329).
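      The reported effect sizes can be cross-checked against the reported t statistics. For an independent-samples t-test with pooled variance, Cohen's d follows from t and the two group sizes. The sketch below assumes 20 participants per session: the degrees of freedom (38) imply 40 participants in total, but the equal split is an assumption, and the function name is illustrative.

```python
import math


def cohens_d_from_t(t, n1, n2):
    """Cohen's d for an independent-samples t-test with pooled SD:
    d = t * sqrt(1/n1 + 1/n2)."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)


# Reported t values; group sizes of 20 + 20 are assumed from df = 38.
reported_t = {"V1": 3.145, "EVC": 2.153}
effect_sizes = {
    roi: round(cohens_d_from_t(t, 20, 20), 3) for roi, t in reported_t.items()
}
```

      Under that assumption the conversion gives d = 0.995 for V1 and d = 0.681 for EVC, which agrees with the values quoted in the response.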

      (5) My primary concern about the interpretation of the finding is that the result, differences in cross-task decoding within V1 between the ping and no-ping condition might simply be explained by the fact that the ping condition refocuses attention during the long delay thus "resharpening" the template. In the no-ping condition during the 5.5 to 7.5 seconds long delay, attention for orientation might start getting less "crisp." In the ping condition, however, the ping itself might simply serve to refocus attention. So, the result is not showing the difference between the latent and non-latent stages, rather it is the difference between a decaying template representation and a representation during the refocused attentional state. It is important to address this point. Would a simple tone during the delay do the same? If so, the interpretation of the results will be different.

      We thank the reviewer for this comment. The reviewer proposed an alternative account suggesting that visual pings may function to refocus attention, rather than reactivate latent information during the preparatory period. If this account holds (i.e., attention became weaker in the no-ping condition and it was strengthened by the ping due to re-focusing), we would expect to observe a general enhancement of attentional decoding during the preparatory period. However, our data reveal no significant differences in overall attention decoding between the two conditions during this period (ps > 0.519; BF_excl > 3.247), arguing against such a possibility.

      The reviewer also raised an interesting question about whether an auditory tone during preparation could produce effects similar to those observed with visual pings. Although our study did not directly test this possibility, existing literature provides some relevant evidence. In particular, prior studies have shown that latent visual working memory contents are selectively reactivated by visual impulses, but not by auditory stimuli (Wolff et al., 2020). This finding supports modality specificity for visually encoded contents, suggesting that sensory impulses must match the representational domain to effectively access latent visual information, which also argues against the refocusing hypothesis above. However, we do think that this is an important question that merits direct investigation in future studies. We have now added the related discussion on this point in the revised texts (Page 10, Line 202-203; Page 19, Line 392-395).

      (6) The neural pattern distances measured using Mahalanobis values are really great! Have the authors tried to use all of the data, rather than the high AMI and low AMI to possibly show a linear relationship between response times and AMI?

      We thank the reviewer for this comment. We took the reviewer’s suggestion to explore the relationship between attentional modulation index (AMI) and RTs across participants for each session (see Figure 3). In the No-Ping session, we observed no significant correlation between AMI and RT (r = -0.366, p = 0.113). By contrast, the same analysis in the Ping session revealed a significant negative correlation (r = -0.518, p = 0.019). These results indicate that the attentional modulations evoked by the visual impulse were associated with faster RTs, supporting the functional relevance of activating sensory-like representations during preparation. We have now included these inter-subject correlations in the main texts (Page 13, Line 258-264; Fig 3D and 3E) along with within-subject correlations in the Supplementary Information (Page 6, Line 85-98; S3 Fig).
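      The inter-subject analysis described above can be sketched as a Pearson correlation followed by conversion to a t statistic with n - 2 degrees of freedom. This is an illustration only: the AMI and RT values below are hypothetical, the function names are ours, and n = 20 per session is an assumption inferred from the degrees of freedom reported elsewhere in this response.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_to_t(r, n):
    """t statistic (df = n - 2) for testing r against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Hypothetical per-participant values, for illustration only.
ami = [0.10, 0.25, 0.40, 0.55, 0.70]
rt = [0.62, 0.60, 0.55, 0.56, 0.50]  # seconds
r_toy = pearson_r(ami, rt)

# Reported Ping-session correlation, assuming n = 20 participants.
t_ping = r_to_t(-0.518, 20)
print(round(r_toy, 3), round(t_ping, 2))
```

      With n = 20, r = -0.518 corresponds to t(18) of roughly -2.57, in line with the reported two-tailed p = 0.019.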

      (7) After reading the whole manuscript I still don't understand what the authors think the ping is actually doing, mechanistically. I would have liked a more thorough discussion, rather than referencing previous papers (all by the co-author).

      We thank the reviewer for this comment regarding the mechanistic basis of visual pings. We agree that this warrants deeper discussion. One possibility, as informed by theoretical studies of working memory, is that the sensory-like template could be maintained via an “activity-silent” mechanism through short-term changes in synaptic weights (Mongillo et al., 2008). In this framework, a visual impulse may function as a nonspecific input that momentarily converts latent traces into detectable activity patterns (Rademaker & Serences, 2017). Related to our findings, it is unlikely that the orientation-specific templates observed during the Ping session emerged from purely non-sensory representations and were entirely induced by an exogenous ping, which was devoid of any orientation signal. Instead, the more parsimonious explanation is that the visual impulse reactivated pre-existing latent sensory signals. To our knowledge, the detailed circuit-level mechanism of such reactivation is still unclear; existing evidence only suggests a relationship between ping-evoked inputs and the neural output (Wolff et al., 2017; Fan et al., 2021; Duncan et al., 2023). We have now included the discussion on this point in the main texts (Page 19, Line 383-401).

      Reviewer #2 (Public review):

      (1) The origin of the latent sensory-like representation. By 'pinging' the neural activity with a high-contrast, task-irrelevant visual stimulus during the preparation period, the authors identified the representation of the attentional feature target that contains the same information as perceptual representations. The authors interpreted this finding as a 'sensory-like' template is inherently hosted in a latent form in the visual system, which is revealed by the pinging impulse. However, I am not sure whether such a sensory-like template is essentially created, rather than revealed, by the pinging impulses. First, unlike the classical employment of the pinging technique in working memory studies, the (latent) representation of the memoranda during the maintenance period is undisputed because participants could not have performed well in the subsequent memory test otherwise. However, this appears not to be the case in the present study. As shown in Figure 1C, there was no significant difference in behavioral performance between the ping and the no-ping sessions (see also lines 110-125, pg. 5-6). In other words, it seems to me that the subsequent attentional task performance does not necessarily rely on the generation of such sensory-like representations in the preparatory period and that the emergence of such sensory-like representations does not facilitate subsequent attentional performance either. In such a case, one might wonder whether such sensory-like templates are really created, hosted, and eventually utilized during the attentional process. Second, because the reference orientations (i.e. 45 degrees and 135 degrees) have remained unchanged throughout the experiment, it is highly possible that participants implicitly memorized these two orientations as they completed more and more trials. In such a case, one might wonder whether the 'sensory-like' templates are essentially latent working memory representations activated by the pinging as was reported in Wolff et al. (2017), rather than a functional signature of the attentional process.

      We thank the reviewer for this comment. We agree that the question of whether the sensory-like template is created or merely revealed by visual pinging is crucial for understanding our findings. First, we acknowledge that our task may not be optimized for detecting changes in accuracy, as the task difficulty was controlled using individually adjusted thresholds (i.e., angular difference). Nevertheless, we observed some evidence supporting the neural-behavioral relationships. In particular, the impulse-driven sensory-like template in V1 contributed to faster RTs during stimulus selection (Page 12, Fig. 3D and 3E in the main texts; also see our response to R1, Point 6).

      Second, the reviewer raised an important concern about whether the attended feature might be stored in the memory system due to the trial-by-trial repetition of attention conditions (attend 45º or attend 135º). Although this is plausible, we don’t think it is likely. We note that neuroimaging evidence shows that attended working memory contents maintain sensory-like representations in visual cortex (Harrison & Tong, 2009; Serences et al., 2009; Rademaker et al., 2019), with generalizable neural activity patterns from perception to working memory delay-period, whereas unattended items in multi-item working memory tasks are stored in a latent state for prospective use (Wolff et al., 2017). Importantly, our task only required maintaining a single attentional template at a time. Thus, there was no need to store it via latent representations, if participants simply used a working memory mechanism for preparatory attention. Had they done so, we should expect to find evidence for a sensory template, i.e., generalizable neural pattern between perception and preparation in the No-Ping condition, which was not what we found. We have mentioned this point in the main texts (Page 18, Line 367-372).

      (2) The coexistence of the two types of attentional templates. The authors interpreted their findings as the outcome of a dual-format mechanism in which 'a non-sensory template' and a latent 'sensory-like' template coexist (e.g. lines 103-106, pg. 5). While I find this interpretation interesting and conceptually elegant, I am not sure whether it is appropriate to term it 'coexistence'. First, it is theoretically possible that there is only one representation in either session (i.e. a non-sensory template in the no-ping session and a sensory-like template in the ping session) in any of the brain regions considered. Second, it seems that there is no direct evidence concerning the temporal relationship between these two types of templates, provided that they commonly emerge in both sessions. Besides, due to the sluggish nature of fMRI data, it is difficult to tell whether the two types of templates temporally overlap.

      We thank the reviewer for the comment regarding our interpretation of the ‘coexistence’ of non-sensory and sensory-like attentional templates. While we acknowledge the limitations of fMRI in resolving temporal relationships between these two types of templates, several aspects of our data support a dual-format interpretation.

      First, our key findings remained consistent for the subset of participants (N=14) who completed both No-Ping and Ping sessions in counterbalanced order. It thus seems improbable that participants systematically switched cognitive strategies (e.g., using non-sensory templates in the No-Ping session versus sensory-like templates in the Ping session) in response to the task-irrelevant, uninformative visual impulse. Second, while we agree with the reviewer that the temporal dynamics between these two templates remain unclear, it is difficult to imagine that the orientation-specific templates observed during the Ping session emerged de novo from a purely non-sensory template and an exogenous ping. In other words, if there is no orientation information at all to begin with, how does it come into being from an orientation-less external ping? It seems to us that the more parsimonious explanation is that there was already some orientation signal in a latent format, and it was activated by the ping, in line with the models of “activity-silent” working memory. To address these concerns, we have added the related discussion of these alternative interpretations in the main texts (Page 19, Line 387-391).

      (3) The representational distance. The authors used Mahalanobis distance to quantify the similarity of neural representation between different conditions. According to the authors' hypothesis, one would expect greater pattern similarity between 'attend leftward' and 'perceived leftward' in the ping session in comparison to the no-ping session. However, this appears not to be the case. As shown in Figures 3B and C, there was no major difference in Mahalanobis distance between the two sessions in either ROI and the authors did not report a significant main effect of the session in any of the ANOVAs. Besides, in all the ANOVAs, the authors reported only the statistic term corresponding to the interaction effect without showing the descriptive statistics related to the interaction effect. It is strongly advised that these descriptive statistics related to the interaction effect should be included to facilitate a more effective and intuitive understanding of their data.

      We thank the reviewer for this comment. We expected greater pattern similarity between 'attend leftward' and 'perceived leftward' in the Ping session in comparison to the No-Ping session. This prediction was supported by a significant three-way interaction effect between session × attended orientation × perceived orientation (F(1,38) = 5.00, p = 0.031, η<sub>p</sub><sup>2</sup> = 0.116). In particular, there was a significant interaction between attended orientation × perceived orientation (F(1,19) = 9.335, p = 0.007, η<sub>p</sub><sup>2</sup> = 0.329) in the Ping session, but not in the No-Ping session (F(1,19) = 0.017, p = 0.898, η<sub>p</sub><sup>2</sup> = 0.001). The above-mentioned statistical results were reported in the original texts. In addition, this three-way mixed ANOVA (session × attended orientation × perceived orientation) on Mahalanobis distance in V1 revealed no significant main effects (session: F(1,38) = 0.009, p = 0.923, η<sub>p</sub><sup>2</sup> < 0.001; attended orientation: F(1,38) = 0.116, p = 0.735, η<sub>p</sub><sup>2</sup> = 0.003; perceived orientation: F(1,38) = 1.106, p = 0.300, η<sub>p</sub><sup>2</sup> = 0.028). We agree with the reviewer that a complete reporting of analyses enhances understanding of the data. Therefore, we have now included the main effects in the main texts (Page 11, Line 233).
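      For readers who wish to check the effect sizes against the F statistics quoted above, partial eta squared follows directly from F and its degrees of freedom. The sketch below uses the standard identity η<sub>p</sub><sup>2</sup> = (F × df_effect) / (F × df_effect + df_error); the function and variable names are illustrative.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f * df_effect) / (f * df_effect + df_error)

# F values and degrees of freedom as quoted in the response above.
three_way = partial_eta_squared(5.00, 1, 38)     # session x attended x perceived
ping_two_way = partial_eta_squared(9.335, 1, 19) # Ping session interaction
noping_two_way = partial_eta_squared(0.017, 1, 19)
main_perceived = partial_eta_squared(1.106, 1, 38)

print(round(three_way, 3), round(ping_two_way, 3),
      round(noping_two_way, 3), round(main_perceived, 3))
```

      The computed values round to the reported 0.116, 0.329, 0.001, and 0.028.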

      We thank the reviewer for the suggestion regarding the inclusion of descriptive statistics for interaction effects. However, since the data were already visualized in Fig. 3B and 3C in the main texts, to maintain conciseness and consistency with the reporting style of other analyses in the texts, we have opted to include these statistics in the Supplementary Information (Page 5, Table 1).

      Reviewer #3 (Public review):

      (1) The title is "Dual-format Attentional Template," yet the supporting evidence for the nonsensory format and its guiding function is quite weak. The author could consider conducting further generalization analysis from stimulus selection to preparation stages to explore whether additional information emerges.

      We thank the reviewer for this comment. Our approach to investigating whether preparatory attention is encoded in a sensory or non-sensory format, namely training classifiers on separate runs of the perception task, closely followed methods from previous studies (Stokes et al., 2009; Peelen et al., 2011; Kok et al., 2017). Following the reviewer’s suggestion, we performed generalization analyses by training classifiers on activity during the stimulus selection period and testing them on preparatory activity. However, we observed no significant generalization effects in either the No-Ping or the Ping session (ps > 0.780). This null result may stem from a key difference in the neural representations: classifiers trained on neural activity from the stimulus selection period necessarily encode both target and distractor information, thus relying on somewhat different information than classifiers trained exclusively on isolated target information in the perception task.

      (2) In Figure 2, the author did not find any decodable sensory-like coding in IPS and PFC, even during the impulse-driven session, indicating that these regions do not represent sensory-like information. However, in the final section, the author claimed that the impulse-driven sensory-like template strengthens informational connectivity between sensory and frontoparietal areas. This raises a question: how can we reconcile the lack of decodable coding in these frontoparietal regions with the reported enhancement in network communication? It would be helpful if the author provided a clearer explanation or additional evidence to bridge this gap.

      We thank the reviewer for this comment. We would like to clarify that although we did not observe sensory-like coding during preparation in frontoparietal areas, we did observe attentional signals in these regions, as evidenced by the above-chance within-task attention decoding performance (Fig. 2 in the main texts). This could reflect different neural codes in different areas, and suggests that inter-regional communication does not necessarily require identical representational formats. It seems plausible that the representation of a non-sensory attentional template in frontoparietal areas supports top-down attentional control, consistent with theories suggesting increasing abstraction as the cortical hierarchy ascends (Badre, 2008; Brincat et al., 2018), and that its interaction with the sensory representation in the visual areas is enhanced by the visual impulse.

      (3) Given that the impulse-driven sensory-like template facilitated behavior, the author proposed that it might also enhance network communication. Indeed, they observed changes in informational connectivity. However, it remains unclear whether these changes in network communication have a direct and robust relationship with behavioral improvements.

      We thank the reviewer for the suggestion. To examine how network communication relates to behavior, we performed a correlation analysis between informational connectivity (IC) and RTs across participants (see Figure S5). We observed a trend toward a correlation between V1-PFC connectivity and RTs in the Ping session (r = -0.394, p = 0.086), but not in the No-Ping session (r = -0.046, p = 0.846). No significant correlations were found between V1-IPS connectivity and RTs (ps > 0.400) or between ICs and accuracy (ps > 0.399). These results suggest that ping-enhanced connectivity might have contributed to faster responses. Although we may not have sufficient statistical power to warrant a strong conclusion, we think this result is still highly suggestive, so we have now added this analysis to the Supplementary Information (Page 8, Line 116-121; S5 Fig) and mentioned the result in the main texts (Page 14, Line 292-293).

      (4) I'm uncertain about the definition of the sensory-like template in this paper. Is it referring to the Ping impulse-driven condition or the decodable performance in the early visual cortex? If it is the former, even in working memory, whether pinging identifies an activity-silent mechanism is currently debated. If it's the latter, the authors should consider whether a causal relationship - such as "activating the sensory-like template strengthens the informational connectivity between sensory and frontoparietal areas" - is reasonable.

      We apologize for the confusion. The sensory-like template by itself does not directly refer to representations in the Ping session or to the attentional decoding in early visual cortex. Instead, it pertains to the representational format of attentional signals during preparation. Specifically, its existence is inferred from cross-task generalization, where neural patterns from a perception task (perceive 45º or perceive 135º) generalize to an attention task (attend 45º or attend 135º). We think this is a reasonable and accepted operational definition of the representational format. Our findings suggest that the sensory-like template likely existed in a latent state and was reactivated by visual pings, aligning more closely with the first account raised by the reviewer.
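      The logic of cross-task generalization described above can be illustrated with a toy example: a classifier is fit only on perception-task patterns and then evaluated only on attention-task patterns. This is a sketch under stated assumptions, not the analysis pipeline used in the study; the nearest-centroid classifier is a stand-in chosen for brevity, the three-voxel patterns are hypothetical, and all function names are ours.

```python
def centroid(patterns):
    """Mean pattern across trials (one value per voxel)."""
    n = len(patterns)
    return [sum(col) / n for col in zip(*patterns)]

def sq_dist(a, b):
    """Squared Euclidean distance between two patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit(train):
    """Fit a nearest-centroid model: one centroid per orientation label."""
    return {label: centroid(trials) for label, trials in train.items()}

def predict(model, pattern):
    """Return the label whose centroid is closest to the pattern."""
    return min(model, key=lambda label: sq_dist(model[label], pattern))

def accuracy(model, test):
    """Proportion of test trials assigned to their true label."""
    pairs = [(label, p) for label, trials in test.items() for p in trials]
    return sum(predict(model, p) == label for label, p in pairs) / len(pairs)

# Hypothetical voxel patterns (3 voxels per trial), for illustration only.
perception = {45: [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
              135: [[0.0, 0.1, 1.0], [0.1, 0.2, 0.9]]}
attention = {45: [[0.6, 0.3, 0.2], [0.7, 0.1, 0.3]],
             135: [[0.2, 0.3, 0.6], [0.3, 0.1, 0.7]]}

model = fit(perception)                       # train on "perceive 45/135"
cross_task_acc = accuracy(model, attention)   # test on "attend 45/135"
print(cross_task_acc)
```

      In this toy case the attention patterns are weaker copies of the perception patterns, so generalization succeeds; above-chance accuracy of this kind is what is taken as evidence for a sensory-like format.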

      We agree with the reviewer that whether ping identifies an activity-silent mechanism is currently debated (Schneegans & Bays, 2017; Barbosa et al., 2021). It is possible that visual impulse amplified a subtle but active representation of the sensory template during attentional preparation and resulted in decodable performance in visual cortex. Distinguishing between these two accounts likely requires neurophysiological measurements, which are beyond the scope of the current study. We have explicitly addressed this limitation in our Discussion (Page 19, Line 395-399).

      Nevertheless, the latent sensory-like template account remains plausible for three reasons. First, our interpretation aligns with theoretical frameworks proposing that the brain maintains more veridical, detailed target templates than those typically utilized for guiding attention (Wolfe, 2021; Yu et al., 2023). Second, this explanation is consistent with the proposed utility of latent working memory for prospective use, as maintaining a latent sensory-like template during preparation would be useful for subsequent stimulus selection. Third, this utility bears on the reviewer’s question of whether it is reasonable to claim that “activating the sensory-like template strengthens the informational connectivity between sensory and frontoparietal areas”. Our additional analyses (also refer to our response to Reviewer 3, Point 3) suggested that impulse-enhanced V1-PFC connectivity was associated with a trend toward faster behavioral responses (r = -0.394, p = 0.086; see Supplementary Information, Page 8, Line 116-121; S5 Fig). Considering these findings in totality, we think it is reasonable to suggest that the visual impulse may strengthen information flow among areas to enhance attentional control.

      Recommendation for the Authors:

      Reviewer #1 (Recommendation for the authors):

      I hate to suggest another fMRI experiment, but in order to make strong claims about two states, I would want to see the methodological and interpretation confounds addressed. Ping condition - would a tone lead to the same result of sharpening the template? If so, then why? Can a ping be manipulated in its effectiveness? That would be an excellent manipulation condition.

      We thank the reviewer for the comments. Please refer to our reply to Reviewer 1, Point 5 for a detailed explanation.

      Reviewer #2 (Recommendation for the authors):

      It is strongly advised that these descriptive statistics related to the interaction effect should be included to facilitate a more effective understanding of their data.

      We thank the reviewer for the comments. We have now included the relevant descriptive statistics in the Supplementary Information, Table 1.

      Reviewer #3 (Recommendation for the authors):

      In addition to p-values, I see many instances of 'ps'. Does this indicate the plural form of p?

      We used ‘ps’ to denote the minimal p-value across multiple statistical analyses, such as when applying identical tests to different region groups.

      References

      Aitken, F., Menelaou, G., Warrington, O., Koolschijn, R. S., Corbin, N., Callaghan, M. F., & Kok, P. (2020). Prior expectations evoke stimulus-specific activity in the deep layers of the primary visual cortex. PLoS Biology, 18(12), e3001023.

      Badre, D. (2008). Cognitive control, hierarchy, and the rostro–caudal organization of the frontal lobes. Trends in Cognitive Sciences, 12(5), 193-200.

      Barbosa, J., Lozano-Soldevilla, D., & Compte, A. (2021). Pinging the brain with visual impulses reveals electrically active, not activity-silent, working memories. PLoS Biology, 19(10), e3001436.

      Battistoni, E., Stein, T., & Peelen, M. V. (2017). Preparatory attention in visual cortex. Annals of the New York Academy of Sciences, 1396(1), 92-107.

      Brincat, S. L., Siegel, M., von Nicolai, C., & Miller, E. K. (2018). Gradual progression from sensory to task-related processing in cerebral cortex. Proceedings of the National Academy of Sciences, 115(30), E7202-E7211.

      Duncan, D. H., van Moorselaar, D., & Theeuwes, J. (2023). Pinging the brain to reveal the hidden attentional priority map using encephalography. Nature Communications, 14(1), 4749.

      Grill-Spector, K., & Malach, R. (2004). The human visual cortex. Annual Review of Neuroscience, 27(1), 649-677.

      Gong, M., Chen, Y., & Liu, T. (2022). Preparatory attention to visual features primarily relies on nonsensory representation. Scientific Reports, 12(1), 21726.

      Fan, Y., Han, Q., Guo, S., & Luo, H. (2021). Distinct Neural Representations of Content and Ordinal Structure in Auditory Sequence Memory. Journal of Neuroscience, 41(29), 6290-6303.

      Harrison, S. A., & Tong, F. (2009). Decoding reveals the contents of visual working memory in early visual areas. Nature, 458(7238), 632-635.

      Jigo, M., Gong, M., & Liu, T. (2018). Neural determinants of task performance during feature-based attention in human cortex. eNeuro, 5(1).

      Kok, P., Failing, M. F., & de Lange, F. P. (2014). Prior expectations evoke stimulus templates in the primary visual cortex. Journal of Cognitive Neuroscience, 26(7), 1546-1554.

      Kok, P., Mostert, P., & De Lange, F. P. (2017). Prior expectations induce prestimulus sensory templates. Proceedings of the National Academy of Sciences, 114(39), 10473-10478.

      Liu, T., Stevens, S. T., & Carrasco, M. (2007). Comparing the time course and efficacy of spatial and feature-based attention. Vision Research, 47(1), 108-113.

      Mongillo, G., Barak, O., & Tsodyks, M. (2008). Synaptic theory of working memory. Science, 319(5869), 1543-1546.

      Peelen, M. V., & Kastner, S. (2011). A neural basis for real-world visual search in human occipitotemporal cortex. Proceedings of the National Academy of Sciences, 108(29), 12125-12130.

      Priebe, N. J. (2016). Mechanisms of orientation selectivity in the primary visual cortex. Annual Review of Vision Science, 2(1), 85-107.

      Rademaker, R. L., & Serences, J. T. (2017). Pinging the brain to reveal hidden memories. Nature Neuroscience, 20(6), 767-769.

      Rademaker, R. L., Chunharas, C., & Serences, J. T. (2019). Coexisting representations of sensory and mnemonic information in human visual cortex. Nature Neuroscience, 22(8), 1336-1344.

      Serences, J. T., Ester, E. F., Vogel, E. K., & Awh, E. (2009). Stimulus-specific delay activity in human primary visual cortex. Psychological Science, 20(2), 207-214.

      Schneegans, S., & Bays, P. M. (2017). Restoration of fMRI decodability does not imply latent working memory states. Journal of Cognitive Neuroscience, 29(12), 1977-1994.

      Stokes, M., Thompson, R., Nobre, A. C., & Duncan, J. (2009). Shape-specific preparatory activity mediates attention to targets in human visual cortex. Proceedings of the National Academy of Sciences, 106(46), 19569-19574.

      Wolfe, J. M. (2021). Guided Search 6.0: An updated model of visual search. Psychonomic Bulletin & Review, 28(4), 1060-1092.

      Wolff, M. J., Jochim, J., Akyürek, E. G., & Stokes, M. G. (2017). Dynamic hidden states underlying working-memory-guided behavior. Nature Neuroscience, 20(6), 864-871.

      Wolff, M. J., Kandemir, G., Stokes, M. G., & Akyürek, E. G. (2020). Unimodal and bimodal access to sensory working memories by auditory and visual impulses. Journal of Neuroscience, 40(3), 671-681.

      Yu, X., Zhou, Z., Becker, S. I., Boettcher, S. E., & Geng, J. J. (2023). Good-enough attentional guidance. Trends in Cognitive Sciences, 27(4), 391-403.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Wang et al. investigated how sexual failure influences sweet taste perception in male Drosophila. The study revealed that courtship failure leads to decreased sweet sensitivity and feeding behavior via dopaminergic signaling. Specifically, the authors identified a group of dopaminergic neurons projecting to the suboesophageal zone that interacts with sweet-sensing Gr5a+ neurons. These dopaminergic neurons positively regulate the sweet sensitivity of Gr5a+ neurons via DopR1 and Dop2R receptors. Sexual failure diminishes the activity of these dopaminergic neurons, leading to reduced sweet-taste sensitivity and sugar-feeding behavior in male flies. These findings highlight the role of dopaminergic neurons in integrating reproductive experiences to modulate appetitive sensory responses.

      Previous studies have explored the dopaminergic-to-Gr5a+ neuronal pathways in regulating sugar feeding under hunger conditions. Starvation has been shown to increase dopamine release from a subset of TH-GAL4 labeled neurons, known as TH-VUM, in the suboesophageal zone. This enhanced dopamine release activates dopamine receptors in Gr5a+ neurons, heightening their sensitivity to sugar and promoting sucrose acceptance in flies. Since the function of the dopaminergic-to-Gr5a+ circuit motif has been well established, the primary contribution of Wang et al. is to show that mating failure in male flies can also engage this circuit to modulate sugar-feeding behavior. This contribution is valuable because it highlights the role of dopaminergic neurons in integrating diverse internal state signals to inform behavioral decisions.

      An intriguing discrepancy between Wang et al. and earlier studies lies in the involvement of dopamine receptors in Gr5a+ neurons. Prior research has shown that Dop2R and DopEcR, but not DopR1, mediate starvation-induced enhancement of sugar sensitivity in Gr5a+ neurons. In contrast, Wang et al. found that DopR1 and Dop2R, but not DopEcR, are involved in the sexual failure-induced decrease in sugar sensitivity in these neurons. I wish the authors had further explored or discussed this discrepancy, as it is unclear how dopamine release selectively engages different receptors to modulate neuronal sensitivity in a context-dependent manner.

      Our immunostaining experiments showed that three dopamine receptors, Dop1R1, Dop2R, and DopEcR, were expressed in Gr5a<sup>+</sup> neurons in the proboscis, consistent with previous findings obtained using RT-PCR (Inagaki et al 2012). As the reviewer pointed out, we found that Dop1R1 and Dop2R were required for courtship failure-induced suppression of sugar sensitivity, whereas Marella et al 2012 and Inagaki et al 2012 found that Dop2R and DopEcR were required for starvation-induced enhancement of sugar sensitivity. These results may suggest that different internal states (courtship failure vs. starvation) modulate the peripheral sensory system via different signaling pathways (e.g. different subsets of dopaminergic neurons; different dopamine release mechanisms; and different dopamine receptors). We have discussed these possibilities in the revised manuscript.

      The data presented by Wang et al. are solid and effectively support their conclusions. However, certain aspects of their experimental design, data analysis, and interpretation warrant further review, as outlined below.

      (1) The authors did not explicitly indicate the feeding status of the flies, but it appears they were not starved. However, the naive and satisfied flies in this study displayed high feeding and PER baselines, similar to those observed in starved flies in other studies. This raises the concern that sexually failed flies may have consumed additional food during the 4.5-hour conditioning period, potentially lowering their baseline hunger levels and subsequently reducing PER responses. This alternative explanation is worth considering, as an earlier study demonstrated that sexually deprived males consumed more alcohol, and both alcohol and food are known rewards for flies. To address this concern, the authors could remove food during the conditioning phase to rule out its influence on the results.

      This is an important consideration. To rule out a potential confound from food intake during courtship conditioning, we have now also conducted courtship conditioning in vials without food. In the absence of any feeding opportunity over the 4.5-hour courtship conditioning period, sexually rejected males still exhibited a robust decrease in sweet taste sensitivity compared with Naïve and Satisfied controls (Figure 1-supplement 1C). These data confirm that the suppression of PER is driven by courtship failure per se, rather than by differences in feeding during the conditioning phase.

      (2) Figure 1B reveals that approximately half of the males in the Failed group did not consume sucrose yet Figure 1-S1A suggests that the total volume consumed remained unchanged. Were the flies that did not consume sucrose omitted from the dataset presented in Figure 1-S1A? If so, does this imply that only half of the male flies experience sexual failure, or that sexual failure affects only half of males while the others remain unaffected? The authors should clarify this point.

      Our initial description of the experimental setup may have been confusing. We briefly clarify the experimental design here and have added further details to the revised manuscript, which should resolve the reviewer’s concerns:

      After the behavioral conditioning, male flies were divided between two assays. On the one hand, we quantified PER responses of individual flies. As shown in Figure 1C, Failed males exhibited decreased sweet sensitivity (as demonstrated by the right shift of the dose-response curve). On the other hand, we sought to quantify food consumption of individual flies by using the MAFE assay (Qi et al 2005).

      In the initial submission, we used 400 mM sucrose for the MAFE assay. When presented with 400 mM sucrose, approximately 100% of the flies in the Naïve and Satisfied groups, and 50% of the flies in the Failed group, extended their proboscis and started feeding, as a natural consequence of decreased sugar sensitivity (Figure 1B). We quantified the volume of food consumed by the flies that showed PER responses to 400 mM sucrose and observed no change (Figure 1-supplement 1A, left). To avoid potential confusion, we have now repeated the MAFE assay with 800 mM sucrose, which elicited feeding in ~100% of flies in all three groups, as shown in Figure 1C. Again, we observed no change in food intake (Figure 1-supplement 1A, right).

      Together, these experiments suggest that sexual failure suppresses the sweet sensitivity of Failed males. Meanwhile, as long as they still responded to a given food stimulus and initiated feeding, the volume of food consumed remained unchanged. These results led us to focus on the modulatory effect of sexual failure on the sensory system, the main topic of the present study.
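
      The distinction drawn above, between the fraction of flies that initiate feeding and the volume consumed by those that do, can be made concrete with a minimal sketch. All numbers below are hypothetical and in arbitrary units; they are not data from the study, and `summarize` is an illustrative helper rather than the authors' analysis code.

```python
from statistics import mean

def summarize(intakes):
    """Summarize one group of flies.

    intakes: one entry per fly. None means the fly did not extend its
    proboscis (no feeding initiated); a number is the volume ingested.
    Returns (fraction of flies that fed, mean intake among flies that fed).
    """
    responders = [v for v in intakes if v is not None]
    fraction = len(responders) / len(intakes)
    mean_intake = mean(responders) if responders else float("nan")
    return fraction, mean_intake

# Hypothetical values chosen only to illustrate the logic.
naive = [50.0, 52.0, 48.0, 50.0]    # every fly feeds
failed = [50.0, None, 50.0, None]   # half of the flies do not initiate feeding

naive_summary = summarize(naive)
failed_summary = summarize(failed)
print(naive_summary, failed_summary)
```

      In this toy example the fraction of responders drops in the Failed group while the mean intake per responder is identical, which is the pattern described in the response: an unchanged per-responder volume does not imply that the responding flies ate more.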

      (3) The evidence linking TH-GAL4 labeled dopaminergic neurons to reduced sugar sensitivity in Gr5a+ neurons in sexually failed males could be further strengthened. Ideally, the authors would have activated TH-GAL4 neurons and observed whether this restored GCaMP responses in Gr5a+ neurons in sexually failed males. Instead, the authors performed a less direct experiment, shown in Figures 3-S1C and D. The manuscript does not describe the condition of the flies used in this experiment, but it appears that they were not sexually conditioned. I have two concerns with this experiment. First, no statistical analysis was provided to support the enhancement of sucrose responses following activation of TH-GAL4 neurons. Second, without performing this experiment in sexually failed males, the authors lack direct evidence to confirm that the dampened response of Gr5a+ neurons to sucrose results from decreased activity in TH-GAL4 neurons.

      We have now quantified the effect of TH<sup>+</sup> neuron activation on Gr5a<sup>+</sup> neuron calcium responses. In Naïve males, dTRPA1-mediated activation of TH<sup>+</sup> cells significantly enhanced sucrose-induced calcium responses (Figure 3-supplement 1C). In Failed males, although the baseline activity of Gr5a<sup>+</sup> neurons was lower (Figure 3C), the same activation also produced a significant (even slightly larger) effect on the calcium responses of Gr5a<sup>+</sup> neurons (Figure 3-supplement 1D).

      Taken together, we would argue that these experiments in both Naïve and Failed males are adequate to show a functional link between TH<sup>+</sup> neurons and Gr5a<sup>+</sup> neurons. Combined with the findings that these neurons form active synapses (Figure 3-supplement 1B) and that the activity of TH<sup>+</sup> neurons was dampened in sexually failed males (Figure 3G-I), our data support the notion that sexual failure suppresses sweet sensitivity via TH-Gr5a circuitry.

      (4) The statistical methods used in this study are poorly described, making it unclear which method was used for each experiment. I suggest that the authors include a clear description of the statistical methods used for each experiment in the figure legends. Furthermore, as I have pointed out, there is a lack of statistical comparisons in Figures 3-S1C and D, a similar problem exists for Figures 6E and F.

      We have added detailed information on the statistical analysis to each figure legend.

      (5) The experiments in Figure 5 lack specificity. The target neurons in this study are Gr5a+ neurons, which are directly involved in sugar sensing. However, the authors used the less specific Dop1R1- and Dop2R-GAL4 lines for their manipulations. Using Gr5a-GAL4 to specifically target Gr5a+ neurons would provide greater precision and ensure that the observed effects are directly attributable to the modulation of Gr5a+ neurons, rather than being influenced by potential off-target effects from other neuronal populations expressing these dopamine receptors.

      We agree with the reviewer that manipulating Dop1R1 and Dop2R genes (Figure 4) and the neurons expressing them (Figure 5) might have broader impacts. For specificity, we have also tested the role of Dop1R1 and Dop2R in Gr5a<sup>+</sup> neurons by RNAi experiments (Figure 6). As shown by both behavioral and calcium imaging experiments, knocking down Dop1R1 and Dop2R in Gr5a<sup>+</sup> neurons both eliminated the effect of sexual failure to dampen sweet sensitivity, further confirming the role of these two receptors in Gr5a<sup>+</sup> neurons.

      (6) I found the results presented in Fig. 6F puzzling. The knockdown of Dop2R in Gr5a+ neurons would be expected to decrease sucrose responses in naive and satisfied flies, given the role of Dop2R in enhancing sweet sensitivity. However, the figure shows an apparent increase in responses across all three groups, which contradicts this expectation. The authors may want to provide an explanation for this unexpected result.

      We agree that there were some potential discrepancies. We have now addressed them by repeating these calcium imaging experiments with a head-to-head comparison with the controls (Gr5a-GCaMP, +/- Dop1R1 and Dop2R RNAi).

      In these new experiments, Dop1R1 or Dop2R knockdown completely prevented the suppression of Gr5a<sup>+</sup> neuron responsiveness by courtship failure (Figure 6E), whereas the activities of Gr5a<sup>+</sup> neurons in Naïve/Satisfied groups were not altered. These results demonstrate that Dop1R1 and Dop2R are specifically required to mediate the decrease in sweet sensitivity following courtship failure.

      (7) In several instances in the manuscript, the authors described the effects of silencing dopamine signaling pathways or knocking down dopamine receptors in Gr5a neurons with phrases such as 'no longer exhibited reduced sweet sensitivity' (e.g., L269 and L288), 'prevent the reduction of sweet sensitivity' (e.g., L292), or 'this suppression was reversed' (e.g. L299). I found these descriptions misleading, as they suggest that sweet sensitivity in naive and satisfied groups remains normal while the reduction in failed flies is specifically prevented or reversed. However, this is not the case. The data indicate that these manipulations result in an overall decrease in sweet sensitivity across all groups, such that a further reduction in failed flies is not observed. I recommend revising these descriptions to accurately reflect the observed phenotypes and avoid any confusion regarding the effects of these manipulations.

      We have changed the wording in the revised manuscript. In brief, we think that these manipulations have two consequences: suppressing the overall sweet sensitivity, and eliminating the effect of sexual failure on sweet sensitivity.

      Reviewer #2 (Public review):

      Summary:

      The authors exposed naïve male flies to different groups of females, either mated or virgin. Male flies can successfully copulate with virgin females; however, they are rejected by mated females. This rejection reduces sugar preference and sensitivity in males. Investigating the underlying neural circuits, the authors show that dopamine signaling onto GR5a sensory neurons is required for reduced sugar preference. GR5a sensory neurons respond less to sugar exposure when they lack dopamine receptors.

      Strengths:

      The findings add another strong phenotype to the existing dataset about brain-wide neuromodulatory effects of mating. The authors use several state-of-the-art methods, such as activity-dependent GRASP to decipher the underlying neural circuitry. They further perform rigorous behavioral tests and provide convincing evidence for the local labellar circuit.

      Weaknesses:

      The authors focus on the circuit connection between dopamine and gustatory sensory neurons in the male SEZ. Therefore, it is still unknown how mating modulates dopamine signaling and what possible implications on other behaviors might result from a reduced sugar preference.

      We agree with the reviewer that in the current study, we did not examine the exact mechanism of how mating experience suppressed the activity of dopaminergic neurons in the SEZ. The current study mainly focused on the behavioral characterization (sexual failure suppresses sweet sensitivity) and the downstream mechanism (TH-Gr5a pathway). We think that examining the upstream modulatory mechanism may be more suitable for a separate future study.

      We believe that a sustained reduction in sweet sensitivity upon courtship failure (not limited to sucrose but extending to other sweet compounds; Figure 1-supplement 1D-E) suggests a generalized and sustained consequence for reward-related behaviors. Sexual failure may thus resemble a state of “primitive emotion” in fruit flies. We have further discussed this possibility in the revised manuscript.

      Reviewer #3 (Public review):

      Summary

      In this work, the authors asked how mating experience impacts reward perception and processing. For this, they employ fruit flies as a model, with a combination of behavioral, immunostaining, and live calcium imaging approaches.

      Their study allowed them to demonstrate that courtship failure decreases the fraction of flies motivated to eat sweet compounds, revealing a link between reproductive stress and reward-related behaviors. This effect is mediated by a small group of dopaminergic neurons projecting to the SEZ. After courtship failure, these dopaminergic neurons exhibit reduced activity, leading to decreased Gr5a+ neuron activity via Dop1R1 and Dop2R signaling, and leading to reduced sweet sensitivity. The authors therefore showed how mating failure influences broader behavioral outputs through suppression of the dopamine-mediated reward system and underscores the interactions between reproductive and reward pathways.

      Concern

      My main concern regarding this study lies in the way the authors chose to present their results. If I understood correctly, they provided evidence that mating failure induces a decrease in the fraction of flies exhibiting PER. However, they also showed that food consumption was not affected (Fig. 1, supplement), suggesting that individuals who did eat consumed more. This raises questions about the analysis and interpretation of the results. Should we consider the group as a whole, with a reduced sensitivity to sweetness, or should we focus on individuals, with each one eating more? I am also concerned about how this could influence the results obtained using live imaging approaches, as the flies being imaged might or might not have been motivated to eat during the feeding assays. I would like the authors to clarify their choice of analysis and discuss this critical point, as the interpretation of the results could potentially be the opposite of what is presented in the manuscript.

      Please refer to our responses to the Public Review (Reviewer 1, Point 2) for details.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The label for the y-axis in Figure 1B should be "fraction", not "percentage".

      We have revised the figure as suggested.

      (2) I suggest that the authors indicate the ROIs they used to quantify the signal intensity in Figure 3E and G.

      We have revised the figures as suggested.

      (3) There is a typo in Figure 4A: it should be "Wild type", not "Wide type".

      We have revised the figure as suggested.

      (4) The elav-GAL4/+ data in Figure 4-S1B, C, and D appears to be reused across these panels. However, the number of asterisks indicating significance in the MAT plots differs between them (three in panels B and C, and four in panel D). Is this a typo?

      It is indeed a typo, and we have revised the figure accordingly.

      Reviewer #2 (Recommendations for the authors):

      Additional comments:

      The authors should add this missing literature about dopamine and neuromodulation in courtship:

      Boehm et al., 2022 (eLife) - this study shows that mating affects olfactory behavior in females.

      Cazalé-Debat et al., 2024 (Nature) - Mating proximity blinds threat perception.

      Gautham et al., 2024 (Nature) - A dopamine-gated learning circuit underpins reproductive state-dependent odor preference in Drosophila females.

      We have added these references in the introduction section.

      Has the mating behavior been quantified? How often did males copulate with mated and virgin females?

      We tried to examine the copulation behavior based on our video recordings. In the “Failed” group (males paired with mated females), we observed virtually no successful copulation events at all, confirming that nearly 100% of those males experienced sexual failure. In contrast, males in the “Satisfied” group (paired with virgin females) mated on average 2-3 times during the 4.5-hour conditioning period. We have added some explanations in the manuscript.

      Do the rejected males live shorter? Is the effect also visible when they are fed with normal fly food, or is it only working with sugar?

      We did not directly measure the lifespan of these males. However, we conducted a related assay (starvation resistance), in which “Failed” males died significantly faster than both Naïve and Satisfied controls, indicating a clear reduction in their ability to endure food deprivation (Figure 1-supplement 1B). Since sweet taste is a primary cue for food detection in Drosophila, and sugar makes up a large portion of their standard diet, the drop in sugar sensitivity we observed in Failed males could likewise impair their perception and consumption of regular fly food, and hence their resistance to starvation.

      Also, the authors mention that the reward pathway is affected, this is probably the case as sugar sensation is impaired. One interesting experiment would be (and maybe has been done?) to test rejected males in normal odor-fructose conditioning. The data would suggest that they would do worse.

      We have already measured how courtship failure affected fructose sensitivity (Figure 1-supplement 1D), and we found that the reduction in fructose perception was even more profound than for sucrose. We have not yet tested whether Failed males show deficits in odor-fructose associative conditioning. This is indeed a very interesting direction to explore. However, olfactory reward learning relies on molecular and circuit mechanisms distinct from those governing taste. We therefore argue that such experiments would be more suitable for a separate, follow-up study.

      The authors could have added another group where males are exposed to other males. It would be interesting if this is also a "stressful" context and if it would also reduce sugar preference - probably beyond the scope of this paper.

      In our experiments, all flies, including those in the Naïve, Failed, and Satisfied groups, were housed in groups of 25 males per vial before the conditioning period (and the Naïve group remained in the same group housing until PER testing). This means every cohort experienced the same level of “social stress” from male-male interactions. While it would indeed be interesting to compare that to solitary housing or other male-only exposures, isolation itself imposes a different kind of stress, and disentangling these effects on sugar preference would require a separate, dedicated study beyond the scope of the present work.

      Would the behavior effect also show up with experienced males? Maybe this has been tested before. Does mating rejection in formerly successful males have the same impact?

      As suggested by the reviewer, we performed an additional experiment in which males that had previously mated successfully were subsequently subjected to courtship rejection. As shown in Figure 1-supplement 1F, prior successful mating did not prevent the decline in sweet sensitivity induced by subsequent mating failure, indicating that even experienced males exhibit the reduction in sugar sensitivity after rejection.

      Is the same circuit present and functioning in females? Does manipulating dopamine receptors in GR5a neurons in females lead to the same phenotype? This would suggest that different internal states in males and females could lead to the same phenotype and circuit modulations.

      This is indeed a very interesting suggestion. In male flies, Gr5a-specific knockdown of dopamine receptors did not alter baseline sweet sensitivity, but it selectively prevented the reduction in sugar perception that followed mating failure (Figure 6C-D), indicating that this dopaminergic pathway is engaged only in the context of courtship rejection. By extension, knocking down the same receptors in female GR5a neurons would likewise be expected to leave their basal sugar sensitivity unchanged. Moreover, because there is currently no established paradigm for inducing mating failure in female flies, we cannot yet test whether sexual rejection similarly modulates sweet taste in females, or whether it operates via the same circuit.

      Reviewer #3 (Recommendations for the authors):

      Suggestions to the authors:

      Introduction, line 61. I suggest the authors add references in fruit flies concerning the rewarding nature of mating. For example, the paper from Zhang et al, 2016 "Dopaminergic Circuitry Underlying Mating Drive" demonstrates the role of the dopamine rewarding system in mating drive. There is a large body of literature showing the link between dopamine and mating.

      We have added this literature in the introduction section.

      Figure 1B and Figure Supplement 1: If I understood correctly, Figure Supplement 1A shows that the total food consumption across all tested flies remains unchanged. However, fewer flies that failed to mate consumed sucrose. I would be curious to see the results for sucrose consumption per individual fly that did eat. According to their results, individual flies that failed to mate should consume more sucrose. This would change the conclusion. The authors currently show that a group of flies that failed to mate consumed less sucrose overall, but since fewer males actually ate, those that failed to mate and did eat consumed more sucrose. The authors should distinguish between failed and satisfied flies in two groups: those that ate and those that did not.

      Please see our responses to the Public Review for details (Reviewer 1, Point 2).

      Figure 1C, right: For a better understanding of all the "MAT" figures, I suggest the authors start the Y axis with the unit 25 and increase it to 400. This would match better the text (line 114) saying that it was significantly elevated in the failed group. As it is, we have the impression of a decrease in the graph.

      We have revised the figures accordingly.

      Line 103: When suggesting a reduced likelihood of meal initiation of these males, do these males take longer to eat when they did it? In other words, is the latency to eat increased in failed males? That would be a good measure of motivational state.

      We tried to analyze feeding latency in the MAFE assay by measuring the time from sucrose presentation to the first proboscis extension, but this interval was too short to be measured accurately. Nevertheless, when conducting the experiments, we did not observe any obvious difference in feeding latency between Failed males and Naïve or Satisfied controls.

      Line 117. I don't understand which results the authors refer to when writing "an overall elevation in the threshold to initiate feeding upon appetitive cues". Please specify.

      This phrase refers to the fact that for every sweet tastant we tested, including sucrose (Figure 1C), fructose and glucose (Figure 1-supplement 1D-E), the concentration-response curve in Failed males shifted to the right, and the Mean Acceptance Threshold (MAT) was significantly higher. In other words, for these different appetitive cues, mating failure raised the concentration of sugar required to trigger a proboscis extension, indicating a general elevation in the threshold to initiate feeding upon an appetitive cue.
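
      The link between a right-shifted concentration-response curve and a higher acceptance threshold can be illustrated with a short sketch. The manuscript's exact MAT definition is not reproduced in this response, so the sketch assumes, purely for illustration, that a threshold is read off as the concentration at which the fraction of flies showing PER first reaches 0.5, interpolated on a log scale. The function name and all data points are hypothetical.

```python
import math

def half_response_concentration(concs, fractions, level=0.5):
    """Estimate the concentration at which the PER fraction first reaches `level`.

    Assumes `concs` is sorted in ascending order and interpolates linearly
    on log2(concentration) between neighbouring points. Returns None if the
    curve never reaches `level`.
    """
    if fractions[0] >= level:
        return float(concs[0])
    points = list(zip(concs, fractions))
    for (c0, f0), (c1, f1) in zip(points, points[1:]):
        if f0 < level <= f1:
            t = (level - f0) / (f1 - f0)
            log_c = math.log2(c0) + t * (math.log2(c1) - math.log2(c0))
            return 2 ** log_c
    return None

# Hypothetical dose-response data (mM sucrose vs. fraction of flies showing PER).
concs = [25, 50, 100, 200, 400]
control = [0.20, 0.50, 0.80, 0.95, 1.00]
failed = [0.05, 0.20, 0.50, 0.80, 0.95]   # same curve shifted one step to the right

control_threshold = half_response_concentration(concs, control)
failed_threshold = half_response_concentration(concs, failed)
print(control_threshold, failed_threshold)
```

      Under this assumed definition, shifting the curve to the right by one concentration step doubles the estimated threshold, which is the sense in which a rightward shift corresponds to an elevated threshold to initiate feeding.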

      Figure 1D. Please specify the time for the satisfied group.

      For clarity, the Naïve and Satisfied groups in Figure 1D each represent pooled data from 0 to 72 hours post-treatment, as their sweet sensitivity remained stable throughout this period. Only the Failed group was shown with time-resolved data, since it was the only group exhibiting a dynamic change in sugar sensitivity over time. We have now specified this in the figure legend.

      Figure 1F. The phenotype was not totally reversed in failed-re-copulated males. Could it be due to the timing between failure and re-copulation? I suggest the authors mention in the figure or in the text, the time interval between failure and re-copulation.

      We’d like to clarify that the interval between the initial treatment (“Failed”) and the opportunity for re-copulation was within 30 minutes. The incomplete reversal in the Failed-re-copulated group indeed raises interesting questions. One possible explanation is that mating failure reduces synaptic transmission between the SEZ dopaminergic neurons and Gr5a<sup>+</sup> sweet sensory neurons (Figure 3), and that restoring this transmission takes longer. We have added this information to the figure legend and the Methods section.

      Line 227-228 and Figure 3E. The authors showed that the synaptic connections between dopaminergic neurons and Gr5a+ GRNs were significantly weakened. I am wondering about the delay between mating failure and the GFP observation. It would be informative to know this timing to interpret this decrease in synaptic connections. If the timing is relatively long, it is possible that we can observe a neuronal plasticity. However, if this timing is very short, I would not expect such synaptic plasticity.

      The interval between the behavioral treatment and the GRASP-GFP experiment was approximately 20 hours. We chose this time window because it was sufficient for both GFP expression and accumulation. Therefore, the observed reduction in synaptic connections between dopaminergic neurons and Gr5a<sup>+</sup> GRNs likely reflects a genuine, experience-induced structural and functional change rather than an immediate, transient effect. For clarity, we have added this information to the Methods section of the revised manuscript.

      Line 240-243: The authors demonstrated that there is a reduction of CaLexA-mediated GFP signals in dopaminergic neurons in the SEZ after mating failure, but not a reduction in Gr5a+ GRNs. I suggest replacing "indicate" with "suggest" in line 240.

      We have made the change accordingly. Meanwhile, we would like to clarify that while we observed a reduction of NFAT signal in SEZ dopaminergic neurons (Figure 3G), we did not directly test NFAT signal in Gr5a<sup>+</sup> neurons. Notably, the results that the synaptic transmissions from SEZ dopaminergic neurons to Gr5a<sup>+</sup> neurons were weakened (Figure 3E-F), and the reduction of NFAT signal in SEZ dopaminergic neurons (Figure 3G-I), were in line with a reduction in sweet sensitivity of Gr5a<sup>+</sup> neurons upon courtship failure (Figure 3B-D).

      Line 243: replace "consecutive" with "constitutive".

      We have revised it accordingly.

      Figure 5: I have trouble understanding the results obtained in Figure 5. Both constitutive activation and inhibition of Dop1R1 and Dop2R neurons lead to the same results, knowing that males who failed mating no longer exhibit decreased sweet sensitivity. I would have expected contrary results for both experimental conditions. I suggest the author to discuss their results.

      Both activation and inhibition of Dop1R1 and Dop2R neurons eliminated the effect of courtship failure on sweet sensitivity (Figure 5). These results are in line with our hypothesis that courtship failure leads to changes in dopamine signaling and hence sweet sensitivity. If dopamine signaling via Dop1R1 and Dop2R was locked, either to a silenced or a constitutively activated state, the effect of courtship failure on sweet sensitivity was eliminated.

      Nevertheless, as the reviewer pointed out, constitutive activation and inhibition should in principle have opposite effects on Naïve flies. Indeed, when Dop1R1<sup>+</sup>/Dop2R<sup>+</sup> neurons were silenced in Naïve flies, PER to sucrose was significantly reduced (Figure 5C-D), confirming that these neurons normally facilitate sweet sensation. Neuronal activation by NaChBac did show a trend towards enhanced PER compared to the GAL4/+ controls, but it did not differ from the +>UAS-NaChBac controls, which showed a high PER level, likely due to a ceiling effect. We have added this discussion to the manuscript.

      Figure 7: I suggest the authors modify their figure a bit. It is not clear why in failed mating, the red arrow in "behavioral modulation" goes to the fly. The authors should find another way to show that mating failure decreased the percentage of flies that are motivated to eat sugar.

      We have modified the figure as suggested.

      Overall, I would suggest the authors be cautious with their conclusions. For example, line 337: "sexual failure suppressed feeding behavior". This is not what is shown by this study. Here, the study shows that mating failure decreases the fraction of flies that eat sucrose. Unless the authors demonstrate that this decrease is generalizable to other metabolites, I suggest the authors modify their conclusion.

      While we primarily used sucrose as the stimulant in our experiments, we also tested responses to two other sugars: fructose and glucose (Figure 1-supplement 1D-E). In all three cases, mating failure led to a significant reduction in sweet perception, suggesting that the effect of courtship failure is not limited to a single metabolite but rather reflects a general decrease in sweet sensitivity. Meanwhile, reduced sweet sensitivity indeed led to a reduction of feeding initiation (Figure 1).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      In the future, could you please include the exact changes made to the manuscript in the relevant section of the rebuttal, so it's clear which changes addressed the comment? That would make it easier to see what you refer to exactly - currently I have to guess which manuscript changes implement e.g. "We have tried to make these points more evident".

      Yes, we apologize for the inconvenience.

      On possible navigation solutions:

      I'm not sure if I follow this argument. If the network uses a shifted allocentric representation centred on its initial state, it couldn't consistently decode the position from different starting positions within the same environment (I don't think egocentric is the right term here - egocentric generally refers to representations relative to the animal's own direction like "to the left" rather than "to the west" but these would not work in the allocentric decoding scheme here). In other words: If I path integrate my location relative to my starting location s1 in environment 1 and learn how to decode that representation to an environment location, I cannot use the same representation when I start from s2 in environment 1, because everything will have shifted. I still believe using boundaries is the only solution to infer the absolute location for the agent here (because that's the only information that it gets), and that's the reason for finding boundary representations (and not grid cells). Imagine doing this task on a perfect torus where there are no boundaries: it would be impossible to ever find out at what 'absolute' location you are in the environment. I have therefore not updated this part of my review, but do let me know if I misunderstood.

      Thank you for addressing this point, which is a somewhat unusual feature of our network: We believe the point you raise applies if the decoding were fixed. However, in our case, the decoding is dynamic and depends on the firing pattern, as place unit centers are decoded on a per-trajectory basis. Thus, a new place-like basis may be formed for each trajectory (and in each environment). Hence, the model is not constrained to reuse its representation across trajectories or environments, as place centers are inferred based on unit firing. However, we do observe that the network learns to use a fixed place field placement in each geometry, which likely reflects some optimal solution to the decoding problem. This might also help to explain the hexagonal arrangement of learned field centers. Finally, we agree that egocentric may not be entirely accurate, but we found it to be the best word to distinguish from the allocentric-type navigation adopted by the network.
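
      The fixed-decoder case that the reviewer describes, and that we agree with, can be sketched with a toy example. This is not our network or decoding code: `path_integrate`, `fixed_decoder`, the start positions `s1` and `s2`, and the velocities are all hypothetical, and the "decoder" is a simple offset calibrated on trajectories starting at `s1`.

```python
def path_integrate(velocities, origin=(0.0, 0.0)):
    """Sum 2D velocity steps starting from `origin`; returns the end position."""
    x, y = origin
    for vx, vy in velocities:
        x += vx
        y += vy
    return (x, y)

# Hypothetical velocity sequence and two start positions in one environment.
velocities = [(1.0, 0.0), (0.0, 2.0), (-0.5, 0.5)]
s1, s2 = (0.0, 0.0), (3.0, 1.0)

# True end positions depend on where the agent actually started.
true_end_1 = path_integrate(velocities, s1)
true_end_2 = path_integrate(velocities, s2)

# The internal estimate is relative to the (unknown) start, so it is the
# same for both trajectories.
internal = path_integrate(velocities)

def fixed_decoder(p):
    """Map an internal estimate to an absolute position, calibrated on s1 only."""
    return (p[0] + s1[0], p[1] + s1[1])

decoded = fixed_decoder(internal)
error_1 = (true_end_1[0] - decoded[0], true_end_1[1] - decoded[1])
error_2 = (true_end_2[0] - decoded[0], true_end_2[1] - decoded[1])
print(error_1, error_2)
```

      With a fixed decoder, the error for the second trajectory equals the offset between the two start positions. A decoder whose centers are re-inferred for each trajectory, as in our model, is not bound to this single calibration.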

      Regarding noise injection:

      Beyond that noise level, the network might return to high correlations, but that must be due to the boundary interactions - very much like what happens at the very beginning of entering an environment: the network has learned to use the boundary to figure out where it is from an uninformative initial hidden state. But I don't think this is currently reflected well in the main text. That still reads "Thus, even though the network was trained without noise, it appears robust even to large perturbations. This suggests that the learned solutions form an approximate attractor." I think your new (very useful!) velocity ablations show that only small noise is compensated for by attractor dynamics, and larger noise injections are error corrected through boundary interactions. I've added this to the new review.

      Thank you for your kind feedback. We have changed the phrasing in the text to say “robust even to moderate perturbations.” We maintain that, while numerically small, the amount of injected noise is rather large compared with the magnitude of activities in the network (see Fig. A5d): the largest maximal rate is around 0.1, which is similar to the noise level at which output representations fail to re-converge. However, we agree that some moderation is appropriate.

      On contexts being attractive:

      In the new bit of text, I'm not sure why "each environment appears to correspond to distinct attractive states (as evidenced by the global-type remapping behavior)", i.e. why global-type remapping is evidence for attractive states. Again, to me global-type remapping is evidence that contexts occupy different parts of activity space, but not that they are attractive. I like the new analysis in Appendix F, as it demonstrates that the context signal determines which region of activity space is selected (as opposed to the boundary information!). If I'm not mistaken, we know three things: 1. Different contexts exist in different parts of representation space, 2. Representations are attractive for small amounts of noise, 3. The context signal determines which point in representation space is selected (thanks to the new analysis in Appendix F). That seems to be in line with what the paper claims (I think "contexts are attractive" has been removed?) so I've updated the review.

      It seems to us that we are in agreement on this point; our aim is simply to point out that a particular context signal appears to correspond to a particular (discrete) attractor state (i.e., occupying a distinct part of representation space, as you state). It seems we simply use slightly different language; to avoid confusion, we have changed this to say that “representations are attractive”.

      Thanks again for engaging with us, this discussion has been very helpful in improving the paper.

      Reviewer #2:

      However, I still struggle to understand the entire picture of the boundary-to-place-to-grid model. After all, what is the role of grid cells in the proposed view? Are they just redundant representations of the space? I encourage the authors to clarify these points in the last two paragraphs on pages 17-18 of the discussion.

      Thank you for your feedback. While we have discussed the possible role of a grid code to some extent, we agree that this point requires clarification. We have therefore added to the discussion on the role of grid cells, which now reads “While the lack of grid cells in this model is interesting, it does not disqualify grid cells from serving as a neural substrate for path integration. Rather, it suggests that path integration may also be performed by other, non-grid spatial cells, and/or that grid cells may serve additional computational purposes. If grid cells are involved during path integration, our findings indicate that additional tasks and constraints are necessary for learning such representations. This possibility has been explored in recent normative models, in which several constraints have been proposed for learning grid-like solutions. Examples include constraints concerning population vector magnitude, conformal isometry \cite{xu_conformal_2022, schaeffer_self-supervised_2023, schoyen_hexagons_2024}, capacity, spatial separation and path invariance \cite{schaeffer_self-supervised_2023}. Another possibility is that grid cells are geared more towards other cognitive tasks, such as providing a neural metric for space \cite{ginosar_are_2023, pettersen_self-supervised_2024}, or supporting memory and inference-making \cite{whittington_tolman-eichenbaum_2020}. That our model performs path integration without grid cells, and that a myriad of independent constraints are sufficient for grid-like units to emerge in other models, presents strong computational evidence that grid cells are not solely defined by path integration, and that path integration is not only reserved for grid cells.”

      Thank you again for your time and input.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their comprehensive analysis, Diallo et al. deorphanise the first olfactory receptor of a nonhymenopteran eusocial insect - a termite - and identify the well-established trail pheromone neocembrene as the receptor's best ligand. By using a large set of odorants the authors convincingly show that, as expected for a pheromone receptor, PsimOR14 is very narrowly tuned. While the authors first make use of an ectopic expression system, the empty neuron of Drosophila melanogaster, to characterise the receptor's responses, they next perform single sensillum recordings with different sensilla types on the termite antenna. By that, they are able to identify a sensillum that houses three neurons, of which the B neuron exhibits the narrow responses described for PsimOR14. Hence the authors do not only identify the first pheromone receptor in a termite but can even localize its expression on the antenna. The authors in addition perform a structural analysis to explain the binding properties of the receptor and its major and minor ligands (as this is beyond my expertise, I cannot judge this part of the manuscript). Finally, they compare expression patterns of ORs in different castes and find that PsimOR14 is more strongly expressed in workers than in soldier termites, which corresponds well with stronger antennal responses in the worker caste.

      Strengths:

      The manuscript is well-written and a pleasure to read. The figures are beautiful and clear. I actually had a hard time coming up with suggestions.

      We thank the reviewer for the positive comments.

      Weaknesses:

      Whenever it comes to the deorphanization of a receptor and its potential role in behaviour (in the case of the manuscript it would be trail-following of the termite) one thinks immediately of knocking out the receptor to check whether it is necessary for the behaviour. However, I definitely do not want to ask for this (especially as the establishment of CRISPR Cas-9 in eusocial insects usually turns out to be a nightmare). I also do not know either, whether knockdowns via RNAi have been established in termites, but maybe the authors could consider some speculation on this in the discussion.

      We agree that a functional proof of the PsimOR14 function using reverse genetics would be a valuable addition to the study to firmly establish its role in trail pheromone sensing. Nevertheless, such a functional proof is difficult to obtain. Due to the very slow ontogenetic development inherent to termites (several months from an egg to the worker stage), CRISPR Cas-9 is not a useful technique for this taxon. By contrast, termites are quite responsive to RNAi-mediated silencing, and RNAi has previously been used for the silencing of the ORCo co-receptor in termites, resulting in impairment of the trail-following behavior (DOI: 10.1093/jee/toaa248). Likewise, our previous experiments showed a decreased ORCo transcript abundance, lower sensitivity to neocembrene and reduced neocembrene trail following upon dsPsimORCo administration to P. simplex workers, while we did not succeed in reducing the transcript abundance of PsimOR14 upon dsPsimOR14 injection. We do not report these negative results in the present manuscript so as not to dilute the main message. In parallel, we are currently developing an alternative way of dsRNA delivery using nanoparticle coating, which may improve the RNAi experiments with ORs in termites.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors performed the functional analysis of odorant receptors (ORs) of the termite Prorhinotermes simplex to identify the receptor of trail-following pheromone. The authors performed single-sensillum recording (SSR) using the transgenic Drosophila flies expressing a candidate of the pheromone receptor and revealed that PsimOR14 strongly responds to neocembrene, the major component of the pheromone. Also, the authors found that one sensillum type (S I) detects neocembrene and also performed SSR for S I in wild termite workers. Furthermore, the authors revealed the gene, transcript, and protein structures of PsimOR14, predicted the 3D model and ligand docking of PsimOR14, and demonstrated that PsimOR14 is higher expressed in workers than soldiers using RNA-seq for heads of workers and soldiers of P. simplex and that EAG response to neocembrene is higher in workers than soldiers. I consider that this study will contribute to further understanding of the molecular and evolutionary mechanisms of the chemoreception system in termites.

      Strength:

      The manuscript is well written. As far as I know, this study is the first study that identified a pheromone receptor in termites. The authors not only present a methodology for analyzing the function of termite pheromone receptors but also provide important insights in terms of the evolution of ligand selectivity of termite pheromone receptors.

      We thank the reviewer for the overall positive evaluation of the manuscript.

      Weakness:

      As you can see in the "Recommendations to the Authors" section below, there are several things in this paper that are not fully explained about experimental methods. Except for this point, this paper appears to me to have no major weaknesses.

      We address point by point the specific comments listed in the Recommendation to the authors chapter below.

      Reviewer #3 (Public review):

      Summary:

      Chemical communication is essential for the organization of eusocial insect societies. It is used in various important contexts, such as foraging and recruiting colony members to food sources. While such pheromones have been chemically identified and their function demonstrated in bioassays, little is known about their perception. Excellent candidates are the odorant receptors that have been shown to be involved in pheromone perception in other insects including ants and bees but not termites. The authors investigated the function of the odorant receptor PsimOR14, which was one of four target odorant receptors based on gene sequences and phylogenetic analyses. They used the Drosophila empty neuron system to demonstrate that the receptor was narrowly tuned to the trail pheromone neocembrene. Similar responses to the odor panel and neocembrene in antennal recordings suggested that one specific antennal sensillum expresses PsimOR14. Additional protein modeling approaches characterized the properties of the ligand binding pocket in the receptor. Finally, PsimOR14 transcripts were found to be significantly higher in worker antennae compared to soldier antennae, which corresponds to the worker's higher sensitivity to neocembrene.

      Strengths:

      The study presents an excellent characterization of a trail pheromone receptor in a termite species. The integration of receptor phylogeny, receptor functional characterization, antennal sensilla responses, receptor structure modeling, and transcriptomic analysis is especially powerful. All parts build on each other and are well supported with a good sample size.

      We thank the reviewer for these positive comments.

      Weaknesses:

      The manuscript would benefit from a more detailed explanation of the research advances this work provides. Stating that this is the first deorphanization of an odorant receptor in a clade is insufficient. The introduction primarily reviews termite chemical communication and deorphanization of olfactory receptors previously performed. Although this is essential background, it lacks a good integration into explaining what problem the current study solves.

      We understand the comment about the lack of an intelligible cue to highlight the motivation and importance of the present study. In the current version of the manuscript the introduction has been reworked. As suggested by Reviewer 3 in the Recommendations section below, the introduction now integrates some parts of the original discussion, especially the part discussing the OR evolution and emergence of eusociality in hymenopteran social insects and in termites, while underscoring the need of data from termites to compare the commonalities and idiosyncrasies in neurophysiological (pre)adaptations potentially linked with the independent eusociality evolution in the two main social insect clades.

      Selecting target ORs for deorphanization is an essential step in the approach. Unfortunately, the process of choosing these ORs has not been described. Were the authors just lucky that they found the correct OR out of the 50, or was there a specific selection process that increased the probability of success?

      Indeed, we were extremely lucky. Our strategy was to first select a modest set of ORs to confirm the feasibility of the Empty Neuron Drosophila system and newly established SSR setup, while taking advantage of having a set of termite pheromones, including those previously identified in the P. simplex model, some of them de novo synthesized for this project. The selection criteria for the first set of four receptors were (i) to have a full-length ORF and at least 6 unambiguously predicted transmembrane regions, and (ii) to be represented on different branches (subbranches) of the phylogenetic tree. Then it was a matter of good luck to hit PsimOR14, which responds selectively to the main component of the genuine P. simplex trail-following pheromone. In the revised version, we state these selection criteria in the results section (Phylogenetic reconstruction and candidate OR selection).

      The deorphanization attempts of additional P. simplex ORs are currently running.

      The authors assigned antennal sensilla into five categories. Unfortunately, they did not support their categories well. It is not clear how they were able to differentiate SI and SII in their antennal recordings.

      We agree that the classification of multiporous sensilla into five categories lacks robust discrimination cues. The identification of the neocembrene-responding sensillum was initially carried out by SSR measurements on individual olfactory sensilla of P. simplex workers one-by-one and the topology of each tested sensillum was recorded on optical microscope photographs taken during the SSR experiment. Subsequently, the SEM and HR-SEM were performed in which we localized the neocembrene sensillum and tried to find distinguishing characters. We admit that these are not robust. Therefore, in the revised version of the manuscript we decided to abandon the attempt of sensilla classification and only report the observations about the specific sensillum in which we consistently recorded the response to neocembrene (and geranylgeraniol). The modifications affect Fig. 4, its legend and the corresponding part of the results section (Identification of P. simplex olfactory sensillum responding to neocembrene).

      The authors used a large odorant panel to determine receptor tuning. The panel included volatile polar compounds and non-volatile non-polar hydrocarbons. Usually, some heat is applied to such non-volatile odorants to increase volatility for receptor testing. It is unclear how it is possible that these non-volatile compounds can reach the tested sensilla without heat application.

      The reviewer points at an important methodological error we made while designing the experiments. Indeed, the inclusion of long-chain hydrocarbons into Panel 1 without additional heat applied to the odor cartridges was inappropriate, even though the experiments were performed at 25–26 °C. We carefully considered the best solution to correct the mistake and finally decided to remove all tested ligands beyond C22 from Panel 1, i.e. altogether five compounds. These changes did not affect the remaining Panels 2-4 (containing compounds with sufficient volatility), nor did they affect the message of the manuscript on the highly selective response of PsimOR14 to neocembrene (and geranylgeraniol). In consequence, Figures 2, 3 and 5 were updated, along with the supplementary tables containing the raw data on SSR measurements. In addition, the tuning curve for PsimOR14 was re-built and the receptor lifetime sparseness value re-calculated (without any important change). We also exchanged squalene for limonene in the docking and molecular dynamics analysis and made new calculations.
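
      For readers unfamiliar with the metric, the recalculation can be illustrated with a minimal sketch of one commonly used definition of lifetime sparseness. The response does not state which formula the manuscript uses, so the definition below, the function name and the clipping of negative responses to zero are assumptions made for illustration only.

```python
def lifetime_sparseness(responses):
    """Lifetime sparseness S in [0, 1] for one receptor.

    responses: per-odorant responses (e.g. delta spikes/s).
    S = (1 - mean(r)^2 / mean(r^2)) / (1 - 1/N)
    S close to 1: narrow tuning; S close to 0: broad, uniform tuning.
    """
    # Assumption: inhibitory (negative) responses are clipped to zero.
    r = [max(0.0, float(x)) for x in responses]
    n = len(r)
    if n < 2:
        raise ValueError("need responses to at least two odorants")
    sum_sq = sum(x * x for x in r)
    if sum_sq == 0:
        # No excitatory response at all: S is undefined; return 0 by convention here.
        return 0.0
    mean = sum(r) / n
    mean_sq = sum_sq / n
    return (1.0 - mean * mean / mean_sq) / (1.0 - 1.0 / n)
```

      Because N enters the formula, removing five ligands from a panel changes the value slightly even if the remaining responses are untouched, which is why the value has to be recomputed after the panel is trimmed.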

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) L 208: "than" instead of "that"

      Corrected.

      (2) L 527+527 strange squares (•) before dimensions

      Apparently an error upon file conversion, corrected.

      (3) L553 "reconstructing" instead of "reconstruct"

      Corrected.

      (4) Two references (Chahda et al. and Chang et al.) appear too late in the alphabet.

      Corrected. Thank you for spotting this mistake. The author list was inadvertently ordered according to the Czech alphabet, which ranks CH after H.

      Reviewer #2 (Recommendations for the authors):

      (1) L148: Why did the authors select only four ORs (PsimOR9, 14, 30, and 31) though there are 50 ORs in P. simplex? I would like you to explain why you chose them.

      Our strategy was to first select a modest set of ORs to confirm the feasibility of the Empty Neuron Drosophila system and newly established SSR setup, while taking advantage of having a set of termite pheromones, including those previously identified in the P. simplex model, some of them de novo synthesized for this project. Then, it was a matter of good luck to hit PsimOR14, which responds selectively to the main component of the genuine P. simplex trail-following pheromone, while the deorphanization attempts for a set of additional P. simplex ORs are currently running. In the revised version of the manuscript, we state the selection criteria for the four ORs studied in the Results section (Phylogenetic reconstruction and candidate OR selection).

      (2) L149: Where is Figure 1A? Does this mean Figure 1?

      Thank you for spotting this mistake. Fig. 1 is now properly labelled as Fig. 1A and 1B in the figure itself and in the legend. The text now also refers to either 1A or 1B.

      (3) Figure 1: The authors also showed the transcription abundance of all 50 ORs of P. simplex in the right bottom of Figure 1, but there is no explanation about it in the main text.

      The heatmap reporting the transcript abundances is now labelled as Fig. 1B and is referred to in the discussion section (in the original manuscript it was referred to in the same place as Fig. 1).

      (4) L260-265: The authors confirmed higher expression of PsimOR14 in workers than soldiers by using RNA-seq data and stronger EAG responses of PsimOR14 to neocembrene in workers than soldiers, but I think that confirming the expression levels of PsimOR14 in workers and soldiers by RT-qPCR would strengthen the authors' argument (it is optional).

      qPCR validation is a suitable complement to read count comparison of RNA Seq data, especially when the data comes from one-sample transcriptomes and/or low coverage sequencing. Yet, our RNA Seq analysis is based on sequencing of three independent biological replicates per phenotype (worker heads vs. soldier heads) with ~20 million reads per sample. Thus, the resulting differential gene expression analysis is a sufficient and powerful technique in terms of detection limit and dynamic range.

      We admit that the replicate numbers and origin of the RNA Seq data should be better specified, since the Methods section only referred to the GenBank accession numbers in the original manuscript. Therefore, we added more information in the Methods section (Bioinformatics) and made clear in the Methods that this data comes from our previous research and related bioproject.

      (5) L491: I think that "The synthetic processes of these fatty alcohols are ..." is better.

      We replaced the sentence with “The de novo organic synthesis of these fatty alcohols is described …”

      (6) L525 and 527: There are white squares between the number and the unit. Perhaps some characters have been garbled.

      Apparently an error upon file conversion, corrected.

      (7) L795: ORCo?

      Corrected.

      (8) L829-830 & Figure 4: Where is Figure 4D?

      Thank you for spotting this mistake from the older version of Figure 4. The SSR traces referred to in the legend are in fact a part of Figure 5. Moreover, Figure 4 is now reworked based on the comments by Reviewer 3.

      (9) L860-864: Why did the authors select the result of edgeR for the volcano plot in Figure 7 although the authors use both DESeq2 and edgeR? An explanation would be needed.

      Both algorithms, DESeq2 and edgeR, are routinely used for differential gene expression analysis. Since they differ in read count normalization method and statistical testing, we decided to use both of them independently in order to reduce false positives. Because the resulting fold changes were practically identical in both algorithms (results for both analyses are listed in Supplementary table S15), we only reported the outputs for edgeR in Fig. 7 to avoid redundancies. We added to the Results section the information that both techniques listed PsimOR14 among the most upregulated ORs in workers.
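
      The idea of using two independent tests to reduce false positives can be sketched as a simple consensus filter. This is not the authors' pipeline: the function name, the input layout (gene name mapped to a log2 fold change and an adjusted p-value, as one might export from either tool) and the 0.05 threshold are all hypothetical choices made for illustration.

```python
def consensus_hits(res_a, res_b, alpha=0.05):
    """Genes called significant by both methods with the same direction of change.

    res_a, res_b: dict mapping gene name -> (log2_fold_change, adjusted_p_value),
    e.g. one dict per differential expression tool (hypothetical layout).
    """
    hits = []
    for gene, (lfc_a, padj_a) in res_a.items():
        if gene not in res_b:
            continue  # gene filtered out by one of the methods
        lfc_b, padj_b = res_b[gene]
        same_direction = (lfc_a > 0) == (lfc_b > 0)
        if padj_a < alpha and padj_b < alpha and same_direction:
            hits.append(gene)
    return sorted(hits)
```

      A gene that passes only one of the two tests, or changes sign between them, is dropped, which is the conservative behavior the response describes.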

      Reviewer #3 (Recommendations for the authors):

      The discussion contains many descriptions that would fit better into the introduction, where they could be used to hint at the study's importance (e.g., 292-311, 381-412). The remaining parts often lack a detailed discussion of the results that integrates details from other insect studies. Although references were provided, no details were usually outlined. It would be helpful to see a stronger emphasis on what we learn from this study.

      Along with rewriting the introduction, we also modified the discussion. As suggested, the lines 292-311 were rewritten and placed in the introduction. By contrast, we preferred to keep the two paragraphs 381-412 in the discussion, since both of them outline the potential future interesting targets of research on termite ORs.

      As suggested, the discussion has been enriched and now includes comparative examples and relevant references about the broad/narrow selectivity of insect ORs, about the expected breadth of tuning of pheromone receptors vs. ORs detecting environmental cues, about the potential role of additional neurons housed in the neocembrene-detecting sensillum of P. simplex workers, etc. From both introduction and discussion the redundant details on the chemistry of termite communication have been removed.

      This includes explanations of the advantages of the specific methodologies the authors used and how they helped solve the manuscript's problem. What does the phylogeny solve? Was it used to select the ORs tested? It would be helpful to discuss what the phylogeny shows in comparison to other well-studied OR phylogenies, like those from the social Hymenoptera.

      We understand the comment. In fact, our motivation to include the phylogenetic tree of termite ORs was essentially (i) to demonstrate the orthologous nature of OR diversity with few expansions on low taxonomic levels, and (ii) to show graphically the relationship among the four selected sequences. We do not attempt a comprehensive phylogenetic analysis here, because it would be redundant given that we recently published a large OR phylogeny which includes all sequences used in the present manuscript and analysed them in the proper context of related (cockroaches) and unrelated insect taxa (Johny et al., 2023). That paper also compares the termite phylogenetic pattern with those observed in other Insecta. It is repeatedly cited at appropriate places in the present manuscript and its main observations are provided in the Introduction section. Therefore, we feel that a thorough discussion of termite phylogeny would be redundant in the present paper.

      The authors categorized the sensilla types. Potential problems in the categorization aside, it would be helpful to know if it is expected that you have sensilla specialized in perceiving one specific pheromone. What is known about sensilla in other insects?

      We understand. In the discussion of the revised version, we develop more about the features typical/expected for a pheromone receptor and the sensillum housing this receptor together with two other olfactory sensory neurons, including examples from other insects.

      As the manuscript currently stands, specialist readers with their respective background knowledge would find this study very interesting. In contrast, the general reader would probably fail to appreciate the importance of the results.

      We hope that the re-organized and simplified introduction may now be more intelligible even for non-specialist readers.

      (1) L35: Should "workers" be replaced with "worker antennae"?

      Corrected.

      (2) L62: Should "conservativeness" be replaced by "conservation"?

      Replaced with “parsimony”.

      (3) L129: How and why did the authors choose four candidate ORs? I could not find any information about this in the manuscript. I wondered why they did not pick the more highly expressed PsimOr20 and 26 (Figure 7).

      As already replied above in the Weaknesses section, we selected for the first deorphanization attempts only a modest set of four ORs, while an additional set is currently being tested. We also explained above the inclusion criteria, i.e. (i) full-length ORF and at least 6 unambiguously predicted transmembrane regions, and (ii) presence on different branches (subbranches) of the OR phylogeny. For these reasons, we did not primarily consider the expression patterns of different ORs. As for Fig. 7, it shows differential expression between soldiers and workers, which was not the primary guideline either and the data was obtained only after having the ORs tested by SSR. Yet, even though we had data on P. simplex ORs expression (Fig. 1B), we did not presume that pheromone receptors should be among the most expressed ORs, given the richness of chemical cues detected by worker termites and unlike, e.g., male moths, where ORs for sex pheromones are intuitively highly expressed.

      The strategy of OR selection is specified in the results section of the revised manuscript under “Phylogenetic reconstruction and candidate OR selection”.

      (4) 198 to 200: SI, II, and III look very similar. Additional measurements rather than qualitative descriptions are required to consider them distinct sensilla. The bending of SIII could be an artifact of preparation. I do not see how the authors could distinguish between SI and SII under the optical microscope for recordings. A detailed explanation is required.

      As we responded above in the “Weaknesses” section, we admit that the sensilla classification is not intelligible. Therefore, we decided in the revised version to abandon the classification of sensilla types and only focus on the observations made on the neocembrene-responding sensillum. To recognize the specific sensillum, we used its topology on the last antennal segment. Because termite antennae are not densely populated with sensilla, it is relatively easy to distinguish individual sensilla based on their topology on the antenna, both in optical microscope and SEM photographs. The modifications affect Fig. 4, its legend and the corresponding part of the results section (Identification of P. simplex olfactory sensillum responding to neocembrene).

      (5) 208: "Than" instead of "that"

      Corrected.

      (6) 280: I suggest replacing "demand" with "capabilities"

      Corrected.

      (7) 312: Why "nevertheless"? It sounds as if the authors suggest that there is evidence that ORs are not important for communication. This should be reworded.

      We removed “Nevertheless” from the beginning of the sentence.

      (8) 321 to 323: This sentence sounds as if something is missing. I suggest rewriting it.

      This sentence simply says that the empty neuron Drosophila is a good tool for termite OR deorphanization and that termite ORs work well with Drosophila ORCo. We reworded the sentence.

      (9) 323: I suggest starting a new paragraph.

      Corrected.

      (10) 421: How many colonies were used for each of the analyses?

      The data for this manuscript were collected from three different colonies collected in Cuba. We now describe in the Materials and Methods section which analyses were conducted with each of the colonies.

      (11) 430: Did the termites originate from one or multiple colonies and did the authors sample from the Florida and Cuba population?

      The data for this manuscript were collected from three different colonies collected in Cuba. We now describe in the Materials and Methods section which analyses were conducted with each of the colonies.

      (12) 501: How was the termite antenna fixated? The authors refer to the Drosophila methods, but given the large antennal differences between these species, more specific information would be helpful.

      Understood. We added the following information into the Methods section under “Electrophysiology”: “The grounding electrode was carefully inserted into the clypeus and the antenna was fixed on a microscope slide using a glass electrode. To avoid the antennal movement, the microscope slide was covered with double-sided tape and the three distal antennal segments were attached to the slide.”

      (13) 509: I want to confirm that the authors indicate that the outlet of the glass tube with the airstream and odorant is 4 cm away from the Drosophila or termite antenna. The distance seems to be very large.

      Thank you for spotting this obvious mistake. The 4 cm distance applies to the opening for Pasteur pipette insertion into the delivery tube; the outlet itself is situated approx. 1 cm from the antenna. This information is now corrected.

      (14) 510/527: It looks like all odor panels were equally applied onto the filter paper despite the difference in solvent (hexane and paraffin oil). How was the solvent difference addressed?

      In our study we combine two types of odorant panels. First, we test on all four studied receptors a panel containing several compounds relevant for termite chemical communication, including the C12 unsaturated alcohols, the diterpene neocembrene, the sesquiterpene (3R,6E)-nerolidol and other compounds. These compounds are stored in the laboratory as hexane solutions to prevent oxidation/polymerization, and it is not advisable to transfer them to another solvent. In the second step we used three additional panels of frequently occurring insect semiochemicals, which are stored as paraffin oil solutions, so as to address the breadth of PsimOR14 tuning. We are aware that the evaporation dynamics differ between the two solvents, but we did not have a suitable way to solve this problem. We believe that the use of the two solvents does not compromise the general message on the receptor specificity. For each panel, the corresponding solvent is used as a control. Similarly, the use of two different solvents for SSR can be encountered in other studies, e.g. 10.1016/j.celrep.2015.07.031.

      (15) 518: delta spikes/sec works for all tables except for the wild type in Table S5. I could not figure out how the authors get to delta spikes/sec in that table.

      Thank you for your sharp eye. Due to our mistake, the values of Δ spikes per second reported in Table S5 for W1118 were erroneously calculated using the formula for 0.5 sec stimulation instead of 1 sec. We corrected this mistake, which does not impact the interpretation of the results in Table S5 and Fig. 2.
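
      To make the nature of this error concrete, here is a minimal sketch of how a Δ spikes/s value depends on the assumed stimulation window. The exact formula used in the manuscript is not given in this response; the version below (spike count during stimulation minus the count in an equally long pre-stimulus window, divided by the window length) is an assumption, as are the function and parameter names.

```python
def delta_spikes_per_sec(spikes_before, spikes_during, window_s):
    """Change in firing rate (spikes/s) evoked by a stimulus.

    spikes_before: spike count in the pre-stimulus window
    spikes_during: spike count in the stimulation window
    window_s: length of each window in seconds (assumed equal for both)
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return (spikes_during - spikes_before) / window_s
```

      With the same hypothetical spike counts, dividing by 0.5 s instead of 1 s doubles the reported value, so a window mismatch rescales every entry in a table without changing the ranking of the odorants.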

      522: Did the workers and soldiers originate from different colonies or different populations?

      We now clearly describe in the Material and Methods section the origin of termites for different experiments. EAG measurements were made using individuals (workers, soldiers) from one Cuban colony.

      (16) Figure 6C/D: I suggest matching colors between the two figures. For example, instead of using an orange circle in C and a green coloration of the intracellular flap in D, I recommend using blue, which is not used for something else. In addition, the binding pocket could be separated better from anything else in a different color.

      We agree that the color match for the intracellular flap was missing. This figure is now reworked and the colors should have a better match and the binding region is better delineated.

      (17) Figure 7/Table S15: It is unclear where the transcriptome data originate and what they are based on. Are these antennal transcriptomes or head transcriptomes? Do these data come from previous data sets or data generated in this study? Figure 7 refers to heads, Table S15 to workers and soldiers, and the methods only refer to antennal extractions. This should be clarified in the text, the figure, and the table.

      We admit that the replicate numbers and origin of the RNA Seq data should be better specified, and that the information that the RNA Seq data originated from samples of heads+antennae of workers and soldiers should be provided at appropriate places. Therefore, we added more information on replicates and origin of the data in the Methods section (Bioinformatics), made clear that this data comes from our previous research, and refer to the corresponding bioproject. Likewise, the Figure 7 legend and Table S15 heading have been updated.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Comment 1: Indirect Estimates of White Matter Connections: While dMRI is a valuable tool, it inherently provides indirect and inferred information about neural pathways. The accuracy and specificity of tractography can be influenced by various factors, including fiber crossing, partial volume effects, and algorithmic assumptions. A potential limitation in the accuracy of indirect estimates might affect the precision of spatial extent measurements, introducing uncertainty in the interpretation of cortico-thalamic connectivity patterns. Addressing the methodological limitations associated with indirect estimates and considering complementary approaches could strengthen the overall robustness of the findings.

      We appreciate the reviewer’s comment and agree tractography is an indirect estimate and subject to limitations. Regarding this manuscript, the key question is not whether the anatomical tracts are without false positives or negatives, and in fact we argue that this question is outside the scope of this manuscript and has been addressed in several previous studies (e.g. Thomas et al. 2015, Schilling et al., 2020, Grisot et al. 2021, and many others). Instead, the key question for this manuscript is whether the focality of termination patterns within the thalamus is systematically biased in a way that the observation of a hierarchy effect is artifactual. The many supplementary analyses in this manuscript do help address this question and increase our confidence that the indirect nature of tractography does not systematically bias the EDpc1 measure such that association areas only appear to have more diffuse connectivity patterns relative to sensorimotor areas.

      Comment 2: An over-arching theme of my review is that, each time I found myself wondering about a detail, a null, or a reference, I had only to read the next sentence or paragraph to find my concern handled in a clear and concise fashion. This is, in my opinion, the mark of work of the highest order. I congratulate the authors on their excellent work, which I believe will be impactful and well-received.

      I have no notes that I feel can help improve what is already an impeccable piece of work.

      We thank the reviewer for the kind comment.

      Reviewer #2:

      Comment 1: Structural thalamocortical connectivity was estimated from diffusion imaging data obtained from the HCP dataset. Consequently, the robustness and accuracy of the results depend on the suitability of this data for such a purpose. Conducting tractography on the cortical-thalamic system is recognized as a challenging endeavor for several reasons. First, diffusion directions lose their clearly defined principal orientations once they reach the deep thalamic nuclei, rendering the tracking of structures on the medial side, such as the medial dorsal (MD) and pulvinar nuclei difficult. Somewhat concerning is those are regions that authors found to show diffuse connectivity patterns. Second, the thalamic radiata diverge into several directions, and routes to the lateral surface often lack the clarity necessary for successful tracking. It is unclear if all cortical regions have similar levels of accuracy, and some of the lateral associative regions might have less accurate tracking, making them appear to be more diffuse, biasing the results.

      As mentioned in the weakness section, it is crucial to address the need for better validation or the inclusion of control analyses to ensure that the results are not systematically biased due to known issues, such as the difficulty in tracking the medial thalamus and the potential for higher false positives when tracking the lateral frontal cortex.

      We thank the reviewer for bringing up an important point. To determine if some areas of the thalamus were more difficult to track and, in turn, biased the EDpc1 measure, we added an additional supplemental figure (S31). In this figure, shown below, we calculate the total SC of all ipsilateral cortical areas to each thalamic voxel. We show that, indeed, medial thalamic voxels have a lower total streamline count to ipsilateral cortex, and we see reduced total streamline counts to lateral thalamic areas and the very posterior end of the thalamus. We determined if some cortical areas preferentially projected to parts of the thalamus with lower ipsilateral total SC (i.e. by calculating the overlap between SC and total cortical SC for each thalamic voxel) and found only a weak relationship with our measure. Furthermore, we regressed each voxel’s mean ipsilateral cortical SC from the streamline count matrix. We found that the EDpc1 measure didn’t significantly change after the regression.

      Additionally, we note that this analysis assumes that all thalamic voxels should have equal strength of connectivity (i.e., total SC) to the ipsilateral cortex and that such a measure is a proxy for “accuracy.” While both of these assumptions may not be entirely valid, this figure does demonstrate that potential reductions in tracking from the medial thalamus do not significantly affect the EDpc1 measure.
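
      As an illustration only, the regression control described in this response could be sketched as follows. This is one plausible reading, not the authors' code: the matrix layout (cortical areas as rows, thalamic voxels as columns), the function names `residualize` and `regress_out_voxel_mean`, and the toy values are all assumptions. Each area's streamline counts are regressed on the per-voxel mean count across areas, and the residuals are kept for any downstream ED calculation.

```python
def residualize(y, x):
    """Return residuals of an ordinary least-squares fit y ~ a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    # Slope is zero when x has no variance (nothing to regress out).
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx if sxx else 0.0
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def regress_out_voxel_mean(sc):
    """Remove each voxel's mean SC (across areas) from every area's SC row.

    sc: list of rows, one row per cortical area, one column per thalamic voxel
    (hypothetical layout chosen for this sketch).
    """
    n_areas = len(sc)
    voxel_mean = [sum(col) / n_areas for col in zip(*sc)]
    return [residualize(row, voxel_mean) for row in sc]


# Made-up streamline counts: 3 cortical areas x 4 thalamic voxels.
sc = [
    [10.0, 8.0, 2.0, 1.0],
    [12.0, 6.0, 3.0, 1.0],
    [8.0, 10.0, 1.0, 4.0],
]
sc_resid = regress_out_voxel_mean(sc)
```

      By construction, each residual row sums to zero and is uncorrelated with the per-voxel mean, so any hierarchy effect that survives in the residuals cannot be driven by voxels that are simply harder to reach.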

      Comment 2: While the methodology employed by the authors appears to be state-of-the-art, there exists uncertainty regarding its appropriateness for validation, given the well-documented issues of false positives and false negatives in probabilistic diffusion tractography, as discussed by Thomas et al. 2014 PNAS. Although replicating the results in both humans and non-human primates strengthens the study, a more compelling validation approach would involve demonstrating the method's ability to accurately trace known tracts from established tracing studies or, even better, employing phantom track data. Many of the control analyses the authors presented, such as track density, do not speak to accuracy.

      In addition to our response to Reviewer 1 Comment 1, we would like to add the following:

      We agree with the reviewer that tractography methods have known limitations. We would also like to point out that several studies have already performed the analyses suggested by the reviewer. Many studies have compared tracts reconstructed from diffusion data using tractography methods to tracer-derived connections (e.g. Thomas et al., 2014, as mentioned by the reviewer; Donahue et al., 2016, J Neurosci; Dauguet et al., 2007 NeuroImage; Gao et al., 2013 PloS One; van den Heuvel et al., 2015, Hum Brain Map; Azadbakht et al., 2015 Cereb Cortex; Ambrosen et al., 2020 NeuroImage). Notably, studies comparing tractography and tracer-derived white matter tracts in the same animal (e.g. Grisot et al., 2021; Gao et al., 2013 PloS One) have demonstrated that tractography errors may be inflated in studies comparing tractography and tracer-derived connections in different animals.

      Additionally, others have employed phantoms to assess the validity of tractography methods (e.g. Drobnjak et al., 2021). For the purposes of this manuscript, phantom data would not be an adequate control because phantom data would likely not capture the biological complexities of tracking subcortical white matter tracts and identifying projections within subcortical grey matter.

      While a comparison of our tractography-derived ED measure to ED calculated on terminations from tracer studies within the thalamus from several somatomotor and associative regions in macaques would provide additional confidence for our results, such a control is certainly outside the scope of this study. Additionally, such a study would not provide a ground truth comparison for the human data. Even if this hypothetical experiment was performed, a negative finding would not refute our results, as any differences could be attributed to evolutionary differences. Unfortunately, there exists no ground truth to compare human white matter connectivity patterns to, which is why we stress-tested our results in as many ways as possible. These stress tests revealed that our main findings are very robust.

      Specifically, the key validity question of our study was whether a confound systematically biased the ED measure so as to make the hierarchy effect artifactual, and the control analyses we performed (track density, cortical geometry, bundle integrity, etc.) do in fact speak to the robustness of the results. Regarding the track density analyses, we argue that these control analyses do speak to accuracy. The reviewer mentioned above that some cortical areas may be biased because their anatomical tracts may be more difficult to reconstruct using tractography. The mean streamline count is meant to reflect the density of a fiber bundle, but corticothalamic tracts that are more difficult to track will, by nature, have fewer streamline counts. So, the mean streamline count reflects not only the density of a fiber bundle but also how easy that tract is to reconstruct. Therefore, if it was the case that cortical areas with more difficult to reconstruct white matter tracts to the thalamus are also more diffuse, then we should observe a strong positive correlation between the ED measure and the mean streamline count, which we tested directly and found only a weak correlation (Fig. S11). This is true for tracking to the entire thalamus, and the additional supplemental Figure S31 shows that reduced tracking to specific parts of the thalamus (e.g. the medial portion) also does not strongly relate to the ED measure. So, tracts that are more difficult to reconstruct may also be more diffuse, but this seems to add only a little noise and does not account for the strong relationship between the ED measure and the T1w/T2w and RSFCpc1 measures that reflect the cortical hierarchy.
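
      The confound check argued for above amounts to correlating a per-area ED value with a per-area mean streamline count. A minimal sketch is given below, assuming a Spearman rank correlation (the response does not state which correlation was used); the function names and the five data points are invented for illustration and are not values from the study.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-area values (one entry per cortical area).
ed_values = [1.2, 3.4, 2.2, 4.1, 2.9]
mean_sc = [50.0, 20.0, 45.0, 30.0, 10.0]
rho = spearman(ed_values, mean_sc)
```

      A coefficient near zero in such a test would indicate that reconstruction difficulty, as proxied by streamline count, explains little of the variation in ED.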

      Comment 3: If tracking the medial thalamus is indeed less accurate, characterized by higher false positives and false negatives, it could potentially lead to increased variability among individual subjects. In cases where results are averaged across subjects, as the authors have apparently done, this could inadvertently contribute to the emergence of the "diffuse" motif, as described in the context of the associative cortex. This presents a critical issue that requires a more thorough control analysis and validation process to ensure that the main results are not artifacts resulting from limitations in tractography.

      Additionally, conducting a control analysis to demonstrate that individual variability in tracking endpoints within the thalamus, when averaged across subjects, does not artificially generate a more diffuse connectivity pattern, is essential.

      We thank the reviewer for bringing up this point, and the reviewer is correct that a simple group average of streamline counts across the thalamus could make some thalamic patterns appear more diffuse if those patterns vary slightly in location across people. The simplest way to address this concern is to show that diffuse patterns are present in individual subjects. Fig. 2 panels B, C, H, and I are all subject-level figures, which show that we can replicate the group-level findings in Fig. 2 panels F and G. Specifically, Fig. 2 panels H and I show that the effect of association areas exhibiting more diffuse connectivity patterns within the thalamus relative to sensorimotor areas is generalizable across subjects.

      To the reviewer’s point, the other way that averaged streamline counts could make focal connections seem diffuse is by averaging within cortical areas (e.g. association areas may have highly variable focal patterns that appear more diffuse when averaged within the cortical area). To test this, we show that we can replicate the hierarchy effect at the vertex level, by calculating the extent of connectivity patterns for every cortical vertex and correlating vertex-level EDpc1 values with vertex-level T1w/T2w and RSFC_pc1 values (Fig. S20).

      Hopefully the data shown in Fig. 2 (replication at the individual level) and Fig. S20 (replication at the vertex level) ameliorate the reviewer’s concern that averaging highly variable focal connectivity patterns within the thalamus (either across people or across vertices) artifactually produces diffuse thalamic connectivity patterns for associative cortical areas.
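
      The averaging concern discussed in this response can be made concrete with a toy example. The sketch below is purely illustrative and uses invented data: a one-dimensional "thalamus" of 12 voxels, three hypothetical subjects who share the same focal pattern at displaced locations, and a simplified ED defined as the mean pairwise distance between the `k` voxels with the highest counts.

```python
from itertools import combinations


def top_k_ed(counts, k):
    """Mean pairwise distance between the k voxels with the highest counts.

    Voxels live on a 1-D line, so a voxel's index is also its position.
    """
    top = sorted(range(len(counts)), key=lambda i: counts[i], reverse=True)[:k]
    pairs = list(combinations(top, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


n_voxels, k = 12, 3
subjects = []
for shift in (0, 3, 6):  # same focal pattern, displaced in each subject
    counts = [0.0] * n_voxels
    for i in range(3):
        counts[shift + i] = 10.0 - i  # three adjacent voxels carry all counts
    subjects.append(counts)

# ED computed within each subject stays focal.
subject_eds = [top_k_ed(s, k) for s in subjects]

# ED computed on the group-average map is inflated by the displacement.
group_avg = [sum(col) / len(subjects) for col in zip(*subjects)]
group_ed = top_k_ed(group_avg, k)
```

      In this toy case every subject-level ED is small while the group-average ED is three times larger, which is why replication at the individual-subject level is the relevant control for this concern.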

      Comment 4: Because the authors included data from all thresholds, it seems likely that false positive tracks were included in the results. The methodology described seems to unavoidably include anatomically implausible pathways in the spatial extent analyses.

      The thresholding approach taken in the manuscript aimed to control for inter-areal differences in anatomical connection strength that could confound the ED estimates. Here I am not quite clear why inter-areal differences in anatomical connection strength have to be controlled. A global threshold applied on all thalamic voxels might kill some connections that are weak but do exist. Those weak pathways are less likely to survive at high thresholds. In the meantime, the mean ED is weighted, with more conservative thresholds having higher weights. That being said, isn't it possible that more robust pathways might contribute more to the mean ED than weaker pathways?

      This is a good point from the reviewer, and we appreciate them bringing up these points about our thresholding rationale. We would like to clarify two points: why it was appropriate for our question to threshold thalamic voxels for each cortical area separately and why we iteratively thresholded thalamic voxels.

      Regarding thalamic connectivity differences between cortical areas: a global threshold would indeed exclude weak, but potentially true, connections. This was part of our rationale for thresholding thalamic voxels for each cortical area separately. Too conservative of a global threshold would exclude all thalamic voxels for some cortical areas and too liberal of a threshold would include many potentially false positive connections for other cortical areas. Our method of thresholding each cortical area’s thalamic voxels separately ensured that we were sampling thalamic voxels in an equitable manner across cortical areas. We updated the text to clarify this:

      Methods section, pg. 11, section Framework to quantify the extent of thalamic connectivity patterns via Euclidean distance (ED)

      “We used Euclidean distance (ED) to quantify the extent of each cortical area's thalamic connectivity patterns. Probabilistic tractography data require thresholding before the ED calculation. To avoid the selection of an arbitrary threshold (Sotiropoulos et al., 2019, Zhang et al., 2022), we calculated ED for a range of thresholds (Figure 1a). Our thresholding framework uses a tractography-derived connectivity matrix as input. We iteratively excluded voxels with lower streamline counts for each cortical parcel such that the same number of voxels was included at each threshold. At each threshold, ED was calculated between the top x% of thalamic voxels with the highest streamline counts. This produced a matrix of ED values (360 cortical parcels by 100 thresholds). This matrix was used as input into a PCA to derive a single loading for each cortical parcel. While alternative thresholding approaches have been proposed, this framework optimizes the examination of spatial patterns by proportionally thresholding the data, enabling equitable sampling of each cortical parcel's streamline counts within the thalamus.

      This approach controlled for inter-areal differences in anatomical connection strength that could confound the ED estimates. In contrast, a global threshold, which is applied to all cortical areas, may exclude all thalamic streamline counts for some cortical areas that are more difficult to reconstruct, thus making it impossible to calculate ED for that cortical area, as there are no surviving thalamic voxels from which to calculate ED. This would be especially problematic for white matter tracts that are more difficult to reconstruct (e.g. the auditory radiation), and cortical areas connected to the thalamus by those white matter tracts would have a disproportionate number of thalamic voxels excluded when using a global threshold.”
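
      The proportional thresholding described in the quoted Methods text can be sketched for a single cortical parcel as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: ED is summarized here as the mean pairwise Euclidean distance between surviving voxels, the helper names and the 10-voxel toy thalamus are invented, and the final PCA across thresholds is omitted.

```python
import math
from itertools import combinations


def mean_pairwise_ed(coords):
    """Mean Euclidean distance over all pairs of voxel coordinates."""
    pairs = list(combinations(coords, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def ed_across_thresholds(coords, counts, percents):
    """ED of the top x% of voxels, ranked by streamline count, for each x.

    Thresholding is proportional: every parcel keeps the same number of
    voxels at a given threshold, regardless of its absolute counts.
    """
    order = sorted(range(len(counts)), key=lambda i: counts[i], reverse=True)
    eds = []
    for pct in percents:
        n_keep = max(2, math.ceil(len(order) * pct / 100))
        kept = [coords[i] for i in order[:n_keep]]
        eds.append(mean_pairwise_ed(kept))
    return eds


# Toy thalamus: 10 voxels spaced 1 unit apart along one axis.
coords = [(float(i), 0.0, 0.0) for i in range(10)]
# Made-up streamline counts for two hypothetical parcels.
focal = [100, 90, 80, 70, 1, 1, 1, 1, 1, 1]    # strongest voxels are adjacent
diffuse = [100, 1, 1, 90, 1, 1, 80, 1, 1, 70]  # strongest voxels are spread out

percents = [20, 30, 40]
focal_ed = ed_across_thresholds(coords, focal, percents)
diffuse_ed = ed_across_thresholds(coords, diffuse, percents)
```

      Stacking one such ED vector per parcel yields the parcels-by-thresholds matrix that the Methods text describes as the input to the PCA. Because voxels are ranked within each parcel, a weakly connected parcel still contributes the same number of voxels as a strongly connected one.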

      Regarding thalamic connectivity differences across the thalamus for a given cortical area, the thresholding method we use does include anatomically implausible connections in the ED calculation because we sample voxels iteratively, and as more and more thalamic voxels are included in the ED analysis the likelihood that they reflect spurious connections increases. This approach made the most sense to us, because there is no way to identify a threshold that only includes true positive connections. And since this method does not exist, we sampled all thresholds and leveraged the behavior of the ED metric across thresholds to quantify the spread of a connectivity pattern. As the reviewer points out, since the measure is effectively “weighted,” more “robust” or anatomically plausible pathways should contribute more to the EDpc1 rather than weaker pathways. This is exactly the balanced approach we aimed for: a measure that is driven by connections that have the highest likelihood of being a true positive but does not rely on an arbitrary threshold.

      We also replicated our main findings after thresholding and binarizing the data at separate thresholds, which showed that our main effect was strongest only when thalamic voxels with the highest streamline counts (which are assumed to have a lower chance of being false positives) were included in the ED calculation (Fig. S5). This more traditional method of thresholding also supported our results and increases our overall confidence that associative cortical areas have more diffuse connectivity patterns within the thalamus relative to somatomotor areas.

      Comment 5: In the introduction, there is a bit of ambiguity that needs clarification. The overall goal of the study appears to be the examination of anatomical connectivity from the cortex to the thalamus, specifically whether a cortical region projects to a single thalamic subregion or multiple thalamic subregions. However, certain parts of the introduction also suggest an exploration of the concept of thalamic integration, which typically means a single thalamic region integrating input from multiple cortical regions (converging input). These two patterns, many cortical regions to one thalamic region versus one cortical region to many different thalamic regions, represent distinct and fundamentally different concepts that should be clarified in the manuscript.

      We thank the reviewer for pointing out this ambiguity and have edited the introduction to clarify this point:

      Our argument for a potential mechanism for integration is the following: because corticothalamic connectivity is topographically organized, if a cortical area has a more diffuse anatomical projection across the thalamus that means its connections overlap with more cortical areas. To the reviewer’s point, our argument is simply that one cortical area targeting multiple thalamic nuclei inherently suggests that such a cortical area has overlapping connectivity patterns with many other cortical areas in the same thalamic subregion. We have updated the introduction to clarify this further.

      Intro, pg 1.

      “Studies of cortical-thalamic connectivity date back to the early 19th century, yet we still lack a comprehensive understanding of how these connections are organized (see 13 and 14 for review). The traditional view of the thalamus is based on its histologically-defined nuclear structure (6). This view was originally supported by evidence that cortical areas project to individual thalamic nuclei, suggesting that the thalamus primarily relays information (15). However, several studies have demonstrated that cortical connectivity within the thalamus is topographically organized and follows a smooth gradient across the thalamus (16–21). Additionally, some cortical areas exhibit extensive connections within the thalamus, which target multiple thalamic nuclei (22? ). These extensive connections may enable information integration within the thalamus through overlapping termination patterns from different cortical areas, a key mechanism for higher-order associative thalamic computations (23–25). However, our knowledge of how thalamic connectivity patterns vary across cortical areas, especially in humans, remains incomplete. Characterizing cortical variation in thalamic connectivity patterns may offer insights into the functional roles of distinct cortico-thalamic loops (6, 7).”

      Discussion, pg 9. Section: The spatial properties of thalamic connectivity patterns provide insight into the role of the thalamus in shaping brain-wide information flow.

      “In this study, we demonstrate that association cortical areas exhibit diffuse anatomical connections within the thalamus. This may enable these cortical areas to integrate information from distributed areas across the cortex, a critical mechanism supporting higher-order neural computations. Specifically, because thalamocortical connectivity is organized topographically, a cortical area that projects to a larger set of thalamic subregions has the potential to communicate with many other cortical areas. We observed that anterior cingulate cortical areas had some of the most diffuse thalamic connections. This observation aligns with findings from Phillips et al. that area 24 exhibited the most diffuse anatomical terminations across the mediodorsal nucleus of the thalamus relative to other prefrontal cortical area…”

      Reviewer 3:

      Comment 1: Potential weaknesses of the study are that it seems to largely integrate aspects of the thalamus that have been already described before. The differentiation between sensory and association systems across thalamic subregions is something that has been described before (see: Oldham and Ball, 2023; Zheng et al., 2023; Yang et al., 2020; Mueller, 2020; Behrens, 2003).

      It is true that previous studies have shown that corticothalamic systems vary between sensory and associative cortical areas. Furthermore, there is much evidence that indicates that the sensory-association hierarchy is a major principle of brain organization in general. However, how and why these circuits are different is still not fully known, both across the whole brain and in corticothalamic circuits specifically.

      Our study is the first to compare patterns of anatomical connectivity within the thalamus and determine if cortical areas vary in the extent of those patterns. So our main finding isn't that sensory and association cortical areas show differences in thalamic connectivity, it is that they specifically show differences in their pattern of connectivity within the thalamus. This provides a unique insight into how sensory and associative systems differ in their thalamic connectivity in primates.

      Additionally, we show evidence that provides some insight into why these differences may exist. Although we cannot provide causal evidence, our data suggest that differences in patterns of anatomical connectivity within the thalamus were related to how different cortical areas process information via the thalamus, which aligns with speculations from Phillips et al 2021.

      So our main finding isn't that sensory and association cortical areas show differences in thalamic connectivity; it is that they specifically show differences in their pattern of connectivity within the thalamus, and these differences may help us understand how these cortical areas process information and, in turn, how they may support different types of computations, both of which are major goals in neuroscience. To better clarify this in the manuscript, we made the following changes:

      Discussion, Paragraph 1, pg 8:

      “This study contributes to the rich body of literature investigating the organization of cortico-thalamic systems in human and non-human primates. Prior research has shown that features of thalamocortical connectivity differ between sensory and association systems, and our work advances this understanding by demonstrating that these systems also differ in the pattern and spatial extent of their anatomical connections within the thalamus. Using dMRI-derived tractography across species, we show that these connectivity patterns vary systematically along the cortical hierarchy in both humans and macaques. These findings are critical for establishing the anatomical architecture of how information flows within distinct cortico-thalamic systems. Specifically, we identify reproducible tractography motifs that correspond to sensorimotor and association circuits, which were consistent across individuals and generalize across species. Collectively, this study offers convergent evidence that the spatial pattern of anatomical connections within the thalamus differs between sensory and association cortical areas, which may support distinct computations across cortico-thalamic systems.”

      Comment 2: (1) Why not formally test the association between humans and macaques by bringing the brains to the same space?

      We thank the reviewer for this query. We were primarily interested in using the macaque data as a validation of the human data, because it was acquired at a much higher resolution, there are no motion confounds, and it provides a bridge with the tract tracing literature in macaques. We are currently studying interspecies differences in patterns of thalamic connectivity, as well as extensions of our approach into structure-function coupling, and we believe these topics warrant their own paper.

      Comment 3: (2) Possibly flesh out the differences between this study and other studies with related approaches a bit further.

      We updated the discussion section to better clarify the differences in this study from previous research. See response to Reviewer 3 Comment 1 for text changes.

      Comment 4: (3) The current title entails 'cortical hierarchy' but would 'differentiation between sensory and association regions' not be more correct? Or at least a reflection on how cortical hierarchy can be perceived?

      We treat these phrases as synonymous terms. Our definition of cortical hierarchy is a smooth transition in features between sensory and motor areas to higher-order associative areas. The use of cortical hierarchy is meant to reflect that our measure continuously varies across the cortex. We updated the manuscript to make this clearer:

      Abstract, pg 1.

      “Additionally, we leveraged resting-state functional MRI, cortical myelin, and human neural gene expression data to test if the extent of anatomical connections within the thalamus varied along the cortical hierarchy, from sensory and motor to multimodal associative cortical areas.”

      Comment 5: (4) For the core-matrix map, there are marked left-right differences and also there are only two donors in the right hemisphere, possibly note this as a limitation?

      We thank the reviewer for this observation. We updated Fig. S28 Panel D to show that the correspondence between EDpc1 and the Core-Matrix (CPc) cortical maps holds when the correlation was done for left and right cortex, separately.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors set out to illuminate how legumes promote symbiosis with beneficial nitrogen-fixing bacteria while maintaining a general defensive posture towards the plethora of potentially pathogenic bacteria in their environment. Intriguingly, a protein involved in plant defence signalling, RIN4, is implicated as a type of 'gatekeeper' for symbiosis, connecting symbiosis signalling with defence signalling. Although questions remain about how exactly RIN4 enables symbiosis, the work opens an important door to new discoveries in this area.

      Strengths:

      The study uses a multidisciplinary, state-of-the-art approach to implicate RIN4 in soybean nodulation and symbiosis development. The results support the authors' conclusions.

      Weaknesses:

      No serious weaknesses, although the manuscript could be improved slightly from technical and communication standpoints.

      Reviewer #2 (Public Review):

      Summary:

      The study by Toth et al. investigates the role of RIN4, a key immune regulator, in the symbiotic nitrogen fixation process between soybean and rhizobium. The authors found that SymRK can interact with and phosphorylate GmRIN4. This phosphorylation occurs within a 15 amino acid motif that is highly conserved in N-fixation clades. Genetic studies indicate that GmRIN4a/b play a role in root nodule symbiosis. Based on their data, the authors suggest that RIN4 may function as a key regulator connecting symbiotic and immune signaling pathways.

      Overall, the conclusions of this paper are well supported by the data, although there are a few areas that need clarification.

      Strengths:

      This study provides important insights by demonstrating that RIN4, a key immune regulator, is also required for symbiotic nitrogen fixation.

      The findings suggest that GmRIN4a/b could mediate appropriate responses during infection, whether it is by friendly or hostile organisms.

      Weaknesses:

      The study did not explore the immune response in the rin4 mutant. Therefore, it remains unknown how GmRIN4a/b distinguishes between friend and foe.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript by Toth et al reveals a conserved phosphorylation site within the RIN4 (RPM1-interacting protein 4) R protein that is exclusive to two of the four nodulating clades, Fabales and Rosales. The authors present persuasive genetic and biochemical evidence that phosphorylation at the serine residue 143 of GmRIN4b, located within a 15-aa conserved motif with a core five amino acids 'GRDSP' region, by SymRK, is essential for optimal nodulation in soybean. While the experimental design and results are robust, the manuscript's discussion fails to clearly articulate the significance of these findings. Results described here are important to understand how the symbiosis signaling pathway prioritizes associations with beneficial rhizobia, while repressing immunity-related signals.

      Strengths:

      The manuscript asks an important question in plant-microbe interaction studies with interesting findings.

      Overall, the experiments are detailed, thorough, and very well-designed. The findings appear to be robust.

      The authors provide results that are not overinterpreted and are instead measured and logical.

      Weaknesses:

      No major weaknesses. However, a well-thought-out discussion integrating all the findings and interpreting them is lacking; in its current form, the discussion lacks 'boldness'. The primary question of the study - how plants differentiate between pathogens and symbionts - is not discussed in light of the findings. The concluding remark, "Taken together, our results indicate that successful development of the root nodule symbiosis requires cross-talk between NF-triggered symbiotic signaling and plant immune signaling mediated by RIN4," though accurate, fails to capture the novelty or significance of the findings, and left me wondering how this adds to what is already known. A clear conclusion, e.g., that the phosphorylation of RIN4 isoforms by SYMRK at S143 modulates immune responses during symbiotic interactions with rhizobia, or similar, is needed.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have no major criticism of the work, although it could be improved by addressing the following minor points:

      (1) Page 8, Figure 2 legend. Consider changing "proper symbiosis formation" to "normal nodulation" or something that better reflects control of nodule development/number.

      We thank you for the suggestion; the legend was changed to “...required for normal nodule formation”. (See Page 10, revised manuscript)

      (2) Page 9. Cut "newly" from the first sentence of paragraph 2, as S143 phosphorylation was identified previously.

      Thank you for the suggestion, we removed “newly” from the sentence.

      (3) Page 10, Figure 3. Panels B showing green-fluorescent nodules are unnecessary given the quantitative data presented in the accompanying panel A. This goes for similar supplemental figures later.

      We appreciate the comment. For Figure 3 (complementation of the rin4b mutant; updated according to the other reviewer’s comment) and Suppl. Figure 6 (OE phenotype of phospho-mimic/negative mutants), we removed the panels showing the micrographs. We did not modify Figure 2 (micrographs showing transgenic roots carrying the silencing constructs), for the sake of figure completeness. (See Page 10, revised manuscript)

      (4) Consider swapping Figure 3 for Supplemental Figure S7, which I think shows more clearly the importance of RIN4 phosphorylation in nodulation.

      We appreciate the comment and have swapped the figures according to the reviewer’s suggestion. Legend, figure description, and manuscript text have been updated accordingly. (See page 12 and 38, revised manuscript)

      (5) Page 10. Replace "it will be referred to S143..." with "we refer to S143 instead of ....".

      We replaced it according to the comment.

      (6) Page 11, delete "While" from "While no interactions could be observed...".

      We deleted it according to the suggestion.

      (7) Page 33, Fig S5. How many biological replicates were performed to produce the data presented in panel C and what do the error bar and asterisk indicate? Check that this information is provided in all figures that show errors and statistical significance.

      Thank you for the remark. The experiment was repeated three times, and this note was added to the figure description. All other figure legends with error bars were checked to confirm that the number of replicates is indicated.

      (8) Page 37, Fig S11, panel B. Are averages of data from the 2 biological and 3 technical replicates shown? Add error bars and tests of significant difference.

      Averages of a total of 6 replicates (from 2 biological replicates, each run in triplicate) are shown. We thank the reviewer for pointing out the missing error bars and statistical test; we have updated the figure accordingly.
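
      As a rough illustration of the replicate arithmetic described in this response, the sketch below summarizes 2 biological replicates with 3 technical replicates each. All values are hypothetical placeholders, not data from the manuscript, and the choice of the standard error of the mean for the error bars is an assumption, since the response does not state which error measure was used.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical values for illustration only (not data from the manuscript):
# 2 biological replicates, each measured in technical triplicate.
replicates = {
    "bio_rep_1": [1.00, 1.10, 0.90],
    "bio_rep_2": [1.20, 1.30, 1.10],
}

# Pool all 6 measurements, as described in the response.
pooled = [value for values in replicates.values() for value in values]
pooled_mean = mean(pooled)
pooled_sem = stdev(pooled) / sqrt(len(pooled))  # assumed error measure: SEM

# Alternative summary: average the technical replicates first,
# then compute the spread across biological replicates only.
bio_means = [mean(values) for values in replicates.values()]
bio_sem = stdev(bio_means) / sqrt(len(bio_means))

print(f"pooled: mean={pooled_mean:.3f}, SEM={pooled_sem:.3f}, n={len(pooled)}")
print(f"by biological replicate: mean={mean(bio_means):.3f}, SEM={bio_sem:.3f}, n={len(bio_means)}")
```

      Both summaries give the same mean when every biological replicate has the same number of technical replicates, but the error bars differ because the effective sample size differs (6 versus 2).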

      (9) Fig S12. Why are panels A, C, E, and G presented? The other panels seem to show the same data more clearly- showing the linear relationship between peak area ratio and protein concentration.

      We have taken the reviewer’s comment into consideration and revised the figure, removing the calibration curves and showing only four panels. The figure legend has been corrected accordingly. (Please see page 43, revised manuscript). The original figure (unlike other revised figures) had to be deleted from the revised manuscript, as it caused technical issues when converting the document into pdf.

      Reviewer #2 (Recommendations For The Authors):

      Some small suggestions:

      (1) It's good to include a protein schematic for RIN4 in Figure 1.

      We appreciate the reviewer’s suggestion and we have drawn a protein schematic and added it to Figure 1. The figure legend was updated accordingly.

      (2) There appears to be incorrect labeling in Figure 2c; please double-check and make the necessary corrections.

      With respect, we do not understand the comment about incorrect labeling. Would the reviewer please help us out and give more explanation? In Figure 2C, RIN4a and RIN4b expression was checked in transgenic roots expressing either EV (empty vector) or different silencing constructs targeting RIN4a/b.

      Reviewer #3 (Recommendations For The Authors):

      I enjoyed the level of detail and precision in experimental design.

      A discussion point could be - What does it mean that nodule number but not fixation is affected? Is RIN4 only involved in the entry stage of infection but not in nodules during N-fixation?

      Our current data suggest that RIN4 is indeed involved in infection. This hypothesis is supported by the finding that RIN4a/b was phosphorylated in root hairs but not in roots (or phosphorylation was not detected in roots). The interaction with the early signaling RLKs also suggests that RIN4 is likely involved in the early stage of symbiosis formation.

      How would the authors explain their observation "However, the motif is retained in non-nodulating Fabales (such as C. canadensis, N. schottii; SI Appendix, Figure S2) and Rosales species as well." What does this imply about the role in symbiosis that the authors propose?

      We appreciate the reviewer’s question. The motif seems to be retained; however, it may be not only the motif but also the protein structure that differs in nodulating plants. We have not investigated the structure of RIN4, or how it might change with certain features, upon interaction with another protein, and/or upon post-translational modification(s). Griesman et al. (2018) showed the absence of certain genes in non-nodulating species within Fabales; we can speculate that the products of these absent genes cannot interact with RIN4 in those species, so downstream signaling could be lacking (in spite of the retained motif in non-nodulating species). At this point, there is not enough data or knowledge to speculate further.

      qPCR analysis of symbiotic pathway genes showed that both NIN-dependent and NIN-independent branches of the symbiosis signaling pathway were negatively affected in the rin4b mutant. Please derive a conclusion from this.

      We appreciate the comment; it also prompted us to correct the following sentence (original): “Since NIN is responsible for induction of NF-YA and ERN1 transcription factors, their reduced expression in rin4b plants was not unexpected (Fig. 5).” The correction was needed because ERN1 expression is independent of NIN (Kawaharada et al, 2017). The following sentence was also deleted, as it repeated a statement made above it: “Soybean NF-YA1 homolog responded significantly to rhizobial treatment in rin4b plants, whereas NF-YA3 induction did not show significant induction (Fig. 5).”

      We added the following conclusion/hypothesis: “Based on the results of the expression data presented above, it seems that both NIN-dependent and NIN-independent branches of the symbiotic signaling pathways are affected in the rin4b mutant background. This indicates that the role of RIN4 protein in the symbiotic pathway can be placed upstream of CYCLOPS, as the CYCLOPS transcription activating complex is responsible (directly or indirectly) for the activation of all TFs tested in our expression analysis (Singh et al, 2014/47, 48).” (Please see Page 16, revised manuscript)

      The authors are highly encouraged to write a thoughtful discussion that would accompany the detailed experimental work performed in this manuscript.

      We appreciate the comment, and we have revised the discussion section of the manuscript. (Please see Pages 17-19, revised manuscript)

      Some minor suggestions for overall readability are below.

      What about immune signaling genes? Given that authors hypothesize that "Absence of AtRIN4 leads to increased PTI responses and, therefore, it might be that GmRIN4b absence also causes enhanced PTI which might have contributed to significantly fewer nodules." Could check marker immune signaling gene expression FLS2 and others.

      We appreciate the reviewer’s comment, and while we believe those are very interesting questions/suggestions, answering them is outside the scope of the current manuscript. This is partly because several defense-responsive genes that were described in leaf immune responses could not be confirmed to respond in a similar manner in roots (Chuberre et al., 2018). It was also shown that plant immune responses are compartmentalized and specialized in roots (Chuberre et al., 2018). If we were looking at immune-responsive genes, the signal might be diluted because of this specialized and compartmentalized nature. Another reason why these questions cannot be answered as part of the current manuscript is that finding a suitable immune-responsive gene would require rigorous experiments, not only in roots but also in root hairs (over a time course), which would be the groundwork for a separate study (root hair isolation is not a trivial experiment; it requires at least 250-300 seedlings per treatment/per time-point).

      Regarding FLS2, it is known in Arabidopsis that its expression is tissue-specific within the root, and it seems that FLS2 expression is restricted to the root vasculature (Wyrsch et al, 2015). In our manuscript, we showed that RIN4a/b is highly expressed in root hairs and that RIN4 phosphorylation was detectable in root hairs but not in the root; therefore, we do not see a reason to investigate FLS2 expression.

      "in our hands only ERN1a could be amplified. One possible explanation for this observation is that primers were designed based on Williams 82 reference genome, while our rin4b mutant was generated in the Bert cultivar background." Is the sequence between the two cultivars and the primers that bind to ERN1b in both cultivars so different? If not, this explanation is not very convincing.

      At the time of performing the experiment the genomic sequence of the Bert cultivar (used for generating rin4b edited lines) was not publicly available. In accordance with the reviewer’s comment, we removed the explanation, as it does not seem to be relevant. (See page 16, revised manuscript)

      The figures are clear and there is a logical flow. The images of fluorescing nodules in the panels of Figures 2 and 3 are not informative or unbiased.

      We appreciate the comment. For Figure 3 (complementation of the rin4b mutant; updated according to the other reviewer’s comment) and Suppl. Figure 6 (OE phenotype of phospho-mimic/negative mutants), we removed the panels showing the micrographs. We did not modify Figure 2 (micrographs showing transgenic roots carrying the silencing constructs), for the sake of figure completeness. (See pages 10, 12 and 38, revised manuscript)

      What does the exercise in isolation of rin4 mutants in lotus tell us? Is it worth including?

      Isolation of the Ljrin4 mutant suggests that RIN4 is so important that a mutant version of it is lethal for the plant (as in Arabidopsis, where most of the evidence regarding the role of RIN4 has been described), and it provides an additional piece of evidence that RIN4 is similarly crucial across most land plant species.

      Sentence ambiguous. "Co-expression of RIN4a and b with SymRKßΔMLD and NFR1α resulted in YFP fluorescence detected by Confocal Laser Scanning Microscopy (SI Appendix, Figure S8) suggesting that RIN4a and b proteins closely associate with both RLKs." Were all 4 expressed together?

      Thank you for the remark. Not all 4 proteins were co-expressed together. We adjusted the sentence as follows: “Co-expression of RIN4a/b with SymRKßΔMLD as well as NFR1α resulted in YFP fluorescence…” We hope it is now phrased more clearly. (See page 13, revised manuscript)

      Minor spelling errors throughout. Costume-made (custom made?)

      Thank you for noticing. According to the Cambridge online dictionary, it is written with a hyphen, therefore, we added a hyphen and corrected the manuscript accordingly.

      CRISPR-cas9 or CRISPR/Cas9? Keep it consistent throughout. CRISPR-cas9 is the latest consensus.

      We corrected it to “CRISPR-Cas9” throughout the manuscript.

      References are missing for several 'obvious statements' but please include them to reach a broader audience. For example the first 5 sentences of the introduction. Also, statements such as 'Root hairs are the primary entry point for rhizobial infection in most legumes.'.

      Thank you for the comment. To make it clearer, we added reference #1 after the third sentence of the introduction and added an additional review as a reference. This additional review was also cited as the source for the sentence “Root hairs are the primary…” (Please see page 2, revised manuscript)

      Can you provide a percent value? Silencing of RIN4a and RIN4b resulted in significantly reduced nodule numbers on soybean transgenic roots in comparison to transgenic roots carrying the empty vector control. Also, this wording suggests it was a double K.D. but from the images, it appears they were individually silenced.

      We appreciate the reviewer's comment. We observed a 50-70% reduction in the number of nodules. We adjusted the text according to the reviewer's remark. (See page 9, revised manuscript)

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary

      This manuscript reports preliminary evidence of successful optogenetic activation of single retinal ganglion cells (RGCs) through the eye of a living monkey using adaptive optics (AO).

      Strengths

      The eventual goals of this line of research have enormous potential impact in that they will probe the perceptual impact of activating single RGCs. While I think more data should be included, the four examples shown look quite convincing.

      Weaknesses

      While this is undoubtedly a technical achievement and an important step along this group's stated goal to measure the perceptual consequences of single-RGC activations, the presentation lacks the rigor that I would expect from what is really a methods paper. In my view, it is perfectly reasonable to publish the details of a method before it has yielded any new biological insights, but in those publications, there is a higher burden to report the methodological details, full data sets, calibrations, and limitations of the method. There is considerable room for improvement in reporting those aspects. Specifically, more raw data should be shown for activations of neighboring RGCs to pinpoint the actual resolution of the technique, and more than two cells (one from each field of view) should be tested.

      We have expanded sections discussing both the methodology and limitations of this technique via a rewrite of the results and discussion section. The data used in the paper is available online via the link provided in the manuscript. We agree that a more detailed investigation of the strengths and limitations of the approach would have been a laudable goal. However, before returning to more detailed studies, we have shifted our effort to developing the monkey psychophysical performance we need to combine with the single cell stimulation approach described here. In addition, the optogenetic ChrimsonR used in this study is not the best choice for this experiment because of its poor sensitivity. We are currently exploring the use of ChRmine (as described in lines 93-97), which is roughly 2 orders of magnitude more sensitive. We have also been working on methods to improve probe stabilization to reduce tracking errors during eye movements. Once these improvements have been implemented, we will undertake the more detailed studies suggested here. Nonetheless, as a pragmatic matter, we submit that it is valuable to document proof-of-concept with this manuscript.

      Some information about the density of labeled RGCs in these animals would also be helpful to provide context for how many well-isolated target cells exist per animal.

      We agree. Getting reliable information about labeled cell density would be difficult without detailed histology of the retina, which we are reluctant to do because it would require sacrificing these precious and expensive monkeys from which we continue to get valuable information. We are actively exploring methods to reduce the cell density to make isolation easier, including the use of the CAMKII promoter as well as the use of intracranial injections via AAV.retro that would allow calcium indicator expression in the peripheral retina where RGCs form a monolayer. It may be that the rarity of isolated RGCs will not be a fundamental limitation of the approach in the future.

      Reviewer #2 (Public Review):

      This proof-of-principle study lays important groundwork for future studies. Murphy et al. expressed ChrimsonR and GCaMP6s in retinal ganglion cells of a living macaque. They recorded calcium responses and stimulated individual cells, optically. Neurons targeted for stimulation were activated strongly whereas neighboring neurons were not.

      The ability to record from neuronal populations while simultaneously stimulating a subset in a controlled way is a high priority for systems neuroscience, and this has been particularly challenging in primates. This study marks an important milestone in the journey towards this goal.

      The ability to detect stimulation of single RGCs was presumably due to the smallness of the light spot and the sparsity of transduction. Can the authors comment on the importance of the latter factor for their results? Is it possible that the stimulation protocol activated neurons nearby the targeted neuron that did not express GCaMP? Is it possible that off-target neurons near the targeted neuron expressed GCaMP, and were activated, but too weakly to produce a detectable GCaMP signal? In general, simply knowing that off-target signals were undetectable is not enough; knowing something about the threshold for the detection of off-target signals under the conditions of this experiment is critical.

      We agree with these points. We cannot rule out the possibility that some nearby cells were activated but we could not detect this because they did not express GCaMP. We also do not know whether cells responded but our recording methods were not sufficiently sensitive to detect them. A related limitation is that we do not know, of course, how the threshold for detection with calcium imaging relates to what the psychophysical detection threshold would have been in an awake behaving monkey. Nonetheless, the data show that we can produce a much larger response in the target cell than in nearby cells whose response we can measure, and we suggest that this is a valuable contribution even if we can’t argue that the isolation is absolute. We’ve acknowledged these important limitations in the revised manuscript in lines 66-77.

      Minor comments:

      Did the lights used to stimulate and record from the retina excite RGCs via the normal light-sensing pathway? Were any such responses recorded? What was their magnitude?

      The recording light does activate the normal light-sensing pathway to some extent, although it does not fall upon the RGC receptive fields directly. There was a 30 second adaptation period at the beginning of each trial to minimize the impact of this on the recording of optogenetically mediated responses, as described in lines 222-224. The optogenetic probe does not appear to significantly excite the cone pathway, and we do not see the expected off-target excitations that would result from this.

      The data presented attest to a lack of crosstalk between targeted and neighboring cells. It is therefore surprising that lines 69-72 are dedicated to methods for "reducing the crosstalk problem". More information should be provided regarding the magnitude of this problem under the current protocol/instrumentation and the techniques that were used to circumvent it to obtain the data presented.

      The “crosstalk problem” in this quote refers to crosstalk caused by targeting cells at higher eccentricities, which are more densely packed and are not represented in the data. The data presented are limited to the more isolated central RGCs.

      Optical crosstalk could be spatial or spectral. Laying out this distinction plainly could help the reader understand the issues quickly. The Methods indicate that cells were chosen on the basis that they were > 20 µm from their nearest (well-labeled) neighbor to mitigate optical crosstalk, but the following sentence is about spectral overlap.

      We have added a clearer explanation of what precisely we mean by crosstalk in lines 213-221.

      Figure 2 legend: "...even the nearby cell somas do not show significantly elevated response (p >> 0.05, unpaired t-test) than other cells at more distant locations." This sentence does not indicate how some cells were classified as "nearby" whereas others were classified as being "at more distant locations". Perhaps a linear regression would be more appropriate than an unpaired t-test here.

      The distinction here between “nearby” and “more distant” is 50 µm. We have clarified this in the figure caption. Performing a linear regression on cell response over distance shows a slight downward trend in two of the four cells shown here, but this trend does not reach the threshold of significance.
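
      To make the two analyses discussed here concrete, the following minimal sketch contrasts an unpaired comparison of “nearby” versus “more distant” cells (split at 50 µm) with a least-squares fit of response against distance. The distances and response values are hypothetical and chosen only for illustration, and the helper names `unpaired_t` and `ols_fit` are ours; this is not the authors' analysis code, and it reports only the test statistic and slope, not p-values.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t(a, b):
    """Student's unpaired t statistic using a pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def ols_fit(x, y):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical off-target cells: distance from the stimulated cell (µm) and response (a.u.).
distance_um = [25, 35, 45, 60, 80, 100, 120, 150]
response = [0.04, 0.02, 0.03, 0.01, 0.02, 0.00, 0.01, 0.00]

THRESHOLD_UM = 50  # "nearby" vs "more distant" cut-off stated in the response
near = [r for d, r in zip(distance_um, response) if d < THRESHOLD_UM]
far = [r for d, r in zip(distance_um, response) if d >= THRESHOLD_UM]

t_stat = unpaired_t(near, far)
slope, intercept = ols_fit(distance_um, response)

print(f"t statistic (near vs far): {t_stat:.3f}")
print(f"regression slope: {slope:.6f} a.u. per µm, intercept: {intercept:.4f}")
```

      The regression uses every cell's actual distance rather than a binary grouping, which is why the reviewer suggests it may be the more appropriate test.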

      Line 56: "These recordings were... acquired earlier in the session where no stimulus was present." More information should be provided regarding the conditions under which this baseline was obtained. I assume that the ChrimsonR-activating light was off and the 488 nm GCaMP excitation light was on, but this was not stated explicitly. Were any other lights on (e.g. room lights or cone-imaging lights)? If there was no spatial component to the baseline measurement, "where" should be "when".

      Your assumptions are correct. There was no spatial component to the baseline measurement, and these measurements are explained in more detail in lines 240-243.

      Please add a scalebar to Figure 1a to facilitate comparison with Figure 2.

      This has been done.

      Lines 165-173: Was the 488 nm light static or 10 Hz-modulated? The text indicates that GCaMP was excited with a 488 nm light and data were acquired using a scanning light ophthalmoscope, but line 198 says that "the 488 nm imaging light provides a static stimulus".

      The 488 nm light is effectively modulated at 25 Hz by the scanning action of the system. We believe the 10 Hz modulation you refer to is the closed-loop correction rate of the adaptive optics. The text has been updated in lines 217-219 to clarify this.

      A potential application of this technology is for the study of visually guided behavior in awake macaques. This is an exciting prospect. With that in mind, a useful contribution of this report would be a frank discussion of the hurdles that remain for such application (in addition to eye movements, which are already discussed).

      Lines 109-130 now offer an expanded discussion of this topic.

      Reviewer #3 (Public Review):

      This paper reports a considerable technical achievement: the optogenetic activation of single retinal ganglion cells in vivo in monkeys. As clearly specified in the paper, this is an important step towards causal tests of the role of specific ganglion cell types in visual perception. Yet this methodological advance is not described currently in sufficient detail to replicate or evaluate. The paper could be improved substantially by including additional methodological details. Some specific suggestions follow.

      The start of the results needs a paragraph or more to outline how you got to Figure 1. Figure 1 itself lacks scale bars, and it is unclear, for example, that the ganglion cells targeted are in the foveal slope.

      The results have been rewritten with additional explanation of methodology and the location of the RGCs has been clarified.

      The text mentions the potential difficulties targeting ganglion cells at larger eccentricities where the soma density increases. If this is something that you have tried it would be nice to include some of that data (whether or not selective activation was possible). Related to this point, it would be helpful to include a summary of the ganglion cell density in monkey retina.

      This is not something we tried, as we knew that the axial resolution allowed by the monkey’s eye would result in an axial PSF too large to only hit a single cell. The overall ganglion cell density is less relevant than the density of cells expressing ChrimsonR/GCaMP, which we only have limited info about without detailed histology.

      Related to the point in the previous paragraph - do you have any experiments in which you systematically moved the stimulation spot away from the target ganglion cell to directly test the dependence of stimulation on distance? This would be a valuable addition to the paper.

      We agree that these experiments would have been a valuable addition to the paper, but we are reluctant to perform them now. We are implementing an improved method to track the eye and a better optogenetic agent in an entirely new instrument, and we think that future experiments along these lines would be best done when those changes are completed.

      The activity in Figure 1 recovers from activation very slowly - much more slowly than the light response of these cells, and much more slowly than the activity elicited in most optogenetic studies. Can you quantify this time course and comment on why it might be so slow?

      We attribute the slow recovery to the calcium dynamics of the cell, and this slow recovery time is consistent with calcium responses seen in our lab elicited via the cone pathway. Similar time courses can be seen in Yin (2013) for RGCs excited via their cone inputs.

      Traces from non-targeted cells should be shown in Figure 1 along with those of targeted cells.

      We have added this as part of Figure 2.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #1:

      The authors addressed my previous concerns successfully. However, some critiques are addressed only in the response letter but not in the text (major comment 3, minor point 2). It will be great if they mention these in some parts of their manuscript.

      Major 3: We now mention the effect of acs-2i on life span in the discussion, lines 475-480:

      “Interestingly, acs-2 knockdown abolished glp-1 longevity (data not shown), consistent with previous work showing that NHR-49, a transcription factor that drives acs-2 expression, is required for glp-1 longevity (Ratnappan et al., 2014). Thus, inhibiting fatty acid β-oxidation promotes MML-1 nuclear localization under hxk-1i but abolishes lifespan extension, potentially due to epistatic effects on other transcription factors or processes.”

      Minor 2: We now speculate on the differences concerning hxk-3 knockdown on MML-1 nuclear localization resulting from the low expression of hxk-3 in adults, lines 99-102:

      “Among the three C. elegans hexokinase genes, hxk-1 and hxk-2 more strongly affected MML-1 nuclear localization in two independent MML-1::GFP reporter strains (Figure 1B, Supplementary Figure 1A), while hxk-3 had just a small effect on MML-1 nuclear localization, probably due to its low expression in adult worms (Hutter & Suh, 2016).”

      Reviewer #2:

      The authors have adequately addressed my previous concerns in their revised manuscript. However, I have one remaining minor concern regarding the link between lipid metabolism and MML-1 regulation. As proposed by the authors, HXKs modulate MML-1 localization between LD/mito and the nucleus. They have provided evidence supporting the roles of hxk-2 and the PPP in this regulatory process. Nonetheless, the involvement of hxk-1 and fatty acid oxidation (FAO) within this proposed framework remains unclear. Although FAO is generally believed to affect LD size, the potential effects of hxk-1 and FAO on LD should be investigated within the current study to further substantiate their model.

      We thank the reviewer for this comment. We now examine how hxk-1 and acs-2 affect lipid droplet size. Interestingly, we found that knockdown of acs-2 and hxk-1 acs-2 double knockdown resulted in a mild but significant increase in LD size (Supplementary Figure 4I), supporting the notion that the two hexokinases regulate MML-1 via distinct mechanisms, reflected in the updated model (Figure 5E).

    1. Author response:

      This study builds on, extends, and experimentally validates results/models from our previous study. Our and others’ data implicated SMC5/6, PML nuclear bodies (PML NBs), and SUMOylation in the transcriptional repression of extrachromosomal circular DNA (ecDNA). Moreover, multiple viruses were found to express early genes that combat SMC5/6-based repression through targeted proteasomal degradation (e.g. Hepatitis B virus HBx and HIV-1 Vpr). Thus, our analysis of the roles of the foregoing in plasmid repression yields a coherent set of results for the field to build on.

      In our previous study we presented a model, but no supportive ecDNA silencing data, suggesting that distinct SMC5/6 subcomplexes, SIMC1-SLF2 and SLF1/2, separately control its transcriptional repression and DNA repair activities. In this study we experimentally validate that prediction using an ecDNA silencing assay and SMC5/6 localization analysis following DNA damage.

      Our study further reveals the unexpected dispensability of PML NBs in the silencing of simple plasmid DNA, a departure from current dogma. This raises important questions for the field to address in terms of the silencing mechanisms for different ecDNAs across different cell types. Despite the dispensability of SUMO-rich PML NBs, SUMOylation is required for ecDNA repression. Lastly, the SV40 LT antigen early gene product counteracts ecDNA silencing. These results used genetic epistasis arguments to implicate SUMO and LT in SMC5/6-based transcriptional silencing. We provide provisional responses for some of the reviewer’s general comments below.

      Public Reviews:

      Reviewer #1 (Public review):

      SMC5/6 is a highly conserved complex able to dynamically alter chromatin structure, playing in this way critical roles in genome stability and integrity that include homologous recombination and telomere maintenance. In the last years, a number of studies have revealed the importance of SMC5/6 in restricting viral expression, which is in part related to its ability to repress transcription from circular DNA. In this context, Oravcova and colleagues recently reported how SMC5/6 is recruited by two mutually exclusive complexes (orthologs of yeast Nse5/6) to SV40 LT-induced PML nuclear bodies (SIMC/SLF2) and DNA lesions (SLF1/2). In this current work, the authors extend this study, providing some new results. However, as a whole, the story lacks unity and does not delve into the molecular mechanisms responsible for the silencing process. One has the feeling that the story is somewhat incomplete, putting together not directly connected results.

      Please see the introductory overview above.

      (1) In the first part of the work, the authors confirm previous conclusions about the relevance of a conserved domain defined by the interaction of SIMC and SLF2 for their binding to SMC6, and extend the structural analysis to the modelling of the SIMC/SLF2/SMC complex by AlphaFold. Their data support a model where this conserved surface of SIMC/SLF2 interacts with SMC at the backside of SMC6's head domain, confirming the relevance of this interaction site with specific mutations. These results are interesting but confirmatory of a previous and more complete structural analysis in yeast (Li et al. NSMB 2024). In any case, they reveal the conservation of the interaction. My major concern is the lack of connection with the rest of the article. This structure does not help to understand the process of transcriptional silencing reported later beyond its relevance to recruit SMC5/6 to its targets, which was already demonstrated in the previous study.

      Demonstrating the existence of a conserved interface between the Nse5/6-like complexes and SMC6 in both yeast and humans is foundationally important and was not revealed in our previous study. It remains unclear how this interface regulates SMC5/6 function, but yeast studies suggest a potential role in inhibiting the SMC5/6 ATPase cycle. Nevertheless, the precise function of Nse5/6 and its human orthologs in SMC5/6 regulation remains undefined, largely due to technical limitations in available in vivo analyses. The SIMC1/SLF2/SMC6 complex structure likely extends to the SLF1/2/SMC6 complex, suggesting a unifying function of the Nse5/6-like complexes in SMC5/6 regulation, albeit in the distinct processes of ecDNA silencing and DNA repair. There have been no studies to date (including this one) showing that SIMC1-SLF2 is required for SMC5/6 recruitment to ecDNA. Our previous study showed that SIMC1 was needed for SMC5/6 to colocalize with SV40 LT antigen at PML NBs. Here we show that SIMC1 is required for ecDNA repression, in the absence of PML NBs, which was not anticipated.

      (2) In the second part of the work, the authors focus on the functionality of the different complexes. The authors demonstrate that SMC5/6's role in transcription silencing is specific to its interaction with SIMC/SLF2, whereas SMC5/6's role in DNA repair depends on SLF1/2. These results are quite expected according to previous results. The authors already demonstrated that SLF1/2, but not SIMC/SLF2, are recruited to DNA lesions. Accordingly, they observe here that SMC5/6 recruitment to DNA lesions requires SLF1/2 but not SIMC/SLF2.

      Our previous study only examined the localization of SLF1 and SIMC1 at DNA lesions. The localization of these subcomplexes alone should not be used to define their roles in SMC5/6 localization. Indeed, the field is split in terms of whether Nse5/6-like complexes are required for ecDNA binding/loading, or regulation of SMC5/6 once bound.

      Likewise, the authors already demonstrated that SIMC/SLF2, but not SLF1/2, targets SMC5/6 to PML NBs. Taking into account the evidence that connects SMC5/6's viral resistance at PML NBs with transcription repression, the observed requirement of SIMC/SLF2 but not SLF1/2 in plasmid silencing is somewhat expected. This does not mean that the expectation need not be experimentally confirmed. However, the study falls short in advancing the mechanistic process, despite some interesting results such as the dispensability of the PML NBs or the antagonistic role of the SV40 large T antigen. It would have been interesting to explore how LT overcomes SMC5/6-mediated repression: Does LT prevent SIMC/SLF2 from interacting with SMC5/6? Or does it prevent SMC5/6 from binding the plasmid? Is the transcription-dependent plasmid topology altered in cells lacking SIMC/SLF2? And in cells expressing LT? In its current form, the study is confirmatory and preliminary. In agreement with this, the cartoons modelling results here and in the previous work look basically the same.

      We agree that determining the potential mechanism of action of LT in overcoming SMC5/6-based repression is an important next step. It will require the identification of any direct interactions with SMC5/6 subunits, and better methods for assessing SMC5/6 loading and activity on ecDNAs. Unlike HBx, Vpr, and BNRF1, LT does not appear to induce degradation of SMC5/6, making it a more complex and interesting challenge. Also, the dispensability of PML NBs in plasmid silencing versus viral silencing raises multiple important questions about SMC5/6’s repression mechanism.

      (3) There are some points about the presented data that need to be clarified.

      Reviewer #2 (Public review):

      Oravcová et al. present data supporting a role for SIMC1/SLF2 in silencing plasmid DNA via the SMC5/6 complex. Their findings are of interest, and they provide further mechanistic detail of how the SMC5/6 complex is recruited to disparate DNA elements. In essence, the present report builds on the authors' previous paper in eLife in 2022 (PMID: 36373674, "The Nse5/6-like SIMC1-SLF2 complex localizes SMC5/6 to viral replication centers") by showing the role of SIMC1/SLF2 in localisation of the SMC5/6 complex to plasmid DNA, and the distinct requirements as compared to recruitment to DNA damage foci. Although the findings of the manuscript are of interest, we are not yet convinced that the new data presented here represent a compelling new body of work, and we think the manuscript would better fit the format of a "research advance" article. In their previous paper, Oravcová et al. show that the recruitment of SMC5/6 to SV40 replication centres is dependent on SIMC1, and specifically, that it is dependent on SIMC1 residues adjacent to neighbouring SLF2.

      We agree that this manuscript fits the Research Advance model, which is the format in which it was submitted.

      Reviewer #3 (Public review):

      Summary:

      This study by the Boddy and Otomo laboratories further characterizes the roles of SMC5/6 loader proteins and related factors in SMC5/6-mediated repression of extrachromosomal circular DNA. The work shows that mutations engineered at an AlphaFold-predicted protein-protein interface formed between the loader SLF2/SIMC1 and SMC6 (similar to the interface in the yeast counterparts observed by cryo-EM) prevent co-IP of the respective proteins. The mutations in SLF2 also hinder plasmid DNA silencing when expressed in SLF2-/- cell lines, suggesting that this interface is needed for silencing. SIMC1 is dispensable for recruitment of SMC5/6 to sites of DNA damage, while SLF1 is required, thus separating the functions of the two loader complexes. Preventing SUMOylation (with a chemical inhibitor) increases transcription from plasmids but does not do so in SLF2-deleted cell lines, indicating that SMC5/6 silences plasmids in a SUMOylation-dependent manner. Expression of LT is sufficient for increased expression, and again, not additive or synergistic with SIMC1 or SLF2 deletion, indicating that LT prevents silencing by directly inhibiting 5/6. In contrast, PML bodies appear dispensable for plasmid silencing.

      Strengths:

      The manuscript defines the requirements for plasmid silencing by SMC5/6 (an interaction of Smc6 with the loader complex SLF2/SIMC1, SUMOylation activity) and shows that SLF1 and PML bodies are dispensable for silencing. Furthermore, the authors show that LT can overcome silencing, likely by directly binding to (but not degrading) SMC5/6.

      Weaknesses:

      (1) Many of the findings were expected based on recent publications.

      Please see introductory paragraphs above.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Although we have no further revisions on the manuscript, we would like to respond to the remaining comments from the reviewers as follows.

      Reviewer 1:

      The authors have addressed some concerns raised in the initial review but some remain. In particular it is still unclear what conclusions can be drawn about task-related activity from scans that are performed 30 minutes after the behavioral task. I continue to think that a reorganization/analysis of the data according to event type would be useful and easier to interpret across the two brain areas, but the authors did not choose to do this. Finally, switching the cue-response association, I am convinced, would help to strengthen this study.

      As for the task-related activity, the PET scan strategy was explained in our response to comment 2 from Reviewer 2. Briefly, rats receive intravenous administration of 18F-FDG solution before the start of the behavioral session. The 18F-FDG uptake into the cells starts immediately, reaches its maximum level within 30 min, and is maintained for at least 1 h. A 30-min PET scan is executed 25 min after the session. Therefore, the brain activity reflects the metabolic state during task performance in rats.

      Regarding data presentation of the electrophysiological experiments, we described the subpopulations of event-related neurons showing notable neuronal activity patterns in the order of aDLS and then pVLS, following the order of explanation used for the behavioral study.

      For switching the cue-response association, we mentioned the difference in firing activity between HR and LL trials, suggesting that different combinations between the stimulus and response may affect the level of firing activity. As suggested by the reviewer, an examination of switching the cue-response association is useful to confirm our interpretation. We will address this issue in our future studies.

      Reviewer 2:

      The authors have made important revisions to the manuscript and it has improved in clarity. They also added several figures in the rebuttal letter to answer questions by the reviewers. I would ask that these figures are also made public as part of the authors' response or if not, included in the manuscript.

      We will make the figures publicly available as part of our response.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work, van Paassen et al. have studied how CD8 T cell functionality and levels predict HIV DNA decline. The article touches on interesting facets of HIV DNA decay, but ultimately comes across as somewhat hastily done and not convincing due to the major issues.

      (1) The use of only 2 time points to make many claims about longitudinal dynamics is not convincing. For instance, the fact that raw data do not show decay in intact, but do for defective/total, suggests that the present data is underpowered. The authors speculate that rising intact levels could be due to patients who have reservoirs with many proviruses with survival advantages, but this is not the parsimonious explanation vs the data simply being noisy without sufficient longitudinal follow-up. n=12 is fine, or even reasonably good for HIV reservoir studies, but to mitigate these issues would likely require more time points measured per person.

      (1b) Relatedly, the timing of the first time point (6 months) could be causing a number of issues because this is in the ballpark for when the HIV DNA decay decelerates, as shown by many papers. This unfortunate study design means some of these participants may already have stabilized HIV DNA levels, so earlier measurements would help to observe early kinetics, but also later measurements would be critical to be confident about stability.

      We agree that in order to thoroughly investigate reservoir decay in acutely treated individuals, more participants and/or more time points measured per participant would increase the power of the study and potentially, in line with literature, show a significant decay in intact HIV DNA as well. By its design (1) the NOVA study allows for a detailed longitudinal follow-up of reservoir and immunity from start ART onwards. In the present analysis in the NOVA cohort, we decided to focus on the 24- and 156-week time points. We plan to include more individuals in our analysis in the future, so that we can better model the longitudinal dynamics of the HIV reservoir.

      The main goal of the present study, however, was not to investigate the decay or longitudinal dynamics of the viral reservoir, but to understand the relationship of the HIV-specific CD8 T-cell responses early on ART with the reservoir changes across the subsequent 2.5-year period on suppressive therapy. We will revise the manuscript in order to clarify this. Moreover, we agree with the reviewer that the early time point (24 weeks) is a time at which many virological and immunological processes are ongoing and the reservoir may not have stabilized yet for every participant. We will highlight this in the revised manuscript.

      (2) Statistical analysis is frequently not sufficient for the claims being made, such that overinterpretation of the data is problematic in many places.

      (2a) First, though it is plausible that CD8s influence reservoir decay, much more rigorous statistical analysis would be needed to assert this directionality; this is an association, which could just as well be inverted (reservoir disappearance drives CD8 T cell disappearance).

      The second point that was raised by reviewer 1 is the statistical analysis, which is referred to as “not sufficient for the claims being made”. Moreover, a more “rigorous statistical analysis would be needed”. At this stage, it is unclear from the reviewer's comments what specific type of additional statistical analysis is being requested. Correlation analyses, such as the one used in this study, are a well-established approach to investigate the relationship between the immune response and reservoir size. However, as we aim to perform the most rigorous analysis possible, for the revised submission we will adjust our analysis for putative confounders (e.g. age and antiretroviral regimen).

      We would also like to note that the association between the CD8 T-cell response at 24 weeks and the subsequent decline (the difference between 24 and 156 weeks) in the reservoir cannot be bi-directional (that can only be the case when both variables are measured at the same time point).

      (2b) Words like "strong" for correlations must be justified by correlation coefficients, and these heat maps indicate many comparisons were made, such that p-values must be corrected appropriately.

      For the revised submission, we will provide correlation coefficients to justify the wording, and will adjust the p-values for multiple comparisons.
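
      The response does not say which multiple-comparison correction will be applied. As a minimal sketch of one common choice, the Benjamini-Hochberg false discovery rate adjustment can be written in plain Python as follows; the function name and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Illustrative helper only; the correction method actually used by
    the authors is not stated in the response.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that the adjusted values
    # stay monotone (each is the minimum of itself and all larger ranks).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = pvalues[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four correlation tests.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
for p, q in zip(raw, adjusted):
    print(f"raw p = {p:.3f}  adjusted p = {q:.3f}")
```

      With a heat map of many pairwise correlations, each cell's p-value would be passed through such an adjustment before any correlation is described as significant.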

      (3) There is not enough introduction and references to put this work in the context of a large/mature field. The impacts of CD8s in HIV acute infection and HIV reservoirs are both deep fields with a lot of complexity.

      Lastly, reviewer 1 referred to the introduction and asked for more references and a more focused viewpoint because the field is large and complex. We aim to revise the introduction/discussion based on the suggestions from the reviewer.

      Reviewer #2 (Public review):

      Summary:

      This study investigated the impact of early HIV specific CD8 T cell responses on the viral reservoir size after 24 weeks and 3 years of follow-up in individuals who started ART during acute infection. Viral reservoir quantification showed that total and defective HIV DNA, but not intact, declined significantly between 24 weeks and 3 years post-ART. The authors also showed that functional HIV-specific CD8⁺ T-cell responses persisted over three years and that early CD8⁺ T-cell proliferative capacity was linked to reservoir decline, supporting early immune intervention in the design of curative strategies.

      Strengths:

      The paper is well written, easy to read, and the findings are clearly presented. The study is novel as it demonstrates the effect of HIV specific CD8 T cell responses on different states of the HIV reservoir, that is HIV-DNA (intact and defective), the transcriptionally active and inducible reservoir. Although small, the study cohort was relevant and well-characterized as it included individuals who initiated ART during acute infection, 12 of whom were followed longitudinally for 3 years, providing unique insights into the beneficial effects of early treatment on both immune responses and the viral reservoir. The study uses advanced methodology. I enjoyed reading the paper.

      Weaknesses:

      All participants were male (acknowledged by the authors), potentially reducing the generalizability of the findings to broader populations. A control group receiving ART during chronic infection would have been an interesting comparison.

      We thank the reviewer for their appreciation of our study. The reviewer raises the point that it would be useful to compare our data to a control group. Unfortunately, these samples are not yet available, but our study protocol allows for a control group (chronic infection), so we can include one in the future.

      (1) Dijkstra M, Prins H, Prins JM, Reiss P, Boucher C, Verbon A, et al. Cohort profile: the Netherlands Cohort Study on Acute HIV infection (NOVA), a prospective cohort study of people with acute or early HIV infection who immediately initiate HIV treatment. BMJ Open. 2021;11(11):e048582.

    1. Author response:

      We thank you and the reviewers very much for the insightful comments on our manuscript. We plan to revise the manuscript as follows:

      (A) As suggested by Reviewer 1, we will carefully read through the entire manuscript and try to improve its clarity. Regarding the comments and recommendations from Reviewer 2, we plan to address the first recommendation and the specific comments about the analysis of DNA methylation. We currently cannot address the second recommendation because the person responsible for gathering the data works at a different university now. However, we will keep this in mind for future projects.

      (B) Regarding the two main comments of Reviewer 2, we plan the following:

      (1) The authors group their methylation analysis by sequence context (CG, CHG, CHH). I feel this is insufficient, because CG methylation can appear in two distinct forms: gene body methylation (gbM), which is CG-only methylation within genes, and transposable element (TE) and TE-like methylation (teM), which typically involves all sequence contexts and generally affects TEs, but can also be found within genes. GbM and teM have distinct epigenetic dynamics, and it is hard to know how methylation patterns are changing during the experiment if gbM and teM are mixed. This can also have downstream consequences (see point below).

      We thank Reviewer 2 for this suggestion. We usually separate the three contexts because they are set by different enzymes and not because of the entire process or function. It would indeed be informative to group DMCs into gbM and teM but as there are many regions with overlaps between genes and transposons, this also adds some complexity. Given that there were very few DMCs, we wanted to keep it short and simple. Therefore, we wrote that 87.3% of the DMCs were close to or within genes and that 98.1% were close to or within genes or transposons. Together with the clear overrepresentation of the CG context, this indicates that most of the DMCs were related to gbM. We will update the paragraph and specifically refer to gbM to make this clear.

      (2) For GO analysis, the authors use all annotated genes as a control. However, most of the methylation differences they observe are likely gbM, and gbM genes are not representative of all genes. The authors' results might therefore be explained purely as a consequence of analyzing gbM genes, and not an enrichment of methylation changes in any particular GO group.

      This is indeed a point worth considering. We will update the GO analysis and define the background as genes with cytosines that we tested for differences in methylation and which also exhibited overall at least 10% methylation (i.e., one cytosine per gene was sufficient). This will reduce the background gene set from 34,615 to 18,315 genes. A first analysis shows that results will change with respect to the post-translational protein modifications but will remain similar for epigenetic regulation and terms related to transport and growth processes. We will update the paragraph accordingly.
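
      To illustrate why the choice of background matters, the sketch below computes a one-sided hypergeometric enrichment p-value for a single GO term against both background sizes named above. Only the two background sizes come from the response; the term size, the number of genes with DMCs, and the overlap are hypothetical, and the sketch assumes that every gene annotated with the term is also in the restricted background. It is not the authors' GO pipeline.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for a hypergeometric draw.

    N: background genes, K: background genes annotated with the term,
    n: genes in the test set, k: test-set genes annotated with the term.
    """
    upper = min(n, K)
    # Exact integer sum of favourable draws, divided once at the end.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Background sizes quoted in the response.
ALL_ANNOTATED_GENES = 34615
METHYLATED_TESTED_GENES = 18315

# Hypothetical counts, for illustration only.
TERM_GENES = 1000   # genes annotated with one GO term
DMC_GENES = 500     # genes containing at least one DMC
OVERLAP = 40        # DMC genes annotated with the term

p_all = hypergeom_sf(OVERLAP, ALL_ANNOTATED_GENES, TERM_GENES, DMC_GENES)
p_restricted = hypergeom_sf(OVERLAP, METHYLATED_TESTED_GENES, TERM_GENES, DMC_GENES)
print(f"background = all genes:        p = {p_all:.3g}")
print(f"background = methylated genes: p = {p_restricted:.3g}")
```

      Under these assumed counts the same overlap is less surprising against the smaller, methylation-matched background, which is the effect Reviewer 2 is concerned about.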

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Syed et al. investigate the circuit underpinnings for leg grooming in the fruit fly. They identify two populations of local interneurons in the right front leg neuromere of the ventral nerve cord, i.e. 62 13A neurons and 64 13B neurons. Hierarchical clustering analysis identifies 10 morphological classes for both populations. Connectome analysis reveals their circuit interactions: these GABAergic interneurons provide synaptic inhibition either between the two subpopulations, i.e., 13B onto 13A, or among each other, i.e., 13As onto other 13As, and/or onto leg motoneurons, i.e., 13As and 13Bs onto leg motoneurons. Interestingly, 13A interneurons fall into two categories, with one providing inhibition onto a broad group of motoneurons, being called "generalists", while others project to a few motoneurons only, being called "specialists". Optogenetic activation and silencing of both subsets strongly affect leg grooming. Likewise, activating or silencing subpopulations, i.e., 3 to 6 elements of the 13A and 13B groups, has marked effects on leg grooming, including frequency and joint positions, and even interrupts leg grooming. The authors present a computational model with the four circuit motifs found, i.e., feed-forward inhibition, disinhibition, reciprocal inhibition, and redundant inhibition. This model can reproduce relevant aspects of the grooming behavior.

      Strengths:

      The authors succeeded in providing evidence for neural circuits interacting by means of synaptic inhibition to play an important role in the generation of a fast rhythmic insect motor behavior, i.e., grooming. Two populations of local interneurons in the fruit fly VNC comprise four inhibitory circuit motifs of neural action and interaction: feed-forward inhibition, disinhibition, reciprocal inhibition, and redundant inhibition. Connectome analysis identifies the similarities and differences between individual members of the two interneuron populations. Modulating the activity of small subsets of these interneuron populations markedly affects the generation of the motor behavior, thereby exemplifying their important role in generating grooming.

      We thank the reviewer for their thoughtful and constructive evaluation of our work. We are encouraged by their recognition of the major contributions of our study, including the identification of multiple inhibitory circuit motifs and their contribution to organizing rhythmic leg grooming behavior. We also appreciate the reviewer’s comments highlighting our use of connectomics, targeted manipulations, and modeling to reveal how distinct subsets of inhibitory interneurons contribute to motor behavior.

      Weaknesses:

      Effects of modulating activity in the interneuron populations by means of optogenetics were conducted in the so-called closed-loop condition. This does not allow for differentiation between direct and secondary effects of the experimental modification in neural activity, as feedforward and feedback effects cannot be disentangled. To do so, open loop experiments, e.g., in deafferented conditions, would be important. Given that many members of the two populations of interneurons do not show one, but two or more circuit motifs, it remains to be disentangled which role the individual circuit motif plays in the generation of the motor behavior in intact animals.

      We appreciate the reviewer’s point regarding the role of sensory feedback in our experimental design. We agree that reafferent (sensory) input from ongoing movements could contribute to the behavioral outcomes of our optogenetic manipulations. However, our aim was not to isolate central versus peripheral contributions, but rather to assess the role of 13A/B neurons within the intact, operational sensorimotor system during natural grooming behavior.

      These inhibitory neurons form recurrent loops, synapse onto motor neurons, and receive proprioceptive input—placing them in a position to both shape central motor output and process sensory feedback. As such, manipulating their activity engages both central control and sensory consequences.

      The finding that silencing 13A neurons in dusted flies disrupts rhythmic leg coordination highlights their role in organizing grooming movements. Prior studies (e.g., Ravbar et al., 2021) show that grooming rhythms persist when sensory input is reduced, indicating a central origin, while sensory feedback refines timing, coordination, and long-timescale stability. We concluded that rhythmicity arises centrally but is shaped and stabilized by mechanosensory or proprioceptive feedback. Our current results are consistent with this view and support a model in which inhibitory premotor neurons participate in a closed-loop control architecture that generates and tunes rhythmic output.

      While we agree that fully removing sensory feedback and parsing distinct roles for neurons that participate in multiple circuit motifs would be desirable, we do not see a plausible experimental path to accomplish this - we would welcome suggestions!

      We considered the method used by Mendes and Mann (eLife 2023) to assess sensory feedback to walking, 5-40-GAL4, DacRE-flp, UAS->stop>TNT + 13A/B-spGAL4 X UAS-csChrimson. This would require converting one targeting system to LexA and presents significant technical challenges. More importantly, we believe the core interpretation issue would remain: broadly silencing proprioceptors would produce pleiotropic effects and impair baseline coordination, making it difficult to distinguish whether observed changes reflect disrupted rhythm generation or secondary consequences of impaired sensory input.

      We will clarify in the revised manuscript that our behavioral experiments were performed in freely moving flies under closed-loop conditions. We thank the reviewer for highlighting these important considerations and will revise the manuscript to better communicate the scope and interpretation of our findings.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Syed et al. presents a detailed investigation of inhibitory interneurons, specifically from the 13A and 13B hemilineages, which contribute to the generation of rhythmic leg movements underlying grooming behavior in Drosophila. After performing a detailed connectomic analysis, which offers novel insights into the organization of premotor inhibitory circuits, the authors build on this anatomical framework by performing optogenetic perturbation experiments to functionally test predictions derived from the connectome. Finally, they integrate these findings into a computational model that links anatomical connectivity with behavior, offering a systems-level view of how inhibitory circuits may contribute to grooming pattern generation.

      Strengths:

      (1) Performing an extensive and detailed connectomic analysis, which offers novel insights into the organization of premotor inhibitory circuits.

      (2) Making sense of the largely uncharacterized 13A/13B nerve cord circuitry by combining connectomics and optogenetics is very impressive and will lay the foundation for future experiments in this field.

      (3) Testing the predictions from experiments using a simplified and elegant model.

      We thank the reviewer for their thoughtful and encouraging evaluation of our work. We are especially grateful for their recognition of our detailed connectome analysis and its contribution to understanding the organization of premotor inhibitory circuits. We appreciate the reviewer’s comments highlighting the integration of connectomics with optogenetic perturbations to functionally interrogate the 13A and 13B circuits, as well as their recognition of our modeling approach as a valuable framework for linking circuit architecture to behavior.

      Weaknesses:

      (1) In Figure 4, while the authors report statistically significant shifts in both proximal inter-leg distance and movement frequency across conditions, the distributions largely overlap, and only in Panel K (13B silencing) is there a noticeable deviation from the expected 7-8 Hz grooming frequency. Could the authors clarify whether these changes truly reflect disruption of the grooming rhythm?

      We are re-analyzing the whole dataset in the light of the reviews (specifically, we are now applying a linear mixed model, LMM, to these statistics). For the panels in question (H-J), there is indeed a large overlap between the frequency distributions, but the box plots show median and quartiles, which partially overlap. (In the current analysis, as it stands, differences in means were small yet significant.) However, there is a noticeable (not yet quantified) difference in variability between the frequencies (the experimental group being the more variable one). If the activations/deactivations of 13A/B circuits disrupt the rhythm, we would indeed expect the frequencies to become more variable. So, in the revised version we will quantify the differences in both the means and the variabilities, and establish whether either shows significance after applying the LMM.

      More importantly, all this data would make the most sense if it were performed in undusted flies (with controls) as is done in the next figure.

      In our assay conditions, undusted flies groom infrequently. We used undusted flies for some optogenetic activation experiments, where the neuron activation triggers behavior initiation, but we chose to analyze the effect of silencing inhibitory neurons in dusted flies because dust reliably activates mechanosensory neurons and elicits robust grooming behavior, enabling us to assess how manipulation of 13A/B neurons alters grooming rhythmicity and leg coordination.

      (2) In Figure 4-Figure Supplement 1, the inclusion of walking assays in dusted flies is problematic, as these flies are already strongly biased toward grooming behavior and rarely walk. To assess how 13A neuron activation influences walking, such experiments should be conducted in undusted flies under baseline locomotor conditions.

      We agree that there are better ways to assay potential contributions of 13A/13B neurons to walking. We intended to focus on how normal activity in these inhibitory neurons affects coordination during grooming, and we included walking because we observed it in our optogenetic experiments and because it also involves rhythmic leg movements. The walking data is reported in a supplementary figure because we think this merits further study with assays designed to quantify walking specifically. We will make these goals clearer in the revised manuscript and we are happy to share our reagents with other research groups more equipped to analyze walking differences.

      (3) For broader lines targeting six or more 13A neurons, the authors provide specific predictions about expected behavioral effects-e.g., that activation should bias the limb toward flexion and silencing should bias toward extension based on connectivity to motor neurons. Yet, when using the more restricted line labeling only two 13A neurons (Figure 4 - Figure Supplement 2), no such prediction is made. The authors report disrupted grooming but do not specify whether the disruption is expected to bias the movement toward flexion or extension, nor do they discuss the muscle target. This is a missed opportunity to apply the same level of mechanistic reasoning that was used for broader manipulations.

      While we know which two neurons are labeled based on confocal expression, assigning their exact identity in the EM datasets has been challenging. One of these neurons appears absent from our 13A reconstructions of the right T1 neuropil in FANC, although we did locate it in MANC. However, its annotation in MANC has undergone multiple revisions, making confident assignment difficult at this time. Since we can’t be sure which motor neurons and muscles are most directly connected, we did not want to predict this line’s effect on leg movements.

      (4) Regarding Figure 5: The 70ms on/off stimulation with a slow opsin seems problematic. CsChrimson off kinetics are slow and unlikely to cause actual activity changes in the desired neurons with the temporal precision the authors are suggesting they get. Regardless, it is amazing that the authors get the behavior! It would still be important for the authors to mention the optogenetics caveat, and potentially supplement the data with stimulation at different frequencies, or using faster opsins like ChrimsonR.

      We were also surprised - and intrigued - by the behavioral consequences of activating these inhibitory neurons with CsChrimson. We tried several different activation paradigms: pulsed from 8Hz to 500Hz and with various on/off intervals. Because several of these different stimulation protocols resulted in grooming, and with different rhythmic frequencies, we think the phenotypes are a specific property of the neural circuits we have activated, rather than the kinetics of CsChrimson itself.

      We will include the data from other frequencies in a new Supplementary Figure, discuss the caveats that CsChrimson’s slow off-kinetics pose for precise temporal control of neural activity, and try ChrimsonR in future experiments.

      Overall, I think the strengths outweigh the weaknesses, and I consider this a timely and comprehensive addition to the field.

      Thank you!

      Reviewer #3 (Public review):

      Summary:

      The authors set out to determine how GABAergic inhibitory premotor circuits contribute to the rhythmic alternation of leg flexion and extension during Drosophila grooming. To do this, they first mapped the ~120 13A and 13B hemilineage inhibitory neurons in the prothoracic segment of the VNC and clustered them by morphology and synaptic partners. They then tested the contribution of these cells to flexion and extension using optogenetic activation and inhibition and kinematic analyses of limb joints. Finally, they produced a computational model representing an abstract version of the circuit to determine how the connectivity identified in EM might relate to functional output. The study, in its current form, makes an important but overclaimed contribution to the literature due to a mismatch between the claims in the paper and the data presented.

      Strengths:

      The authors have identified an interesting question and use a strong set of complementary tools to address it:

      (1) They analysed serial‐section TEM data to obtain reconstructions of every 13A and 13B neuron in the prothoracic segment. They manually proofread over 60 13A neurons and 64 13B neurons, then used automated synapse detection to build detailed connectivity maps and cluster neurons into functional motifs.

      (2) They used optogenetic tools with a range of genetic driver lines in freely behaving flies to test the contribution of subsets of 13A and 13B neurons.

      (3) They used a connectome-constrained computational model to determine how the mapped connectivity relates to the rhythmic output of the behavior.

      We appreciate the reviewer’s thorough and constructive feedback on our work. We are encouraged by their recognition of the complementary approaches used in our study.

      Weaknesses:

      The manuscript aims to reveal an instructive, rhythm-generating role for premotor inhibition in coordinating the multi-joint leg synergies underlying grooming. It makes a valuable contribution, but currently, the main claims in the paper are not well-supported by the presented evidence.

      Major points

      (1) Starting with the title of this manuscript, "Inhibitory circuits generate rhythms for leg movements during Drosophila grooming", the authors raise the expectation that they will show that the 13A and 13B hemilineages produce rhythmic output that underlies grooming. This manuscript does not show that. For instance, to test how they drive the rhythmic leg movements that underlie grooming requires the authors to test whether these neurons produce the rhythmic output underlying behavior in the absence of rhythmic input. Because the optogenetic pulses used for stimulation were rhythmic, the authors cannot make this point, and the modelling uses a "black box" excitatory network, the output of which might be rhythmic (this is not shown). Therefore, the evidence (behavioral entrainment; perturbation effects; computational model) is all indirect, meaning that the paper's claim that "inhibitory circuits generate rhythms" rests on inferred sufficiency. A direct recording (e.g., calcium imaging or patch-clamp) from 13A/13B during grooming - outside the scope of the study - would be needed to show intrinsic rhythmogenesis. The conclusions drawn from the data should therefore be tempered. Moreover, the "black box" needs to be opened. What output does it produce? How exactly is it connected to the 13A-13B circuit?

      We will modify the title to better reflect our strongest conclusions: “Inhibitory circuits coordinate rhythmic leg movements during Drosophila grooming”

      Our optogenetic activation was delivered in a patterned (70 ms on/off) fashion that entrains rhythmic movements but does not rule out the possibility that the rhythm is imposed externally. In the manuscript, we state that we used pulsed light to mimic a flexion-extension cycle and note that this approach tests whether inhibition is sufficient to drive rhythmic leg movements when temporally patterned. While this does not prove that 13A/13B neurons are intrinsic rhythm generators, it does demonstrate that activating subsets of inhibitory neurons is sufficient to elicit alternating leg movements resembling natural grooming and walking.

      Our goal with the model was to demonstrate that it is possible to produce rhythmic outputs with this 13A/B circuit, based on the connectome. The “black box” is a small recurrent neural network (RNN) consisting of 40 neurons in its hidden layer. The inputs are the “dust” levels from the environment (the green pixels in Figure 6I), the “proprioceptive” inputs (“efference copy” from motor neurons), and the amount of dust accumulated on both legs. The outputs (all positive) connect to the 13A neurons, the 13B neurons, and to the motor neurons. We refer to it as the “black box” because we make no claims about the actual excitatory inputs to these circuits. Its function is to provide input, needed to run the network, that reflects the distribution of “dust” in the environment as well as the information about the position of the legs.
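
The structure described above can be sketched as a small recurrent network. This is only an illustration under stated assumptions, not the authors' implementation: the hidden-layer size of 40 comes from the response, while the input and output counts, the weight initialization, the `tanh` hidden activation, and the softplus output (used here to keep every output positive) are hypothetical choices, as is the class name `BlackBoxRNN`.

```python
import math
import random

HIDDEN = 40  # hidden-layer size stated in the response


def softplus(x):
    # Numerically stable log(1 + exp(x)); using softplus to keep outputs
    # positive is an assumption, not the authors' stated choice.
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def rand_matrix(rows, cols, rng, scale=0.1):
    # Small random weights; the initialization scheme is hypothetical.
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class BlackBoxRNN:
    """Hypothetical stand-in for the excitatory 'black box' input network."""

    def __init__(self, n_in, n_out, seed=0):
        rng = random.Random(seed)
        self.w_in = rand_matrix(HIDDEN, n_in, rng)
        self.w_rec = rand_matrix(HIDDEN, HIDDEN, rng)
        self.w_out = rand_matrix(n_out, HIDDEN, rng)
        self.h = [0.0] * HIDDEN

    def step(self, inputs):
        # Combine the current inputs with the recurrent state.
        feedforward = matvec(self.w_in, inputs)
        recurrent = matvec(self.w_rec, self.h)
        self.h = [math.tanh(a + b) for a, b in zip(feedforward, recurrent)]
        # Positive drive that would be routed to 13A, 13B, and motor neurons.
        return [softplus(o) for o in matvec(self.w_out, self.h)]


# Hypothetical sizes: 6 inputs (dust levels, proprioceptive signals,
# accumulated dust per leg) and 8 output channels.
net = BlackBoxRNN(n_in=6, n_out=8)
drive = net.step([0.5, 0.2, 0.0, 0.1, 0.3, 0.3])
print(len(drive), min(drive) > 0)
```

Replacing the call to `net.step(...)` with a constant vector corresponds to the control described in the response, in which the network is replaced by a constant input.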

      The output of the “black box” component of the model might be rhythmic. In fact, in most instances of the model implementation this is indeed the case. However, as mentioned in the current version of the manuscript: “But the 13A circuitry can still produce rhythmic behavior even without those external sensory inputs (or when set to a constant value), although the legs become less coordinated.” Indeed, when we refine the model (with the evolutionary training) without the “black box” (using a constant input of 0.1) the behavior is still rhythmic and sustained. Therefore, the rhythmic activity and behavior can emerge from the premotor circuitry itself without a rhythmic input.

      The context in which the 13A and 13B hemilineages sit also needs to be explained. What do we know about the other inputs to the motorneurons studied? What excitatory circuits are there?

      We agree that there are many more excitatory and inhibitory, direct and indirect, connections to motor neurons that will also affect leg movements for grooming and walking. Our goal was to demonstrate what is possible from a constrained circuit of inhibitory neurons that we mapped in detail, and we hope to add additional components to better replicate the biological circuit as behavioral and biomechanical data are obtained by us and others. We will add this clarification of the scope and its limits to the Discussion.

      Furthermore, the introduction ignores many decades of work in other species on the role of inhibitory cell types in motor systems. There is some mention of this in the discussion, but even previous work in Drosophila larvae is not mentioned, nor crustacean STG, nor any other cell types previously studied. This manuscript makes a valuable contribution, but it is not the first to study inhibition in motor systems, and this should be made clear to the reader.

      We thank the reviewer for this important reminder and we will expand our discussion of the relevant history and context in our revision. Previous work on the contribution of inhibitory neurons to invertebrate motor control certainly influenced our research and we should acknowledge this better.

      (2) The experimental evidence is not always presented convincingly, at times lacking data, quantification, explanation, appropriate rationales, or sufficient interpretation.

      We are committed to improving the clarity, rationale, and completeness of our experimental descriptions. We will revisit the statistical tests applied throughout the manuscript and expand the Methods.

      (3) The statistics used are unlike any I remember having seen, essentially one big t-test followed by correction for multiple comparisons. I wonder whether this approach is optimal for these nested, high‐dimensional behavioral data. For instance, the authors do not report any formal test of normality. This might be an issue given the often skewed distributions of kinematic variables that are reported. Moreover, each fly contributes many video segments, and each segment results in multiple measurements. By treating every segment as an independent observation, the non‐independence of measurements within the same animal is ignored. I think a linear mixed‐effects model (LMM) or generalized linear mixed model (GLMM) might be more appropriate.

      We thank the reviewer for raising this important point regarding the statistical treatment of our segmented behavioral data. Our initial analysis used independent t-tests with Bonferroni correction across behavioral classes and features, which allowed us to identify broad effects. However, we acknowledge that this approach does not account for the nested structure of the data. To address this, we will re-analyze key comparisons using linear mixed-effects models (LMMs) as suggested by the reviewer. This approach will allow us to more appropriately model within-fly variability and test the robustness of our conclusions. We will update the manuscript based on the outcomes of these analyses.
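
The non-independence concern can be illustrated with synthetic data. The sketch below is not a mixed-effects model and uses no data from the study: the number of flies, segments per fly, and variance values are hypothetical. It only shows that when segments from the same fly share a fly-specific offset, treating every segment as independent gives a much smaller standard error than summarizing each fly first. An LMM with a random intercept per fly addresses the same problem while still using every segment.

```python
import math
import random
import statistics

rng = random.Random(1)
N_FLIES, N_SEGMENTS = 10, 20  # hypothetical design

# Each fly has its own offset (between-fly variability); segments from the
# same fly scatter tightly around that offset (within-fly variability).
data = {}
for fly in range(N_FLIES):
    fly_offset = rng.gauss(0.0, 1.0)
    data[fly] = [fly_offset + rng.gauss(0.0, 0.2) for _ in range(N_SEGMENTS)]

# Naive analysis: every segment counts as an independent observation.
pooled = [value for segments in data.values() for value in segments]
se_naive = statistics.stdev(pooled) / math.sqrt(len(pooled))

# Animal-level analysis: one summary value per fly.
fly_means = [statistics.mean(segments) for segments in data.values()]
se_by_fly = statistics.stdev(fly_means) / math.sqrt(len(fly_means))

print(f"naive SE (segments as independent): {se_naive:.3f}")
print(f"SE from per-fly means:              {se_by_fly:.3f}")
```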

      (4) The manuscript mentions that legs are used for walking as well as grooming. While this is welcome, the authors then do not discuss the implications of this in sufficient detail. For instance, how should we interpret that pulsed stimulation of a subset of 13A neurons produces grooming and walking behaviours? How does neural control of grooming interact with that of walking?

      We do not know how the inhibitory neurons we investigated will affect walking or how circuits for control of grooming and walking might compete. We speculate that overlapping pre-motor circuits may participate in walking and grooming because both behaviors have extension-flexion cycles at similar frequencies, but we do not have hard experimental data to support this. This would be an interesting area for future research. Here, we focused on the consequences of activating specific 13A/B neurons during grooming because they were identified through a behavioral screen for grooming disruptions, and we had developed high-resolution assays and familiarity with the normal movements in this behavior. We will clarify this rationale in the revised discussion.

      (5) The manuscript needs to be proofread and edited as there are inconsistencies in labelling in figures, phrasing errors, missing citations of figures in the text, or citations that are not in the correct order, and referencing errors (examples: 81 and 83 are identical; 94 is missing in text).

      We will carefully proofread the manuscript to fix all figure labeling, citation order, and referencing errors.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors provide a new computational platform called Vermouth to automate topology generation, a crucial step with which any biomolecular simulation starts. Given the wide array of chemical structures that need to be simulated, the varying quality of structural models obtained from various sources as inputs, and the diverse force fields and molecular dynamics engines employed for simulations, automating this fundamental step is challenging, especially for complex systems and when there is a need to conduct high-throughput simulations for computer-aided drug design (CADD). To overcome this challenge, the authors develop a programming library composed of components that carry out various types of fundamental functionality commonly encountered in topology generation. These components are intended to be general for any type of molecule and not to depend on any specific force field or MD engine. To demonstrate the applicability of this library, the authors employ those components to re-assemble a pipeline called Martinize2, used in topology generation for simulations with the widely used coarse-grained (CG) model MARTINI. This pipeline can fully recapitulate the functionality of its original version, Martinize, but exhibits greatly enhanced generality, as confirmed by its ability to faithfully generate topologies for two high-complexity benchmarking sets of proteins.

      Strengths:

      The main strength of this work is the use of concepts and algorithms associated with induced subgraph in graph theory to automate several key but non-trivial steps of topology generation such as the identification of monomer residue units (MRU), the repair of input structures with missing atoms, the mapping of topologies between different resolutions, and the generation of parameters needed for describing interactions between MRUs. In addition, the documentation website provided by the authors is very informative, allowing users to get quickly started with Vermouth.

      Weaknesses:

      Although the Vermouth library is designed as a general tool for topology generation for molecular simulations, only its applications with MARTINI have been demonstrated in the current study. Thus, the claimed generality of Vermouth remains to be examined. The authors may consider pointing this out in their manuscript.

      To demonstrate the generality of the concepts proposed here for generating topologies for molecular dynamics simulations, we have now implemented and tested a workflow that produces topologies for the popular CHARMM36 all-atom force field. To facilitate generation of all-atom topologies with Martinize2, a .rtp reader was introduced, which allows users to provide .rtp files, the native GROMACS topology files for proteins, instead of .ff files. These .rtp files exist for all major atomistic protein force fields. In addition, for CHARMM36 we also included modification files, which describe non-standard pH amino acids, histidine tautomers, and end terminal modifications. Thus, the current implementation unlocks all features available at the CG Martini level also for CHARMM36. We note that users must add the modification files for other all-atom force fields, e.g. AMBER.

      We have added a new item in the main manuscript (p28) briefly describing this proof-of-concept implementation. However, we would like to point out that there are many specialized tools for the various force fields adopted by the respective communities. Thus, an exhaustive discussion of the capabilities of Martinize2 for all-atom force fields seemed out of place.

      Reviewer #2 (Public Review):

      This work introduces a Vermouth library framework to enhance software development within the Martini community. Specifically, it presents a Vermouth-powered program, Martinize2, for generating coarse-grained structures and topologies from atomistic structures. In addition to introducing the Vermouth library and the Martinize2 program, this paper illustrates how Martinize2 identifies atoms, maps them to the Martini model, generates topology files, and identifies protonation states or post-translational modifications. Compared with the prior version, the authors provide a new figure to show that Martinize2 can be applied to various molecules, such as proteins, cofactors, and lipids. To demonstrate the general application, Martinize2 was used for converting 73% of 87,084 protein structures from the template library, with failed cases primarily blamed on missing coordinates.

      I was hoping to see some fundamental changes in the resubmitted version. To my disappointment, the manuscript remains largely unchanged (even the typo I pointed out previously was not fixed). I do not doubt that Martinize2 and Vermouth are useful to the Martini community, and this paper will have some impact. The manuscript is very technical and limited to the Martini community. The scientific insight for the general coarse-grained modeling community is unclear. The goal of the work is ambitious (such as high-throughput simulations and whole-cell modeling), but the results show just a validation of Martinize2. This version does not reverse my previous impression that it is incremental. As I pointed out in my previous review (and no response from the authors), all the issues associated with the Martini model are still there, e.g. the need for ENM. In this shape, I feel this manuscript is suitable for a specialized journal in computational biophysics or stays as part of the GitHub repository.

      We apologize for not fixing the typo; it was fixed but unfortunately got reintroduced in the final resubmitted version. We politely disagree that the goal of the work itself is high-throughput simulations and whole-cell modeling, but the Martinize2 tool is certainly an important element in our ambitions to achieve this. Given the broad interest in these goals by the modeling community in general, we believe this work has a much wider impact beyond the (already large) group of Martini users. Addressing limitations of the Martini model itself, which are certainly there, is clearly outside the scope of the current work.

      Reviewer #3 (Public Review):

      The manuscript by Kroon et al. describes two algorithms which, when combined, achieve high-throughput automation of "martinizing" protein structures with selected protonation states and post-translational modifications. After the revisions provided by the authors, I recommend minor revision.

      The authors have addressed most of my concerns provided previously. Specifically, showcasing the capability of coarse-graining other types of molecules (Figure 7) is a useful addition, especially for the booming field of therapeutic macrocycles. My only additional concern is that to justify Martinize2 and Vermouth as a "high-throughput" method, the speed of these tools needs to be addressed in some form in the manuscript as a guideline to users.

      We have added some benchmark timings in the manuscript SI and pointed to the data in the discussion part, which addresses the timing. Martinize2 is certainly slower than martinize version 1 as we already pointed out in the previous versions. However, even for larger proteins (> 2000 residues) we are able to generate topologies in about 60s. As Martinize2 runs on a single core, it can be massively parallelized. Keeping this in mind the topology file generation is likely to take up only a fraction in a high-throughput pipeline compared to the more costly simulations themselves.

    1. Author response:

      Public Review

      Joint Public Review:

      This manuscript presents an algorithm for identifying network topologies that exhibit a desired qualitative behaviour, with a particular focus on oscillations. The approach is first demonstrated on 3-node networks, where results can be validated through exhaustive search, and then extended to 5-node networks, where the search space becomes intractable. Network topologies are represented as directed graphs, and their dynamical behaviour is classified using stochastic simulations based on the Gillespie algorithm. To efficiently explore the large design space, the authors employ reinforcement learning via Monte Carlo Tree Search (MCTS), framing circuit design as a sequential decision-making process.

      This work meaningfully extends the range of systems that can be explored in silico to uncover non-linear dynamics and represents a valuable methodological advance for the fields of systems and synthetic biology.

      Strengths

      The evidence presented is strong and compelling. The authors validate their results for 3-node networks through exhaustive search, and the findings for 5-node networks are consistent with previously reported motifs, lending credibility to the approach. The use of reinforcement learning to navigate the vast space of possible topologies is both original and effective, and represents a novel contribution to the field. The algorithm demonstrates convincing efficiency, and the ability to identify robust oscillatory topologies is particularly valuable. Expanding the scale of systems that can be systematically explored in silico marks a significant advance for the study of complex gene regulatory networks.

      Weaknesses

      The principal weakness of the manuscript lies in the interpretation of biological robustness. The authors identify network topologies that sustain oscillatory behaviour despite perturbations to the system or parameters. However, in many cases, this persistence is due to the presence of partially redundant oscillatory motifs within the network. While this observation is interesting and of clear value for circuit design, framing it as evidence of evolutionary robustness may be misleading. The "mutant" systems frequently exhibit altered oscillatory properties, such as changes in frequency or amplitude. From a functional cellular perspective, mere oscillation is insufficient - preservation of specific oscillation characteristics is often essential. This is particularly true in systems like circadian clocks, where misalignment with environmental cycles can have deleterious effects. Robustness, from an evolutionary standpoint, should therefore be framed as the capacity to maintain the functional phenotype, not merely the qualitative behaviour.

      A secondary limitation is that, despite the methodological advances, the scale of the systems explored remains modest. While moving from 3- to 5-node systems is non-trivial, five elements still represent a relatively small network. It is somewhat surprising that the algorithm does not scale further, particularly when considering the performance of MCTS in other domains - for instance, modern chess engines routinely explore far larger decision trees. A discussion on current performance bottlenecks and potential avenues for improving scalability would be valuable.

      Finally, it is worth noting that the emergence of oscillations in a model often depends not only on the topology but also critically on parameter choices and the nature of the nonlinearities. The use of Hill functions and high Hill coefficients is a common strategy to induce oscillatory dynamics. Thus, the reported results should be interpreted within the context of the modelling assumptions and parameter regimes employed in the simulations.

      We thank the reviewers for their careful consideration of our work and for the interesting feedback and scientific discussion. We are working on a revised text based on their recommendations, which will include some of the discussion below.

      This work meaningfully extends the range of systems that can be explored in silico to uncover non-linear dynamics and represents a valuable methodological advance for the fields of systems and synthetic biology.

      We thank the reviewers for their positive assessment of our work’s impact!

      The use of reinforcement learning to navigate the vast space of possible topologies is both original and effective, and represents a novel contribution to the field. The algorithm demonstrates convincing efficiency, and the ability to identify robust oscillatory topologies is particularly valuable. Expanding the scale of systems that can be systematically explored in silico marks a significant advance for the study of complex gene regulatory networks.

      We appreciate these kind comments about our work’s merits. We are excited to share our reinforcement learning (RL) based method with the fields of systems and synthetic biology, and we consider it a valuable tool for the systematic analysis and design of larger-scale regulatory networks!

      The principal weakness of the manuscript lies in the interpretation of biological robustness. The authors identify network topologies that sustain oscillatory behaviour despite perturbations to the system or parameters… [However, these] "mutant" systems frequently exhibit altered oscillatory properties, such as changes in frequency or amplitude. From a functional cellular perspective, mere oscillation is insufficient - preservation of specific oscillation characteristics is often essential. This is particularly true in systems like circadian clocks, where misalignment with environmental cycles can have deleterious effects. Robustness, from an evolutionary standpoint, should therefore be framed as the capacity to maintain the functional phenotype, not merely the qualitative behaviour.

      We thank the reviewers for their attention to this point. In the large-scale circuit search, summarized in Figures 4A and 4B, we ran a search for 5-component oscillators that can spontaneously oscillate even when subjected to the deletion of a random gene. Some of the best performing circuits under these conditions exhibited a design feature we call “motif multiplexing,” in which multiple smaller motifs are interleaved in a way that makes oscillation possible under many different mutational scenarios. Interestingly, despite not selecting for preservation of frequency, the 3Ai+3Rep circuit (a 5-gene circuit highlighted in Figure 5) anecdotally appears to have a natural frequency that is robust to partial gene knockdowns, although not to complete gene deletions. As shown in Figure 5C, this circuit has a natural frequency of 6 cycles/hr (with one particular parameterization), and it can sustain a knockdown of any of its 5 genes to 50% of the wild-type transcription rate without altering the natural frequency by more than 20%.

      However, we agree that there are salient differences between this training scenario and natural evolution. The revised text will clarify that these differences limit what conclusions can be drawn about biological evolution by analogy. As the reviewers point out, we use the presence of spontaneous oscillations (with or without the deletion) as a measure of fitness, regardless of frequency, so as to screen for designs with promising behavior. Also, the deletion mutations introduced during training likely represent larger perturbations to the system than a typical mutation encountered during genome replication (for example, a point mutation in a response element leading to a moderate change in binding affinity). Finally, we do not introduce any entrainment. Real circadian oscillators are aligned to a 24-hour period (“entrained”) by environmental inputs such as light and temperature. For this reason, natural circadian clocks may have natural frequencies that are slightly shorter or longer than 24 hours, although a close proximity to the 24-hour period does seem to be an important selective factor [1].

      ...despite the methodological advances, the scale of the systems explored remains modest. While moving from 3- to 5-node systems is non-trivial, five elements still represent a relatively small network. It is somewhat surprising that the algorithm does not scale further, particularly when considering the performance of MCTS in other domains - for instance, modern chess engines routinely explore far larger decision trees. A discussion on current performance bottlenecks and potential avenues for improving scalability would be valuable.

      We thank the reviewers for their attention to this point. The main limitation we encountered to exploring circuits with more than 5 nodes in this work was the poor computational scaling of the Gillespie stochastic simulation algorithm, rather than a limitation of MCTS itself. While the average runtime of a 3-node circuit simulation was roughly 7 seconds, this number increased to 18-20 seconds with 5-node circuits. For this reason, we limited the search to topologies with ≤15 interaction arrows (15 sec/simulation). In general, the simulation time was proportional to the square of the number of transcription factors (TFs). We will revise the text to include the reason for stopping at 5 nodes, which is significant for understanding CircuiTree’s scaling properties.
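
As a quick consistency check on the scaling statement above, the sketch below extrapolates the quoted 3-node runtime under the assumption that simulation time grows with the square of the number of transcription factors. The function name `scaled_runtime` is hypothetical; the 7-second reference value and the 18-20 second range come from the response.

```python
def scaled_runtime(t_ref, n_ref, n):
    # Assumes runtime is proportional to n**2, as stated in the response.
    return t_ref * (n / n_ref) ** 2


# Extrapolate from the ~7 s average quoted for 3-node circuits.
t5 = scaled_runtime(7.0, 3, 5)
print(round(t5, 1))  # 19.4, inside the quoted 18-20 s range
```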

      With regard to scaling, an important advantage of CircuiTree is its ability to generate useful candidate designs after exploring only a portion of the search space. Like exhaustive search, given enough time, MCTS will comprehensively explore the search space and find all possible solutions. However, for large search spaces, RL-based agents are generally given a finite number of simulations (or time) to learn as much as possible.

      Across machine learning (ML) applications [2] and particularly with RL models [3], this training time tends to obey a power law with respect to the underlying complexity of the problem. We can therefore use the complexity of the 3-node and 5-node searches to infer the current scaling limits of CircuiTree. The first oscillator topology was discovered after 2,280 simulations in the 3-node search, and in the 5-node search the first oscillator using 5 nodes appeared at ~8e5 simulations, giving a power law of Y ~ 84.4 X^0.333, where Y is the number of simulations before the first hit and X is the number of topologies in the search space. By extrapolation, useful candidate designs may be found for 6-node and 7-node searches after 4.5e7 and 5.26e9 simulations, respectively, even though these spaces contain 1.5e17 and 2.5e23 topologies, respectively. Running a 7-node search with the current implementation of CircuiTree would therefore require resources close to the current boundaries of computation: roughly 1.8 million CPU-hours, or 2 weeks on 5,000 CPUs, assuming a 1-second simulation. These points will be incorporated into both the results and discussion sections in our revised text.
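
The power-law fit quoted above can be reproduced from the two reported data points. The sketch below assumes that an n-node search space contains 3^(n·n) topologies (each directed interaction being activating, inhibiting, or absent); this assumption is not stated in the response, but it reproduces the quoted 1.5e17 topologies for six nodes and lands close to the figure quoted for seven. The helper names are hypothetical.

```python
import math


def n_topologies(n_nodes):
    # Assumption: each of the n*n directed interactions is activating,
    # inhibiting, or absent, giving 3**(n*n) topologies.
    return 3 ** (n_nodes * n_nodes)


# Simulations before the first oscillator was found, as quoted in the response.
x3, y3 = n_topologies(3), 2280.0
x5, y5 = n_topologies(5), 8e5

# Fit Y = prefactor * X**exponent through the two points (log-log line).
exponent = math.log(y5 / y3) / (math.log(x5) - math.log(x3))
prefactor = y3 / x3 ** exponent


def predicted_simulations(n_nodes):
    return prefactor * n_topologies(n_nodes) ** exponent


print(f"exponent = {exponent:.3f}, prefactor = {prefactor:.1f}")
for n in (6, 7):
    print(n, f"{n_topologies(n):.2e}", f"{predicted_simulations(n):.2e}")
```

With only two data points the fit is exact by construction, so the extrapolation to six and seven nodes is best read as an order-of-magnitude estimate.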

      However, we are optimistic about CircuiTree’s potential to scale to much larger circuits with modifications to its algorithm. CircuiTree uses the original (so-called “vanilla”) implementation of MCTS, which has not been used in professional game-playing AIs in over a decade. Contemporary RL-based game-playing engines leverage deep neural networks to dramatically reduce the training time, using value networks to identify game-winning positions and policy networks to find game-winning moves. AlphaZero, developed by Google DeepMind to learn games by self-play and without domain knowledge, outperformed all other chess AIs after 44 million training games, far fewer than the 10^43 possible chess states [4]. Similarly, the game of Go has 10^170 possible states, but AlphaZero outperformed other AIs after only 140 million games [4]. Large circuits live in similarly large search spaces; for example, 19-node and 20-node circuits represent spaces of 10^172 and 10^190 possible topologies. The revised text will include this discussion and identify value and policy networks, as well as more scalable simulation paradigms such as ODEs and neural ODEs, as our future directions for improving CircuiTree’s scalability.

      Finally, our revised discussion will note some important differences between game-playing and biological circuit design. Unlike deterministic games like chess, the final value of a circuit topology is determined stochastically, by running a simulation whose fitness depends on the parameter set and initial conditions. Thus, state-for-state, it is possible that training an agent for circuit design may inherently require more simulations to achieve the same level of certainty compared to classical games. Additionally, while we often possess a priori knowledge about a game such as its overall difficulty or certain known strategies, we lack this frame of reference when searching for circuit designs. Thus, it remains challenging to know if and when a large space of designs has been “satisfactorily” or “comprehensively” searched, since the answer depends on data that are unknown, namely the quantity, quality, and location of solutions residing in the search space.

      Not accounting for redundancy due to structural symmetries

      Finally, it is worth noting that the emergence of oscillations in a model often depends not only on the topology but also critically on parameter choices and the nature of the nonlinearities. The use of Hill functions and high Hill coefficients is a common strategy to induce oscillatory dynamics. Thus, the reported results should be interpreted within the context of the modelling assumptions and parameter regimes employed in the simulations.

      In our dynamical modeling of transcription factor (TF) networks, we do not rely on continuum assumptions about promoter occupancy such as Hill functions. Rather, we model each reaction - transcription, translation, TF binding/unbinding, and degradation - explicitly, and individual molecules appear and disappear via stochastic birth and death events. Many natural TFs are homodimers that bind cooperatively to regulate transcription; similarly, we assume that pairs of TFs bind more stably to their response element than individual TFs. Thus, our model has similar cooperativity to a Hill function, and it can be shown that in the continuum limit, the effective Hill coefficient is always ≤2. Our revision will clarify this aspect of the modeling and include a derivation of this property. Currently, the parameter values used in the figures are shown in Table 2. In the revised text, these will be displayed in the body of the text as well for clarity.
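
      The bound on the effective Hill coefficient can be illustrated with a generic equilibrium model of two-step binding. This is a minimal sketch, not the authors' derivation: it assumes one TF binds with dissociation constant k1 and a second binds with k2, with only the doubly bound state counted as active. The constants and function names are hypothetical. For this model the slope of log(p/(1-p)) against log(x) is 2 - x/(k1 + x), which lies between 1 and 2.

```python
import math


def doubly_bound_fraction(x: float, k1: float, k2: float) -> float:
    """Equilibrium probability that two TFs occupy the site at TF level x."""
    return x * x / (k1 * k2 + k2 * x + x * x)


def effective_hill(x: float, k1: float, k2: float, rel_step: float = 1e-4) -> float:
    """Numerical slope of log(p / (1 - p)) versus log(x) (central difference)."""
    def log_odds(c: float) -> float:
        p = doubly_bound_fraction(c, k1, k2)
        return math.log(p / (1.0 - p))

    hi, lo = x * (1.0 + rel_step), x * (1.0 - rel_step)
    return (log_odds(hi) - log_odds(lo)) / (math.log(hi) - math.log(lo))


# Hypothetical constants: the second binding step is tighter (k2 < k1).
for x in (0.01, 1.0, 100.0):
    print(x, effective_hill(x, k1=1.0, k2=0.1))
```

      The numerical slope approaches 2 at low TF concentration and falls toward 1 at high concentration, so it never exceeds 2 in this simplified picture.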

      Bibliography
      (1) Spoelstra, K., Wikelski, M., Daan, S., Loudon, A. S. I., & Hau, M. (2015). Natural selection against a circadian clock gene mutation in mice. PNAS, 113(3), 686–691. https://doi.org/10.1073/pnas.1516442113
      (2) Neumann, O., & Gros, C. (2023). Scaling Laws for a Multi-Agent Reinforcement Learning Model. The Eleventh International Conference on Learning Representations. Retrieved from https://openreview.net/forum?id=ZrEbzL9eQ3W
      (3) Jones, A. L. (2021). Scaling Scaling Laws with Board Games. arXiv [Cs.LG]. Retrieved from http://arxiv.org/abs/2104.03113
      (4) Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K., & Hassabis, D. (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419), 1140–1144. https://doi.org/10.1126/science.aar6404

    1. Author response:

      Reviewer #1 (Public Review):

      Summary: 

      BMP signaling is, arguably, best known for its role in dorsoventral patterning, but not in nematodes, where it regulates body size. In their paper, Vora et al. analyze ChIP-Seq and RNA-Seq data to identify direct transcriptional targets of SMA-3 (Smad) and SMA-9 (Schnurri) and understand the respective roles of SMA-3 and SMA-9 in the nematode model Caenorhabditis elegans. The authors use publicly available SMA-3 and SMA-9 ChIP-Seq data, their own RNA-Seq data from SMA-3 and SMA-9 mutants, and bioinformatic analyses to identify the genes directly controlled by these two transcription factors (TFs) and find approximately 350 such targets for each. They show that all SMA-3-controlled targets are positively controlled by SMA-3 binding, while SMA-9-controlled targets can be either up- or downregulated by SMA-9. 129 direct targets were shared by SMA-3 and SMA-9, and, curiously, the expression of 15 of them was activated by SMA-3 but repressed by SMA-9. Since genes responsible for cuticle collagen production were prominent among the SMA-3 targets, the authors focused on trying to understand the body size defect known to be elicited by the modulation of BMP signaling. Vora et al. provide compelling evidence that this defect is likely to be due to problems with the BMP signaling-dependent collagen secretion necessary for cuticle formation.

      We thank the reviewer for this supportive summary. We would like to clarify the status of the publicly available ChIP-seq data. We generated the GFP-tagged SMA-3 and SMA-9 strains and submitted them to be entered into the queue for ChIP-seq processing by the modENCODE (later modERN) consortium. Due to the nature of the consortium’s funding, the data were required to be released publicly upon completion. Nevertheless, we have provided the first comprehensive analysis of these datasets.

      Strengths: 

      Vora et al. provide a valuable analysis of ChIP-Seq and RNA-Seq datasets, which will be very useful for the community. They also shed light on the mechanism of the BMP-dependent body size control by identifying SMA-3 target genes regulating cuticle collagen synthesis and by showing that downregulation of these genes affects body size in C. elegans. 

      Weaknesses: 

      (1) Although the analysis of the SMA-3 and SMA-9 ChIP-Seq and RNA-Seq data is extremely useful, the goal "to untangle the roles of Smad and Schnurri transcription factors in the developing C. elegans larva", has not been reached. While the role of SMA-3 as a transcriptional activator appears to be quite straightforward, the function of SMA-9 in the BMP signaling remains obscure. The authors write that in SMA-9 mutants, body size is affected, but they do not show any data on the mechanism of this effect. 

      We thank the reviewer for directing our attention to the lack of clarity about SMA-9’s function. We will revise the text to highlight what this study and others demonstrate about SMA-9’s role in body size. We also plan to analyze additional target genes to deepen our model for how SMA-3 and SMA-9 interact functionally to produce a given transcriptional response.

      (2) The authors clearly show that both TFs can bind independently of each other, however, by using distances between SMA-3 and SMA-9 ChIP peaks, they claim that when the peaks are close these two TFs act as complexes. In the absence of proof that SMA-3 and SMA-9 physically interact (e.g. that they co-immunoprecipitate - as they do in Drosophila), this is an unfounded claim, which should either be experimentally substantiated or toned down. 

      A physical interaction between Smads and Schnurri has been amply demonstrated in other systems. The limitation in the previous work is that only a small number of target genes was analyzed. Our goal in this study was to determine how widespread this interaction is on a genomic scale. Our analyses demonstrate for the first time that a Schnurri transcription factor has significant numbers of both Smad-dependent and Smad-independent target genes. We will revise the text to clarify this point.

      (3) The second part of the paper (the collagen story) is very loosely connected to the first part. dpy-11 encodes an enzyme important for cuticle development, and it is a differentially expressed direct target of SMA-3. dpy-11 can be bound by SMA-9, but it is not affected by this binding according to RNA-Seq. Thus, technically, this part of the paper does not require any information about SMA-9. However, this can likely be improved by addressing the function of the 15 genes, with the opposing mode of regulation by SMA-3 and SMA-9. 

      We appreciate this suggestion and will clarify how SMA-9 and its target genes contribute to collagen organization and body size regulation.

      (4) The Discussion does not add much to the paper - it simply repeats the results in a more streamlined fashion. 

      We thank the reviewer for this suggestion. We will add more context to the Discussion.

      Reviewer #2 (Public Review): 

      In the present study, Vora et al. elucidated the transcription factors downstream of the BMP pathway components Smad and Schnurri in C. elegans and their effects on body size. Using a combination of a broad range of techniques, they compiled a comprehensive list of genome-wide downstream targets of the Smads SMA-3 and SMA-9. They found that both proteins have an overlapping spectrum of transcriptional target sites they control, but also unique ones. Thereby, they also identified genes involved in one-carbon metabolism or the endoplasmic reticulum (ER) secretory pathway. In an elaborate effort, the authors set out to characterize the effects of numerous of these targets on the regulation of body size in vivo as the BMP pathway is involved in this process. Using the reporter ROL-6::wrmScarlet, they further revealed that not only collagen production, as previously shown, but also collagen secretion into the cuticle is controlled by SMA-3 and SMA-9. The data presented by Vora et al. provide in-depth insight into the means by which the BMP pathway regulates body size, thus offering a whole new set of downstream mechanisms that are potentially interesting to a broad field of researchers. 

      The paper is mostly well-researched, and the conclusions are comprehensive and supported by the data presented. However, certain aspects need clarification and potentially extended data. 

      (1) The BMP pathway is active during development and growth. Thus, it is logical that the data shown in the study by Vora et al. is based on L2 worms. However, it raises the question of if and how the pattern of transcriptional targets of SMA-3 and SMA-9 changes with age or in the male tail, where the BMP pathway also has been shown to play a role. Is there any data to shed light on this matter or are there any speculations or hypotheses? 

      We agree that these are intriguing questions and we are interested in the roles of transcriptional targets at other developmental stages and in other physiological functions, but these analyses are beyond the scope of the current study.

      (2) As it was shown that SMA-3 and SMA-9 potentially act in a complex to regulate the transcription of several genes, it would be interesting to know whether the two interact with each other or if the cooperation is more indirect. 

      A physical interaction between Smads and Schnurri has been amply demonstrated in other systems. Our goal in this study was not to validate this physical interaction, but to analyze functional interactions on a genome-wide scale.

      (3) It would help the understanding of the data even more if the authors could specifically state if there were collagens among the genes regulated by SMA-3 and SMA-9 and which. 

      We thank the reviewer for this suggestion and will add the requested information in the text.

      (4) The data on the role of SMA-3 and SMA-9 in the regulation of the secretion of collagens from the hypodermis is highly intriguing. The authors use ROL-6 as a reporter for the secretion of collagens. Is ROL-6 a target of SMA-9 or SMA-3? Even if this is not the case, the data would gain even more strength if a comparable quantification of the cuticular levels of ROL-6 were shown in Figure 6, and potentially a ratio of cuticular versus hypodermal levels. By that, the levels of secretion versus production can be better appreciated. 

      rol-6 has been identified as a transcriptional target of this pathway. The level of ROL-6 protein, however, is not changed in sma-3 and sma-9 mutants, indicating that there is post-transcriptional compensation. We will include these data in the revised manuscript.

      (5) It is known that the BMP pathway controls several processes besides body size. The discussion would benefit from a broader overview of how the identified genes could contribute to body size. The focus of the study is on collagen production and secretion, but it would be interesting to have some insights into whether and how other identified proteins could play a role or whether they are likely to not be involved here (such as the ones normally associated with lipid metabolism, etc.). 

      We will add this information to the Discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Work by Brosseau et al. combines NMR, biochemical assays, and MD simulations to characterize the influence of the C-terminal tail of EmrE, a model multi-drug efflux pump, on proton leak. The authors compare the WT pump to a C-terminal tail deletion, delta_107, finding that the mutant has increased proton leak in proteoliposome assays, shifted pH dependence with a new titratable residue, faster alternating access at high pH values, and reduced growth, consistent with proton leak of the PMF.

      Strengths:

      The work combines thorough experimental analysis of structural, dynamic, and electrochemical properties of the mutant relative to WT proteins. The computational work is well aligned in vision and analysis. Although all questions are not answered, the authors lay out a logical exploration of the possible explanations.

      Weaknesses:

      There are a few analyses that are missing and important data left out. For example, the relative rate of drug efflux of the mutant should be reported to justify the focus on proton leak. Additionally, the correlation between structural interactions should be directly analyzed and the mutant PMF also analyzed to justify the claims based on hydration alone. Some aspects of the increased dynamics at high pH due to a potential salt bridge are not clear.

      Reviewer #2 (Public review):

      Summary:

      This manuscript explores the role of the C-terminal tail of EmrE in controlling uncoupled proton flux. Leakage occurs in the wild-type transporter under certain conditions but is amplified in the C-terminal truncation mutant ∆107. The authors use an impressive combination of growth assays, transport assays, NMR on WT and mutants with and without key substrates, classical MD, and reactive MD to address this problem. Overall, I think that the claims are well supported by the data, but I am most concerned about the reproducibility of the MD data, initial structures used for simulations, and the stochasticity of the water wire formation. These can all be addressed in a revision with more simulations as I point out below. I want to point out that the discussion was very nicely written, and I enjoyed reading the summary of the data and the connection to other studies very much.

      Strengths:

      The Henzler-Wildman lab is at the forefront of using quantitative experiments to probe the peculiarities in transporter biophysics, and the MD work from the Voth lab complements the experiments quite well. The sheer number of different types of experimental and computational approaches performed here is impressive.

      Weaknesses:

      The primary weaknesses are related to the reproducibility of the MD results with regard to the formation of water wires in the WT and truncation mutant. This could be resolved with simulations starting from structures built using very different loops and C-terminal tails.

      The water wire gates identified in the MD should be tested experimentally with site-directed mutagenesis to determine if those residues do impact leak.

      We appreciate the reviewers’ thoughtful consideration of our manuscript, and their recognition of the variety of experimental and computational approaches we have brought to bear in probing the very challenging question of uncoupled proton leak through EmrE.

      We did record SSME measurements with MeTPP+, a small-molecule substrate, at two different protein:lipid ratios. These experiments report the rate of net flux when both proton-coupled substrate antiport and substrate-gated proton leak are possible. We will add this data to the revision, including data acquired at a different lipid:protein ratio that confirms we are detecting transport rather than binding. In brief, this data shows that the net flux is highly dependent on both proton concentration (pH) and drug-substrate concentration, as predicted by our mechanistic model. This demonstrates that both types of transport contribute to net flux when small-molecule substrates are present.

      In the absence of drug-substrate, proton leak is the only possible transport pathway. The pyranine assay directly assesses proton leak under these conditions and unambiguously shows faster proton entry into proteoliposomes through the ∆107-EmrE mutant than through WT EmrE, with the rate of proton entry into ∆107-EmrE proteoliposomes matching the rate of proton entry achieved by the protonophore CCCP. We have revised the text to more clearly emphasize how this directly measures proton leak independently of any other type of transport activity. The SSME experiments with a proton gradient only (no small molecule substrate present) provide additional data on shorter timescales that is consistent with the pyranine data. The consistency of the data across multiple LPRs and comparison of transport to proton leak in the SSME assays further strengthens the importance of the C-terminal tail in determining the rate of flux.

      None of the current structural models have good resolution (crystallography, EM) or sufficient restraints (NMR) to define the loop and tail conformations sufficiently for comparison with this work. We are in the process of refining an experimental structure of EmrE with better resolution of the loop and tail regions implicated in proton-entry and leak. Direct assessment of structural interactions via mutagenesis is complicated because of the antiparallel homodimer structure of EmrE. Any point mutation necessarily affects both subunits of the dimer, and mutations designed to probe the hydrophobic gate on the more open face of the transporter also have the potential to disrupt closure on the opposite face, particularly in the absence of sufficient resolution in the available structures. Thus, mutagenesis to test specific predicted structural features is deferred until our structure is complete so that we can appropriately interpret the results.

      In our simulation setup, the MD results can be considered representative and meaningful for two reasons. First, the C-terminal tail, not present in the prior structure and thus modeled by us, is only 4 residues long. We will show in the revision and detailed response that the system loses memory of its previous conformation very quickly, such that velocity initialization alone is enough to provide a diverse starting point. Second, our simulation is more like simulated annealing, starting from a high free energy state to show that, given such random initialization, the tail conformation we get in the end is consistent with what we reported. It is also difficult to sample back-and-forth tail motion within a realistic MD timescale. Therefore, causally inferring the allosteric motions from unbiased MD of the wild type alone can be inconclusive. The most viable approach is to look at the equilibrium statistics of the most stable states of WT- and ∆107-EmrE and compare the differences.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The work is well done and well presented. In my opinion, the authors must address the following questions.

      (1) It is unclear to a non-SSME-expert, why the net charge translocated in delta_107 is larger than in WT. For such small pH gradients (0.5-1pH unit), it seems that only a few protons would leave the liposome before the internal pH is adjusted to be the same as the external. This number can be estimated given the size of the liposomes. What is it? Once the pH gradient is dissipated, no more net proton transport should be observed. So, why would more protons flow out of the mutant relative to WT?

      We appreciate the complexity of both the system and assay and have made revisions to both the main text and SI to address these points more clearly. While we can estimate liposome size, we cannot easily quantify the number of liposomes on the sensor surface, so we cannot calculate the amount of charge movement as suggested by the reviewer. We have revised Fig. 3.2 and added additional data at low and high pH with different lipid to protein ratios to distinguish pre-steady state (proton release from the protein) and steady state processes (transport). An extended Fig. 3.2 caption and revised discussion in the main text clarify these points.

      We have also revised SI figure 3.2 to include an example of transport driven by an infinite drug gradient. Drug-proton antiport results in net charge build-up in the liposome, since two protons will be driven out for every +1 drug transported in. This also creates a pH gradient (higher proton concentration outside). The negative-inside potential inhibits further antiport of drug. However, both the negative-inside potential and the proton gradient will drive protons back into the liposome if there is a leak pathway available. This is clearly visible as a reversal of current from negative (antiport) to positive (proton backflow), and the magnitude of this backflow is larger for ∆107-EmrE, which lacks the regulatory elements provided by the C-terminal tail. We have amended the main text and SI to include this discussion.
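
      For orientation on the reviewer's question about proton numbers, the count of free (unbuffered) protons in a single liposome lumen can be estimated from pH and volume alone. The sketch below assumes a spherical lumen with a hypothetical 100 nm inner diameter; the actual liposome size is not given here. It ignores buffer and titratable groups, which would normally supply far more protons than the free pool.

```python
import math

AVOGADRO = 6.02214076e23  # particles per mole


def free_protons_in_liposome(diameter_nm: float, ph: float) -> float:
    """Average number of free H+ in a spherical lumen of the given inner diameter.

    Assumptions: ideal sphere, no buffer, proton activity equal to concentration.
    """
    radius_m = diameter_nm * 1e-9 / 2.0
    volume_l = (4.0 / 3.0) * math.pi * radius_m ** 3 * 1000.0  # m^3 to litres
    return 10.0 ** (-ph) * volume_l * AVOGADRO


# Hypothetical 100 nm lumen at the two internal pH values used in the text.
for ph in (6.5, 8.0):
    print(f"pH {ph}: {free_protons_in_liposome(100.0, ph):.4f} free protons")
```

      With these assumed dimensions the average is well below one free proton per liposome, which is why buffer capacity and protein-bound protons, rather than the free pool, set how much charge moves in such a system.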

      (2) Given the estimated rate of transport, size of liposomes, and pH gradient, how quickly would the SSME liposomes reach pH balance?

      Since SSME measurements are due to capacitive coupling and represent the net charge movement, including pre-steady state contributions, the current values will be highly sensitive to individual rates of alternating access and to proton and drug on- and off-rates. Time to pH balance would, therefore, differ based on the construct, LPR, absolute pH or drug concentrations, as well as the magnitude of the given gradients. For this reason, we necessarily use integrated currents (transported charge over time) when comparing mutants, as this reflects kinetic differences inherent to the mutant without over-processing the data, for example, by normalizing to peak currents, which would overemphasize certain properties that differ across mutants. This process allows for qualitative comparisons by subjecting mutants to the same pH and substrate gradients when the same density of transporter construct is present, and care is taken not to overstate the importance of the actual quantities of charge that are moving, as they will be highly context dependent. This is clearly seen in Fig. 3.2, where the current is not zero and the net transported charge is still changing at the end of 1 second. We have amended SI figure 3.2 and the main text to include this discussion.
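
      The integrated-current metric described above amounts to a running time integral of the recorded current. The following is a minimal sketch using trapezoidal integration on a synthetic trace. The amplitudes and time constants are invented for illustration and are not taken from the SSME data, and the function name is hypothetical.

```python
import math


def transported_charge(times_s, currents_a):
    """Cumulative trapezoidal integral of current (A) over time (s), in coulombs."""
    charge = [0.0]
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        charge.append(charge[-1] + 0.5 * (currents_a[i] + currents_a[i - 1]) * dt)
    return charge


# Synthetic 1 s trace sampled at 1 kHz: a fast transient plus a slower component.
times = [i * 0.001 for i in range(1001)]
currents = [-1.0e-9 * math.exp(-t / 0.02) - 0.2e-9 * math.exp(-t / 0.5) for t in times]
charge = transported_charge(times, currents)
print(f"charge moved after 1 s: {charge[-1]:.3e} C")
```

      Because the slow component has not fully decayed at 1 s in this toy trace, the cumulative charge is still changing at the end of the window, which mirrors the behaviour described for Fig. 3.2.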

      (3) Given that H110 and E14 would deprotonate when the external pH is elevated above 7 and that these protons would be released to external bulk, the external bulk pH would decrease twice as much for WT compared to delta107. This would decrease the pH gradient for WT relative to the mutant. Can these effects be quantified and accounted for? Would this ostensibly decrease the amount of charge that transfers into the liposomes for WT? How would this impact the current interpretation that the two systems are driven by the same gradient?

      The reviewer is correct that there will be differences in deprotonation of WT and ∆107, and the amount of proton release will also change with pH. We have amended Figure 3.2 to clarify this difference and its significance. For the proton-gradient-only conditions in Figure 3, each set of liposomes was equilibrated to the starting pH by repeated washings and incubation before measurement occurred. For example, for the pH 6.5 inside, pH 7 outside condition, both the inside and outside pH were equilibrated at 6.5, so both E14 residues will be predominantly protonated in WT and ∆107, and H110 will be predominantly protonated in WT-EmrE. Upon application of the external pH 7 solution, protons will be released from the E14 of either construct, with additional protons being released from H110 for WT-EmrE, causing a large pre-steady state negative contribution to the signal (Fig. 3.2A). Under this pH condition, we see that the peak current correlates with the LPR, as this release of protons will depend on the density of the transporter. However, we also see that the longer-time decay of the signal correlates with the construct (WT or ∆107) and is relatively independent of LPR, consistent with a transport process rather than a rapid pre-steady state release of protons. Therefore, when we look at the actual transported charge over time, despite the higher contribution of proton release to the WT-EmrE signal, the significant increase in uncoupled proton transport for the C-terminal deletion mutant dominates the signal.

      As a contrast, we apply this same analysis to the pH 8 inside, pH 8.5 outside condition, where both sets of transporters will be deprotonated from the start (Fig. 3.2B). Now the peak currents, decay rates, and transported charge over time are all consistent for a given construct (WT or ∆107). The two LPRs for an individual construct match within error, as the differences in overall charge movement and transported charge over time are independent of pre-steady-state proton release from the transporter at high pH.

      (4) A related question, how does the protonation of H110 influence the potential rate of proton transport between the two systems? Does the proton on H110 transfer to E14?

      The protonation of H110 will only influence the rate of transport of WT-EmrE as its protonation is required for formation of the hydrogen bonding network that coordinates gating. However, protonation of both E14s will influence the rate of proton transport of both systems as protonation state affects the rate of alternating access which is necessary for proton turnover. This is another reason we use the transported charge over time metric to compare mutants as it allows for a common metric for mutants with altered rates which are present in the same density and under the same gradient conditions. We do not have any evidence to support transfer of proton from H110 to E14, but there is also no evidence to exclude this possibility. We do not discuss this in the manuscript because it would be entirely speculative.

      (5) Is the pKa in the simulations (Figure 6B) consistent with the experiment?

      We calculated the pKa from this WT PMF and obtained a value of 7.1, which is close to the experimental value of 6.8.
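
      To put the 0.3-unit difference between the computed and experimental pKa on an energy scale, one can use the standard relation ΔG = RT ln(10) × ΔpKa. The sketch below assumes a temperature of 298.15 K, which is an assumption because the simulation temperature is not stated here. The function name is illustrative.

```python
import math

R_KCAL_PER_MOL_K = 0.0019872041  # gas constant in kcal/(mol*K)


def pka_shift_to_free_energy(delta_pka: float, temperature_k: float = 298.15) -> float:
    """Free-energy difference (kcal/mol) equivalent to a pKa shift at temperature T."""
    return delta_pka * R_KCAL_PER_MOL_K * temperature_k * math.log(10.0)


# Computed pKa (7.1) versus experimental pKa (6.8), values quoted in the text.
print(f"{pka_shift_to_free_energy(7.1 - 6.8):.2f} kcal/mol")
```

      At this assumed temperature, one pKa unit corresponds to roughly 1.36 kcal/mol, so the stated difference is a few tenths of a kcal/mol.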

      (6) Why isn't the PMF for delta_107 compared to WT to corroborate the prediction that hydration sufficiently alters both the rate and pKa of E14?

      We appreciate the reviewer’s suggestion and agree that a direct comparison would be valuable. However, several factors limit the interpretability of such an analysis in this context:

      (a) Our data indicate that the primary difference in free energy barriers between WT and Δ107 lies in the hydration step rather than in proton transport itself. Fully resolving this would require a 2D PMF calculation via 2D umbrella sampling, which can be very expensive. Looking solely at the proton transport coordinate of this PMF is not expected to reveal much difference between the two constructs.

      (b) Given this, our aim in calculating this PMF is to support our conjecture that the bottleneck for such transport is the hydrophobic gate.

      (7) The authors suggest that A61 rotation 'controls the water wire formation' by measuring the distribution of water connectivity (water-water distances via logS) and average distances between A61 and I68/I67. Delta_107 has a larger inter-residue distance (Figure 6A) and a higher probability of small log S values, i.e., closer waters connecting E14 and two residues near the top of the protein (Figure 5A). However, it strikes me that looking at average distances and the distribution of log S is not the best way to do this. Why not quantify the correlation between log S and A61 orientation and/or A61-I68/I71 distances, as well as their correlation to the proposed tail interactions (D84-R106 interactions), to directly verify the correlation (and suggest causation) of these interactions on the hydration in this region? Additionally, plotting the RMSD or probability of waters below I68 and I71 as a function of A61-I68 distances and/or numbers over time would support the log S analysis.

      The reviewer requested that we provide direct correlation analyses between A61 orientation, residue distances (A61-I68/I71), and water connectivity (logS) to better support the claim about water wire formation, rather than relying solely on average distances and distributions.

      We appreciate the reviewer’s suggestion to strengthen our analysis with direct correlations. However, due to the slow kinetics of hydration/dehydration events, unbiased simulation timescales do not permit sufficient sampling of multiple transitions to perform statistically robust dynamic correlation analyses. Instead, our approach focuses on equilibrium statistics, which reveal the dominant conformational states of WT- and Δ107-EmrE and provide meaningful insights into shifts in hydration patterns.

      (8) It looks like the D84-R106 salt bridge controls this A61-I68 opening. Could this also be quantifiably correlated?

      As discussed in response to the previous question, the unbiased simulation timescales do not permit sufficient sampling of multiple transitions to perform statistically robust dynamic correlation analyses.

      (9) The NMR results show that alternating access increases in frequency from ~4/s for WT at low and high pH to ~17/s for delta_107 only at high pH. They then go on to analyze potential titration changes in the delta_107 mutant, finding two residues with approximate pKa values of 5.6 and 7.1. The former is assigned to E14, consistent with WT. But the latter is suggested to be either D84, which salt bridges to R106, or the C-terminal carboxylate. If it is D84, why would deprotonation, which would be essential to form the salt bridge, increase the rate of alternating access relative to WT?

      We note that the faster alternating access rate was observed for TPP+-bound ∆107-EmrE, not the transporter in the absence of substrate. In the absence of substrate, the relatively broad lines preclude quantitative determination of the alternating access rate by NMR, making it difficult to judge the validity of the reviewer’s reasoning. Identification of which residue (D84 or H110) corresponds to the shifted pKa is ultimately of little consequence, as this mutant does not reflect the native conditions of the transporter. It is far more important to acknowledge that both R106 and D84 are sensitive to this deprotonation, as it indicates these residues are close in space and provides experimental support for the existence of the salt bridge identified in the MD simulations, as discussed in the manuscript.
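
      The two apparent pKa values quoted in the reviewer's comment (5.6 and 7.1) can be related to protonation state with the Henderson-Hasselbalch relation. The sketch below treats each site as an independent single-proton titration, which is a simplifying assumption because coupled sites in a transporter can deviate from this behaviour. The function name is illustrative.

```python
def fraction_deprotonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch fraction of a single site that is deprotonated."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# pKa values quoted for the delta_107 titration; pH points chosen for illustration.
for ph in (5.0, 6.5, 8.0):
    low_site = fraction_deprotonated(ph, 5.6)
    high_site = fraction_deprotonated(ph, 7.1)
    print(f"pH {ph}: pKa 5.6 site {low_site:.2f}, pKa 7.1 site {high_site:.2f}")
```

      In this idealized picture the pKa 7.1 site is mostly protonated at pH 6.5 and mostly deprotonated at pH 8, so a process that depends on its deprotonation would appear mainly at high pH.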

      (10) In a more general sense, can the authors speculate why an efflux pump would evolve this type of secondary gate that can be thrown off by tight binding in the allosteric site such as that demonstrated by Harmane? What potential advantage is there to having a tail-regulated gate?

      This was likely a necessity to allow for better coupling as these transporters evolved to be more promiscuous. The C-terminal tail is absent in tightly coupled family members such as Gdx, which are specific for a single substrate and have a better-defined transport stoichiometry. We have included this discussion in the main text and are currently investigating this phenomenon further. Those experiments are beyond the scope of the current manuscript.

      (11) It is hard to visualize the PT reaction coordinate. Is the e_PT unit vector defined for each window separately based on the initial steered MD pathway? If so, how reliant is the PT pathway on this initial approximate path? Also, how does this position for each window change if/when E14 rotates? This could be checked by plotting the x,y,z distributions for each window and quantifying the overlap between windows in cartesian space. These clouds of distributions could also be plotted in the protein following alignment so the reader can visualize the reaction coordinate. Does the CEC localization ever stray to different, disconnected regions of cartesian phase space that are hidden by the reaction coordinate definition?

      The unit vector e_PT is the same across all windows and is based on unbiased MD. The reaction coordinate (a scalar) is therefore the vector from the starting point to the CEC, projected onto this unit vector. E14 rotation does not significantly change the window definition unless the CEC is very close to E14, where we found this to be a better CV. For a detailed discussion of this CV, especially a comparison with a curvilinear CV, please see J. Am. Chem. Soc. 2018, 140, 48, 16535–16543 “Simulations of the Proton Transport” and its SI Figure S1. In the Supplementary Information, we added figure 6.1 to show the average X, Y, Z coordinates of each umbrella window.
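
      As an illustration of the projection described above, a minimal Python sketch follows. It assumes Cartesian coordinates for the CEC and the starting point; the function names and the example coordinates are hypothetical and are not taken from the simulations.

```python
import math


def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("zero-length vector has no direction")
    return tuple(c / norm for c in v)


def pt_coordinate(cec, origin, e_pt):
    """Scalar projection of (cec - origin) onto the fixed direction e_pt."""
    e = unit(e_pt)  # the same direction is used for every window
    return sum((c - o) * u for c, o, u in zip(cec, origin, e))


# Hypothetical coordinates, for illustration only.
origin = (0.0, 0.0, 0.0)
e_pt = (0.0, 0.0, 2.0)  # normalised inside pt_coordinate
cec = (1.5, -0.5, 3.0)
xi = pt_coordinate(cec, origin, e_pt)
print(xi)
```

      In this sketch, two CEC positions that differ only perpendicular to e_PT give the same scalar value, which is why per-window X, Y, Z averages are a useful complement to the one-dimensional coordinate.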

      (12) Lastly, perhaps I missed it, but it's unclear if the rate of substrate efflux is also increased in the delta_107 mutant. If this is also increased, then the overall rate of exchange is faster, including proton leak. This would be important to distinguish since the focus now is entirely on proton leaks. I.e., is it only leak or is it overall efflux and leak?

      We have amended SI figure 3.2 to include a gradient condition where an infinite drug gradient is created across the liposome. The infinite gradient allows for rapid transport of drug into the liposomes until charge build-up opposes further transport. This peak occurs at the same time for both LPRs of WT- and ∆107-EmrE, suggesting the rate of substrate transport is similar. Differences in the peak heights across LPRs can be attributed to competition between drug and proton for the primary binding site, such that more proton will be released for the higher density constructs, as described above. This process also creates a proton gradient, as drug moving in is coupled to two protons moving out. As charge build-up inhibits further drug movement, the building proton gradient will also begin to drive protons back in, which is another example of uncoupled leak. Here again, we see that this back-flow of protons, or leak, is of greater magnitude for ∆107-EmrE proteoliposomes than for those with WT-EmrE. We have included this discussion in the SI and main text.

      Minor

      (1) Introduction - the authors describe EmrE as a model system for studying the molecular mechanism of proton-coupled transport. This is a rather broad categorization that could include a wide range of phenomena distal from drug transport across membranes or through efflux pumps. I suggest further specifying to not overgeneralize.

      We revised to note the context of multidrug efflux.

      Reviewer #2 (Recommendations for the authors):

      Simulations. The initial water wire analysis is based on 4 different 1 ms simulations presented in Figure 5. The 3 WT replicates show similar results for the tail-blocking water wire formation, but the details of the system build and loop/C-terminal tail placement are not clear. It does appear that a single C-terminal tail model was created for all WT replicates. Was there also modeling for any parts of the truncation mutant? Regardless, since these initial placements and uncertainties in the structures may impact the results and subsequent water wire formation, I would like a discussion of how these starting structures impacted the formation or not of wires. I think that another WT replicate should be run starting from a completely new build that places the tail in a different (but hopefully reasonable location). This could be built with any number of tools to generate reasonable starting structures. It's critical to ensure that multiple independent simulations across different initial builds show the same water wire behavior so that we know the results are robust and insensitive to the starting structure and stochastic variation.

      We thank Reviewer 2 for their suggestion regarding the discussion of the initial structure. In our simulations, the C-terminal tail was initially modeled in an extended conformation (solvent-exposed) to mimic its disordered state prior to folding. This approach resembles an annealing process, where the system evolves from a higher free-energy state toward equilibrium. Notably, across all three replicas, we observed consistent folding of the tail onto the protein surface, supporting the robustness of this conformational preference.

      For the Δ107 truncation mutant, minimal modeling was required, as most experimental structures resolve residues up to S105 or R106. To rigorously assess the influence of the starting configuration, we analyzed the tail’s dynamics using backbone dihedral angle auto- and cross-correlation functions (new Supplementary Figures 10.1 and 10.2). These analyses reveal rapid decay of correlations—consistent with the tail’s short length (5 residues) and high flexibility—indicating that the system "forgets" its initial configuration well within the simulation timescale. Thus, we conclude that our sampling is sufficient to capture equilibrium behavior, independent of the starting structure.
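
      To make the decay argument concrete, the sketch below computes one common form of a dihedral autocorrelation, C(lag) = mean of cos(phi[t + lag] - phi[t]). This definition is an assumption chosen for illustration; the estimator actually used for Supplementary Figures 10.1 and 10.2 may differ, and the function name is hypothetical.

```python
import math


def dihedral_autocorrelation(angles_deg, max_lag):
    """C(lag) = mean over t of cos(phi[t + lag] - phi[t]); angles in degrees.

    Using the cosine of the difference handles the periodicity of angles,
    so -179 and +179 degrees count as close rather than far apart.
    """
    n = len(angles_deg)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    phi = [math.radians(a) for a in angles_deg]
    acf = []
    for lag in range(max_lag + 1):
        pairs = n - lag
        total = sum(math.cos(phi[t + lag] - phi[t]) for t in range(pairs))
        acf.append(total / pairs)
    return acf


# Hypothetical toy series: a dihedral that flips by 180 degrees every frame.
acf = dihedral_autocorrelation([0, 180, 0, 180, 0, 180], max_lag=2)
print(acf)
```

      Under this definition, a dihedral that retains memory of its starting value keeps C(lag) near 1, whereas one that samples all angles uniformly decays toward 0.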

      What does the size of the barrier in the PMF (Figure 6B) imply about the rate of proton transfer/leak and can the pKa shift of the acidic residue be estimated with this energy value compared to bulk?

      We noticed this point aligns with a related concern raised by Reviewer 1. For a detailed discussion please refer to Point 5 in our response to Reviewer 1.

      Experimental validation. The hypotheses generated by this work would be better buttressed if there were some mutation work at the hydrophobic gate (61, 68, 71) to support it. I realize that this may be hard, but it would significantly improve the quality.

      Due to the small size of the transporter, any mutagenesis of EmrE should necessarily be accompanied by functional characterization to fully assess the effects of the mutation on rate-limiting steps. We have revised the manuscript to add a discussion of the challenges with analyzing simple point mutants and to cite what is known from prior scanning mutagenesis studies of EmrE.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      The addition of the discussion about the two isomers of 18:1 didn't quite work in the place where the authors added it. What the authors wrote on line 126 is true about 18:1 isomers in wild type worms. However, they are reporting their lipidomics results of the fat-2(wa17) mutant worms. In this case, a substantial amount of the 18:1 is the oleic acid (18:1n-9) isomer. The authors can check Table 2 in their reference [10] and see that wild type and other fat mutants indeed contain approximately 10 fold more cis vaccenic than oleic acid, whereas the fat-2(wa17) mutants do accumulate oleic acid, because the wild type activity of FAT-2 is to convert oleic acid to linoleic acid, which can then be converted to downstream PUFAs. I suggest editing their sentence on line 126 to say that the high 18:1 they observed agrees with [10], and then comment about reference 10 showing the majority of 18:1 being the cis-vaccenic isomer in most strains, but the oleic acid isomer being more abundant in the fat-2(wa17) mutant strain.

      We thank the reviewer for spotting that and sparing us a bit of embarrassment. We have now modified the text and hope we got it right this time:

      "Even though the lipid analysis methods used here are not able to distinguish between different 18:1 species, a previous study showed that the majority of the 18:1 fatty acids in the fat-2(wa17) mutant is actually 18:1n9 (OA) [10] and not 18:1n7 (vaccenic acid) as in most other strains [10,23]; this is because OA is the substrate of FAT-2 and thus accumulates in the mutant."

      Reviewer #2:

      I still do not agree with the answer to my previous comment 6 regarding Figure S2E. The authors claim that hif-1(et69) suppresses fat-2(wa17) in a ftn-2 null background (in Figure S2 legend for example). To claim so, they would need to compare the triple mutant with fat-2(wa17);ftn-2(ok404) and show some rescue. However, we see in Figure 5H that ftn-2(ok404) alone rescues fat-2(wa17). Thus, by comparing both figures, I see no additional effect of hif-1(et69) in an ftn-2(ok404) background. I actually think that this makes more sense, since the authors claim that hif-1(et69) is a gain-of-function mutation that acts through suppression of ftn-2 expression. Thus, I would expect that without ftn-2 from the beginning, hif-1(et69) does not have an additional effect, and this seems to be what we see from the data. Thus, I would suggest that the authors reformulate their claims regarding the effect of hif-1(et69) in the ftn-2(ok404) background, which seems to be absent (consistently with what one would expect).

      We completely agree with the reviewer and indeed this is the meaning that we tried to convey all along. The text has now been modified as follows:

      "Lastly, ftn-2(et68) is still a potent fat-2(wa17) suppressor when hif-1 is knocked out (S2D Fig), suggesting that no other HIF-1-dependent functions are required as long as ftn-2 is downregulated; this conclusion is supported by the observation that the potency of the ftn-2(ok404) null allele to act as a fat-2(wa17) suppressor is not increased by including the hif-1(et69) allele (compare Fig 5H and S2E Fig)."

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors present a novel CRISPR/Cas9-based genetic tool for the dopamine receptor dop1R2. Based on the known function of the receptor in learning and memory, they tested the efficacy of the genetic tool by knocking out the receptor specifically in mushroom body neurons. The data suggest that dop1R2 is necessary for longer-lasting memories through its action on ⍺/ß and ⍺'/ß' neurons but is dispensable for short-term memory and thus in ɣ neurons. The experiments impressively demonstrate the value of such a genetic tool and illustrate the specific function of the receptor in subpopulations of KCs for longer-term memories. The data presented in this manuscript are significant.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript examines the role of the dopamine receptor, Dop1R2, in memory formation. This receptor has complex roles in supporting different stages of memory, and the neural mechanisms for these functions are poorly understood. The authors are able to localize Dop1R2 function to the vertical lobes of the mushroom body, revealing a role in later (presumably middle-term) aversive and appetitive memory. In general, the experimental design is rigorous, and statistics are appropriately applied. While the manuscript provides a useful tool, it would be strengthened further by additional mechanistic studies that build on the rich literature examining the roles of dopamine signaling in memory formation. The claim that Dop1R2 is involved in memory formation is strongly supported by the data presented, and this manuscript adds to a growing literature revealing that dopamine is a critical regulator of olfactory memory. However, the manuscript does not necessarily extend much beyond our understanding of Dop1R2 in memory formation, and future work will be needed to fully characterize this reagent and define the role of Dop1R2 in memory.

      Strengths:

      (1) The FRT lines generated provide a novel tool for temporal and spatially precise manipulation of Dop1R2 function. This tool will be valuable to study the role of Dop1R2 in memory and other behaviors potentially regulated by this gene.

      (2) Given the highly conserved role of Dop1R2 in memory and other processes, these findings have a high potential to translate to vertebrate species.

      Weaknesses:

      (1) The authors state Dop1R2 associates with two different G-proteins. It would be useful to know which one is mediating the loss of aversive and appetitive memory in Dop1R2 knockout flies.

      We thank you for the insightful comment. We agree that it would be very useful to know which G-proteins are transmitting Dop1R2 signaling. To that end, we examined single-cell transcriptomics data to check the level of co-expression of Dop1R2 with G-proteins that are of interest to us (Figure 1 S1).

      Lines 312-325

      “Some RNA binding proteins and Immediate early genes help maintain identities of Mushroom body cells and are regulators of local transcription and translation (de Queiroz et al., 2025; Raun et al., 2025). So, the availability of different G-proteins may change in different lobes and during different phases of memory. The G-protein via which GPCRs signal may depend on the pool of available G-proteins in the cell/sub-cellular region (Hermans, 2003). Therefore, Dop1R2 may signal via different G-proteins in different compartments of the Mushroom body and also different compartments of the neuron. We looked at Gαo and Gαq as they are known to have roles in learning and forgetting (Ferris et al., 2006; Himmelreich et al., 2017). We found that Dop1R2 co-expresses more frequently with Gαo than with Gαq (Figure 1 S1). While there is evidence for Dop1R2 to act via Gαq (Himmelreich et al., 2017), it is difficult to determine whether this interaction is exclusive, or if Dop1R2 can also be coupled to other G-proteins. It will be interesting to determine the breadth of G-proteins that are involved in Dop1R2 signaling.”

      (2) It would be interesting to examine 24hr aversive memory, in addition to 24hr appetitive memory.

      This is indeed an important point and we agree that it will complete the assessment of temporally distinct memory traces. We therefore performed the Aversive LTM experiments and include them in the results.

      Lines 208-228

      “24h memory is impaired by loss of Dop1R2

      Next, we wanted to see if later memory forms are also affected. One cycle of reward training is sufficient to create LTM (Krashes & Waddell, 2008), while for aversive memory, 5-6 cycles of electroshock-trainings are required to obtain robust long-term memory scores (Tully et al., 1994). So, we looked at both 24h aversive and appetitive memory. For aversive LTM, the flies were tested on the Y-Maze apparatus as described in (Mohandasan et al., 2022).

      Flipping out Dop1R2 in the whole MB causes a reduced 24h memory performance (Figure 4A, E). No phenotype was observed when Dop1R2 was flipped out in the γ-lobe (Figure 4B, F). However, similar to 2h memory, loss of Dop1R2 in the α/β-lobes (Figure 4C, G) or the α’/β’-lobes (Figure 4D, H) causes a reduction in memory performance. Thus, Dop1R2 seems to be involved in aversive and appetitive LTM in the α/β-lobes and the α’/β’-lobes.

      Previous studies have shown that mutation in the Dop1R2 receptor leads to improvement in LTM when a single shock training paradigm is used (Berry et al., 2012). As we found that it disrupts LTM, we wanted to verify if the absence of Dop1R2 outside the MB is what leads to an improvement in memory. To that end, we tested panneuronal flip-out of Dop1R2 flies for 6hr and 24hr memory upon single shock using the elav-Gal4 driver. We found that it did not improve memory at either time point (Figure 4 S1), confirming that flipping out Dop1R2 panneuronally does not improve LTM (Figure 4 S1C) and highlighting its irrelevance to memory outside the MB.”

      (3) The manuscript would be strengthened by added functional analysis. What are the DANs that signal through Dop1R. How do these knockouts impact MBONs?

      We thank you for this question. We agree that how distinct DANs signal via distinct dopamine receptors is a highly relevant and open question. Our work here focuses specifically on Dop1R2 within the MB. We aim to investigate other DopRs and the connection between DANs in the future using similar approaches.

      (4) Also in Figure 2, the lobe-specific knockouts might be moved to supplemental since there is no effect. Instead, consider moving the control sensory tests into the main figure.

      We thank you for this suggestion and understand that in Figure 2 no significant difference is seen. However, we have emphasized in the text that the results from the supplementary figures are just to confirm that the modifications made at the Dop1R2 locus did not alter its normal function.

      Lines 156-162

      “We wanted to see if flipping out Dop1R2 in the MB affects memory acquisition and STM by using classical olfactory conditioning. In short, a group of flies is presented with an odor coupled to an electric shock (aversive) or sugar (appetitive) followed by a second odor without stimulus. For assessing their memory, flies can freely choose between the odors either directly after training (STM) or at a later timepoint.

      To ensure that the introduced genetic changes to the Dop1R2 locus do not interfere with behavior we first checked the sensory responses of that line”

      (5) Can the single-cell atlas data be used to narrow down the cell types in the vertical lobes that express Dop1R2? Is it all or just a subset?

      This is indeed an interesting question, and we thank you for mentioning it. To address this as best as we could, we analyzed the single cell transcriptomic data from (Davie et al., 2018) and presented it in Figure 1 S1.

      Reviewer #3 (Public Review):

      Summary:

      Kaldun et al. investigated the role of Dopamine Receptor Dop1R2 in different types and stages of olfactory associative memory in Drosophila melanogaster. Dop1R2 is a type 1 Dopamine receptor that can act both through Gs-cAMP and Gq-ERCa2+ pathways. The authors first developed a very useful tool, where tissue-specific knock-out mutants can be generated, using CRISPR/Cas9 technology in combination with the powerful Gal4/UAS gene-expression toolkit, very common in fruit flies.

      They direct the K.O. mutation to intrinsic neurons of the main associative memory centre of the fly brain, the mushroom body (MB). There are three main types of MB-neurons, or Kenyon cells, according to their axonal projections: a/b; a'/b', and g neurons.

      Kaldun et al. found that flies lacking dop1R2 all over the MB displayed impaired appetitive middle-term (2h) and long-term (24h) memory, whereas appetitive short-term memory remained intact. Knocking-out dop1R2 in the three MB neuron subtypes also impaired middle-term, but not short-term, aversive memory.

      These memory defects were recapitulated when the loss of the dop1R2 gene was restricted to either a/b or a'/b', but not when the loss of the gene was restricted to g neurons, showcasing a compartmentalized role of Dop1R2 in specific neuronal subtypes of the main memory centre of the fly brain for the expression of middle and long-term memories.

      Strengths:

      (1) The conclusions of this paper are very well supported by the data, and the authors systematically addressed the requirement of a very interesting type of dopamine receptor in both appetitive and aversive memories. These findings are important for the fields of learning and memory and dopaminergic neuromodulation among others. The evidence in the literature so far was generated in different labs, each using different tools (mutants, RNAi knockdowns driven in different developmental stages...), different time points (short, middle, and long-term memory), different types of memories (Anesthesia resistant, which is a type of protein synthesis independent consolidated memory; anesthesia sensitive, which is a type of protein synthesis-dependent consolidated memory; aversive memory; appetitive memory...) and different behavioral paradigms. A study like this one allows for direct comparison of the results, and generalized observations.

      (2) Additionally, Kaldun and collaborators addressed the requirement of different types of Kenyon cells, that have been classically involved in different memory stages: g KCs for memory acquisition and a/b or a'/b' for later memory phases. This systematical approach has not been performed before.

      (3) Importantly, the authors of this paper produced a tool to generate tissue-specific knock-out mutants of dop1R2. Although this is not the first time that the requirement of this gene in different memory phases has been studied, the tools used here represent the most sophisticated genetic approach to induce a loss of function phenotypes exclusively in MB neurons.

      Weaknesses:

      (1) Although the paper does have important strengths, the main weakness of this work is that the advancement in the field could be considered incremental: the main findings of the manuscript had been reported before by several groups, using tissue-specific conditional knockdowns through interference RNAi. The requirement of Dop1R2 in MB for middle-term and long-term memories has been shown both for appetitive (Musso et al 2015, Sun et al 2020) and aversive associations (Plaçais et al 2017).

      Thank you for this comment. We believe that the main takeaway from the paper is the elegant tool we developed to study the role of Dop1R2 in fruit flies by effectively flipping it out spatio-temporally. Additionally, we studied its role in all types of olfactory associative memory to establish it as a robust tool that can be used for further research in place of RNAi knockouts, which are shown to be less efficient in insects, as mentioned in the text (lines 394-398).

      “The genetic tool we generated here to study the role of the Dop1R2 dopamine receptor in cells of interest, is not only a good substitute for RNAi knockouts, which are known to be less efficient in insects (Joga et al., 2016), but also provides versatile possibilities as it can be used in combination with the powerful genetic tools of Drosophila.”

      (2) The approach used here to genetically modify memory neurons is not temporally restricted. Considering the role of dopamine in the correct development of the nervous system, one must consider the possible effects that this manipulation can have on the establishment of memory circuits. However, previous studies addressing this question restricted the manipulation of Dop1R2 expression to adulthood, leading to the same findings as those reported in this paper for both aversive and appetitive memories, which solidifies the findings of this paper.

      We thank you for this comment and we agree that it would be important to show a temporally restricted effect of Dop1R2 knockout. To assess this and rule out potential developmental defects we decided to restrict the knockout to the post-eclosion stage and to include these results.

      Lines 230-250

      “Developmental defects are ruled out in a temporally restricted Dop1R2 conditional knockout.

      To exclude developmental defects in the MB caused by flip-out of Dop1R2, we stained fly brains with a FasII antibody. Compared to genetic controls, flies lacking Dop1R2 in the mushroom body had unaltered lobes (Figure 4 S2C).

      Regardless, we wanted to control for developmental defects leading to memory loss in flip-out flies. So, we generated a Gal80ts-containing line, enabling the temporal control of Dop1R2 knockout in the entire mushroom body (MB). Given that the half-life of the receptor remains unknown, we assessed both aversive short-term memory (STM) and long-term memory (LTM) to determine whether post-eclosion ablation of Dop1R2 in the MB produced differences compared to our previously tested line, in which Dop1R2 was constitutively knocked out from fertilization. To achieve this, flies were maintained at 18°C until eclosion and subsequently shifted to 30°C for five to seven days. On the fifth day, training was conducted, followed by memory testing. Our results indicate that aversive STM was not significantly impaired in Dop1R2-deficient MBs compared to control flies (Figure 4 S3), consistent with our previous findings (Figure 2). However, aversive LTM was significantly impaired relative to control lines (Figure 4 S3), which also aligned with prior observations. These findings strongly indicate that memory loss caused by Dop1R2 flip-out is not due to developmental defects.”

      (3) The authors state that they aim to resolve disparities of findings in the field regarding the specific role of Dop1R2 in memory, offering a potent tool to generate mutants and addressing systematically their effects on different types of memory. Their results support the role of this receptor in the expression of long-term memories; however, the experiments performed here do not address the temporal resolution of the genetic manipulations, which could shed light on the mechanisms of action of Dop1R2 in memory. Several hypotheses have been proposed, from stabilization of memory, effects on forgetting, or integration of sequences of events (sensory experiences and dopamine release).

      We thank you for this comment. We agree that it would be interesting to dissect the memory stages by knocking out the receptor selectively in some of them (encoding, consolidation, retrieval). However, our tool irreversibly flips out Dop1R2 preventing us from investigating the receptor’s role in retrieval. Our results show that the receptor is dispensable for STM formation (Figure 2, Figure 4 Supplement 3), suggesting that it is not involved in encoding new information. On the other hand, it is instead involved in consolidation and/or retrieval of long-term and middle-term memories (Figure 3, Figure 4, Figure 5B).

      Overall, the authors generated a very useful tool to study dopamine neuromodulation in any given circuit when used in combination with the powerful genetic toolkit available in Drosophila. The reports in this paper confirmed a previously described role of Dop1R2 in the expression of aversive and appetitive LTM and mapped these effects to two specific types of memory neurons in the fly brain, previously implicated in the expression and consolidation of long-term associative memories.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) At first glance, the results shown here are different from studies published earlier, while in line with others (e.g. Sun et al, for appetitive 24h memories). For example, Berry et al showed that the loss of dop1R2 impairs immediate memory, while memory scores are enhanced 3h, 6h, and 24h after training. Further, they showed data that shock avoidance, at least for higher shock intensities, is reduced in mutant (damb) flies. All in all, this shows how important it is to improve the genetic tools for tissue-specific manipulation. Despite the authors nicely discussing their data with respect to the previous studies, I wondered whether it would be suitable to use the new tool and knock out dop1R2 panneuronally to see whether the obtained data match the results published by Berry et al. Further, as stated in line 105ff: "As these studies used different learning assays - aversive and appetitive respectively as well as different methods, it is unclear if Dop1R2 has different functions for the different reinforcement stimulus" I wondered why the authors tested aversive and appetitive learning for STM and 2h memory, but only appetitive memory for 24h.

      Thank you for this comment. To that extent, as mentioned above in response to reviewer #2, we included in the results the aversive LTM experiment (Figure 4). Moreover, we performed experiments along the line of Berry et al. using our tool as shown in Figure 4 S1. Our results support that Dop1R2 is required for LTM, rather than to promote forgetting.

      (2) Line 165ff: I can't find any of the supplementary data mentioned here. Please add the corresponding figures.

      Thank you for pointing this out. In that line we don’t refer to any supplementary data, but to Figure 1F, showing the absence of the HA-tag in our MB knock-out line. We have clarified this in the text (lines 151-153).

      (3) I can't imagine that the scale bar in Figure 1D-F is correct. I would also like to suggest to show a more detailed analysis of the expression pattern. For example, both anterior and posterior views would be appropriate, perhaps including the VNC. This would allow the expression pattern obtained with this novel tool to be better compared with previously published results. Also, in relation to my comment above (1), it may help to understand the functional differences with previous studies, especially as the authors themselves state that the receptor is "mainly" expressed in the mushroom body (line 99). It would be interesting to see where else it is expressed (if so). This would also be interesting for the panneuronal knockdown experiment suggested under (1). If the receptor is indeed expressed outside the mushroom body, this may explain the differences to Berry et al.

      Thank you for noting this; there was indeed a mistake in the scale bar, which we have now fixed. Since with our HA-tag immunostaining we could not detect any noticeable signal outside of the MB, we decided to analyze previously existing single cell transcriptomics data, which showed expression of the receptor in 7.99% of cells in the VNC and in 13.8% of cells outside the MB (lines 98-100), confirming its sparse expression in the nervous system. The lack of detection of these cells is likely due to the sparse and low expression of the protein. The HA-tag allows detection of the endogenous level of the locus (it is possible that Gal4/UAS amplification of the signal might allow detection of these cells).

      Regarding the panneuronal knockout, we decided to try to replicate the experiment shown in Berry et al. in Figure 4 S1 and found that Dop1R2 is required for LTM.

      (4) Related to learning data shown in Figures 2-4, the authors should show statistical differences between all groups obtained in the ANOVA + PostHoc tests. Currently, only an asterisk is placed above the experimental group, which does not adequately reflect the statistical differences between the groups. In addition, I would like to suggest adding statistical tests to the chance level as it may be interesting to know whether, for example, scores of knockout flies in 3C and 3D are different from the chance level.

      Many thanks for this correction, we agree with the fact that the way significance scores were shown was not informative enough. We fixed the point by now showing significance between all the control groups and the experimental ones. We also inserted the chance level results in the figure legends.

      (5) Unfortunately, the manuscript has some typing errors, so I would like to ask the authors to check the manuscript again carefully.

      Some Examples:

      Line 31: the the

      Line 56: G-Protein

      Line 64: c-AMP

      Line 68: Dopamine

      Line 70: G-Protein (It alternates between G-protein and G-Protein)

      Line 76: References are formatted incorrectly

      Line 126: Ha-Tag (It alternates between Ha and HA)

      Line 248: missing space before the bracket...is often found

      Thank you for noticing these errors, we have now corrected the spelling throughout the manuscript.

      (6) In the figures the axes are labelled Preference Index (Pref"I"). In the methods, however, the calculation formula is defined as "PREF".

      We thank you for drawing attention to this. To avoid confusion, we changed the definition in the methods section so that it could be clear and coherent (“Memory tests” paragraph in the methods section).

      “PREF = ((N<sub>arm1</sub> - N<sub>arm2</sub>) × 100) / N<sub>total</sub>. The two preference indices were calculated from the two reciprocal experiments. The average of these two PREFs gives a learning index (LI): LI = (PREF<sub>1</sub> + PREF<sub>2</sub>) / 2.

      For all long-term aversive memory experiments, the Y-Maze protocol was adapted to test flies 24 hours post training. Testing using the Y-Maze was done following the protocol described in (Mohandasan et al., 2022), where flies were loaded at the bottom of 3D-printed Y-Mazes odorized for 20 minutes, from where they would climb up to a choice point and choose between the two odors. The learning index was then calculated after counting the flies in each odorized vial as follows: LI = ((N<sub>CS-</sub> - N<sub>CS+</sub>) × 100) / N<sub>total</sub>, where N<sub>CS-</sub> and N<sub>CS+</sub> are the number of flies that were found trapped in the untrained and trained odor tube, respectively.”
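
      As a worked illustration of the index definitions quoted above, the following Python sketch evaluates them for made-up fly counts. The function names and the counts are hypothetical; which arm is labelled arm1 and how the total is counted are left to the definitions in the methods section.

```python
def pref(n_arm1, n_arm2, n_total):
    """Preference index: difference between the two arms as a percentage."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return (n_arm1 - n_arm2) * 100 / n_total


def learning_index(pref_1, pref_2):
    """Learning index: average of the PREFs from the two reciprocal runs."""
    return (pref_1 + pref_2) / 2


def ymaze_learning_index(n_cs_minus, n_cs_plus, n_total):
    """Y-Maze learning index from flies in the untrained (CS-) and trained (CS+) tubes."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return (n_cs_minus - n_cs_plus) * 100 / n_total


# Hypothetical counts, for illustration only.
pref_1 = pref(70, 30, 100)               # first reciprocal experiment
pref_2 = pref(60, 40, 100)               # second reciprocal experiment
li = learning_index(pref_1, pref_2)
li_ymaze = ymaze_learning_index(45, 15, 80)
print(pref_1, pref_2, li, li_ymaze)
```

      With these example counts, both indices run from -100 to 100, and a value of 0 corresponds to flies distributing equally between the two odors.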

      Reviewer #2 (Recommendations For The Authors):

      (1) In Figures 2 and 3, the legends running two different subfigures is confusing. Would be helpful to find a different way to present.

      Thank you for your suggestion. We modified how we present legends, placing them vertically so that it is clearer.

      (2) Use additional drivers to verify middle and long-term memory phenotypes.

      We agree that it would be interesting to see the role of Dop1R2 in other neurons. To that end, we looked at long-term aversive memory in flies where the receptor was pan-neuronally flipped out, and did not find evidence suggesting involvement of Dop1R2 in memory processes outside the MB (Figure 4 S1).

      (3) Additional discussion of genetic background for fly lines would be helpful.

      Thank you for your advice. We have mentioned the genetic background of flies in the key resources table of the methods section. Additionally, we also included further explanation on how the lines were created and their genetic background (see “Fly Husbandry” paragraph in the methods section).

      “UAS-flp;;Dop1R2<sup>cko</sup> flies and Gal4;Dop1R2<sup>cko</sup> flies were crossed back with ;;Dop<sup>cko</sup> flies to obtain appropriate genetic controls which were heterozygous for UAS and Gal4 but not Dop1R2<sup>cko</sup>.”

      Reviewer #3 (Recommendations For The Authors):

      Line 109 states that to resolve the problem a tool is developed to knock down Dop1R2 in a spatially and temporally specific manner. While I agree that this is within the potential of the tool, there is no temporal control of the flippase action in this study; at least I cannot find references to the use of target/gene switch to control stages of development or different memory phases. However, the version available for download is missing supplementary information, so I did not have access to supplementary figures and tables.

      Thank you for the comment. As mentioned before, it would be great to be able to dissect the memory phases. We show in lines 232 – 250 and Figure 4 S3 that restricting the flip-out temporally to the post-eclosion life stage gave results consistent with the previous findings, ruling out potential developmental defects.

      In relation to my comment on the possible developmental effects of the loss of the gene, Figure 1F could showcase an underdeveloped γ lobe when looking at the lobe profiles. I understand this is not within the scope of the figure, but maybe a different z projection can be provided to confirm there are no obvious anatomical alterations due to the loss of the receptor.

      We understand the doubt about the correct development of the MB and we thank you for your insightful comment. To that end, we decided to perform a FasII immunostaining that could show us the MB in the different lines (Figure 4 S2), and it appears that there are no notable differences in lobe development in our knockout line.

      It seems that the obvious missing piece of the puzzle would be to address the effects of knocking out Dop1R2 in aversive LTM. The idea of systematically addressing different types of memory at different time points and in different KCs is the most attractive aspect of this study beyond the technical sophistication, and it feels that the aim of the study is not delivered without that component.

      We agree and we thank you for the clarification. As mentioned above in response to Reviewer #2, we decided to test aversive LTM as described in lines 208-228, Figure 4, Figure 4 S1.

      Some statements of the discussion seem too vague, and I think could benefit from editing:

      Line 284 "however other receptors could use Gq and mediate forgetting"- does this refer to other dopamine receptors? Other neuromodulators? Examples?

      Thank you for pointing this out. We agree and therefore decided to omit this line.

      Line 289 "using a space training protocol and a Dop1R2 line" - this refers to RNAi lines, but it should be stated clearly.

      That is correct. We thank you for bringing attention to this and have clarified it in the manuscript.

      Lines 329-330

      “Interestingly, using a spaced training protocol and a Dop1R2 RNAi knockout line another study showed impaired LTM (Placais et al., 2017).”

      The paragraph starting in line 305 could be re-written to improve clarity and flow. Some statements seem disconnected and require specific citations. For example "In aversive memory formation, loss of Dop1R2 could lead to enhanced or impaired memory, depending on the activated signaling pathways and the internal state of the animal...". This is not accurate. Berry et al 2012 report enhanced LTM performance in dop1R2 mutants whereas Plaçais et al 2017 report LTM defects in Dop1R2 knock-downs, but these different findings do not seem to rely on different internal states or signaling pathways. Maybe further elaboration can help the reader understand this speculation.

      We agree and we thank you for this advice. We decided to add additional details and citations to support our speculation.

      Lines 350-353

      “In aversive memory formation, loss of Dop1R2 could lead to enhanced or impaired memory, depending on the activated signaling pathways. The signaling pathway that is activated further depends on the available pool of secondary messengers in the cell (Hermans, 2003) which may be regulated by the internal state of the animal.”

      "...for reward memory formation, loss of Dop1R2 seems to impair memory", this seems redundant at this point, as it has been discussed in detail, however, citations should be provided in any case (Musso 2015, Sun 2020)

      Thank you for noting this. We recognize the redundancy and decided to exclude the line.

      Finally, it would be useful to additionally refer to the anatomical terminology when introducing neuron names; for example MBON MVP2 (MBON-γ1pedc>α/β), etc.

      Thank you for this suggestion. We understand the importance of anatomical terminologies for the neurons. Therefore, we included them when we introduce neurons in the paper.

      We thank you for your observations. We recognize their value, so we have made appropriate changes in the discussion to sound less vague and more comprehensive.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Using highly specific antibody reagents for biological research is of prime importance. In the past few years, novel approaches have been proposed to gain easier access to such reagents. This manuscript describes an important step forward toward the rapid and widespread isolation of antibody reagents. Via the refinement and improvement of previous approaches, the Perrimon lab describes a novel phage-displayed synthetic library for nanobody isolation. They used the library to isolate nanobodies targeting Drosophila secreted proteins. They used these nanobodies in immunostainings and immunoblottings, as well as in tissue immunostainings and live cell assays (by tethering the antigens on the cell surface).

      Since the library is made freely available, it will contribute to gaining access to better research reagents for non-profit use, an important step towards the democratisation of science.

      Strengths:

      (1) New design for a phage-displayed library of high content.

      (2) Isolation of valuable novel tools.

      (3) Detailed description of the methods such that they can be used by many other labs.

      We are grateful for these supportive comments.

      Weaknesses:

      My comments largely concentrate on the representation of the data in the different Figures.

      We have made adjustments according to the reviewer’s recommendations.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors propose an alternative platform for nanobody discovery using a phage-displayed synthetic library. The authors relied on DNA templates originally created by McMahon et al. (2018) to build the yeast-displayed synthetic library. To validate their platform, the authors screened for nanobodies against 8 Drosophila secreted proteins. Nanobody screening has been performed with phage-displayed nanobody libraries followed by an enzyme-linked immunosorbent assay (ELISA) to validate positive hits. Nanobodies with higher affinity have been tested for immunostaining and immunoblotting applications using Drosophila adult guts and hemolymph, respectively.

      Strengths:

      The authors presented a detailed protocol with various and complementary approaches to select nanobodies and test their application for immunostaining and immunoblotting experiments. Data are convincing and the manuscript is well-written, clear, and easy to read.

      We thank the reviewer for these supportive comments.

      Weaknesses:

      On the eight Drosophila secreted proteins selected to screen for nanobodies, the authors failed to identify nanobodies for three of them. While the authors mentioned potential improvements of the protocol in the discussion, none of them have been tested in this manuscript.

      We prepared all eight antigens by single-step IgG purification (see Materials and Methods) without additional biophysical quality control (e.g., size-exclusion chromatography). Consequently, we cannot definitively determine whether the three “no-binder” cases resulted from the aggregation or misfolding of the antigens, versus gaps in our naive library’s sequence space. While approaches such as additional purification steps or affinity maturation of weak binders would likely rescue these difficult targets, comprehensive pipeline optimization is beyond the scope of establishing and validating the phage-displayed nanobody platform. We have clarified this limitation and suggested these strategies in the third paragraph of the Discussion.

      The same comment applies to the experiments using membrane-tethered forms of the antigens to test the affinity of nanobodies identified by ELISA. Many nanobodies fail to recognize the antigens. While authors suggested a low affinity of these nanobodies for their antigens, this hypothesis has not been tested in the manuscript.

      We observed that several nanobodies with strong ELISA signals showed reduced binding to membrane-displayed antigens. This discrepancy may result from low affinity of the nanobodies or differences in post-translational modifications (e.g., glycosylation) and antigen context between secreted IgG-fusion proteins (used for panning/ELISA) and GPI- or mCD8-anchored proteins. In ongoing work, we have performed affinity maturation of the nanobodies and successfully increased their affinity toward the target antigen. These results will be reported separately.

      Improving the protocol at each step for nanobody selection would greatly increase the success rate for the discovery of nanobodies with high affinity.

      We fully agree that systematic optimization—from antigen preparation (e.g., additional purification steps) through screening conditions (e.g., buffer composition, additional affinity-maturation steps)—could substantially increase the success rate and nanobody affinity. These represent important directions for future work.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 3. The merge of two GFP channels does not make much sense. Can the authors not use artificial colours? And show the panels at higher resolution, such that a viewer can really see and judge what they are seeing? The same comments apply to all Supplementary Figures.

      We appreciate the reviewer’s comment. In the revised Figure 3, we have replaced the cyan/green overlay with red/green overlay and used enlarged pictures so that GFP-positive cells and corresponding nanobody staining are clearly visible. We applied the same layout to all relevant Supplementary Figures.

      (2) Figure 4. Also, in this Figure, it is not really possible to see what the authors say one should see. The resolution should be higher, and arrows or arrowheads should point to important structures.

      We appreciate the reviewer’s comment. In the revised Figure 4A, we have added arrows to point to the immunostaining signal in cells with smaller nuclei and added inset panels to show a closer view of representative NbMip-4G staining.

      Reviewer #2 (Recommendations for the authors):

      (1) Images are sometimes quite small and difficult to interpret. For example, Figures S2C-D.

      We thank the reviewer for this suggestion. In the revised figures, we have replaced the cyan/green overlay with red/green overlay and used enlarged pictures that clearly show GFP-positive cells alongside their corresponding nanobody staining.

      (2) Supplemental figures are not always cited in the text.

      Thank you for the comment. To eliminate this misunderstanding, we have updated the Nesfatin1 nanobody screen data as Supplementary Figure 1 and Mip nanobody screen data as Supplementary Figure 2. We have made the corresponding changes in the Results section.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript "Rho-ROCK liberates sequestered claudin for rapid de novo tight junction formation" by Cho and colleagues investigates de novo tight junction formation during the differentiation of immortalized human HaCaT keratinocytes to granular-like cells, as well as during epithelial remodeling that occurs upon the apoptosis of individual cells in confluent monolayers of the representative epithelial cell line EpH4. The authors demonstrate the involvement of Rho-ROCK with well-conducted experiments and convincing images. Moreover, they unravel the underlying molecular mechanism, with Rho-ROCK activity activating the transmembrane serine protease Matriptase, which in turn leads to the cleavage of EpCAM and TROP2, respectively, releasing Claudins from EpCAM/TROP2/Claudin complexes at the cell membrane to become available for polymerization and de novo tight junction formation. These functional studies in the two different cell culture systems are complemented by localization studies of the corresponding proteins in the stratified mouse epidermis in vivo.

      In total, these are new and very intriguing and interesting findings that add important new insights into the molecular mechanisms of tight junction formation, identifying Matriptase as the "missing link" in the cascade of formerly described regulators. The involvement of TROP2/EpCAM/Claudin has been reported recently (Szabo et al., Biol. Open 2022; Bugge lab), and Matriptase had been formerly described to be required for tight junction formation as well, again from the Bugge lab. Yet, the functional correlation/epistasis between them, and their relation to Rho signaling, had not been known thus far.

      However, experiments addressing the role of Matriptase require a little more work.

      Strengths:

      Convincing functional studies in two different cell culture systems, complemented by supporting protein localization studies in vivo. The manuscript is clearly written and most data are convincingly demonstrated, with beautiful images and movies.

      Weaknesses:

      The central finding that Rho signaling leads to increased Matriptase activity needs to be more rigorously demonstrated (e.g. western blot specifically detecting the activated version or distinguishing between the full-length/inactive and processed/active version).

      We plan to provide more direct evidence that matriptase activation is regulated by the Rho-ROCK pathway, utilizing antibodies that specifically recognize the activated form of matriptase.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigate how epithelia maintain intercellular barrier function despite and during cellular rearrangements upon e.g. apoptotic extrusion in simple epithelia or regenerative turnover in stratified epithelia like the epidermis, a fundamental question in epithelial biology. Previous literature has shown that Rho-mediated local regulation of actomyosin is essential not only for cellular rearrangement itself but also for directly controlling tight junction barrier function. The molecular mechanisms, however, remained unclear. Here the authors use extensive fluorescent imaging of fixed and live cells together with genetic and drug-mediated interference to show that Rho activation is required and sufficient to form de novo tight junctional strands at intercellular contacts in epidermal keratinocytes (HaCaT) and mammary epithelial cells. After having confirmed previous literature, they then show that Rho activation activates the transmembrane protease Matriptase, which cleaves EpCAM and TROP2, two claudin-binding transmembrane proteins, to release claudins and enable claudin strand formation and therefore tight junction barrier function.

      Strengths:

      The presented mechanism is shown to be relevant for epithelial barriers being conserved in simple and stratifying epithelial cells and mainly differs due to tissue-specific expression of EpCAM and TROP2. The authors present careful state-of-the-art imaging and logical experiments that convincingly support the statements and conclusion. The manuscript is well-written and easy to follow.

      Weaknesses:

      Whereas the in vitro evidence of the presented mechanism is strongly supported by the data, the in vivo confirmation is mostly based on the predicted distribution of TROP2. Although the causality of Rho-mediated Matriptase activation has been nicely demonstrated, it remains unclear how Rho activates Matriptase.

      As noted, while we have demonstrated that Rho activation is both necessary and sufficient to induce matriptase activation, the precise mechanism by which Rho mediates this activation remains unclear. As discussed in the manuscript, several potential molecular mechanisms could underlie the contribution of Rho to matriptase activation. As part of our future work, we intend to systematically investigate each of these mechanisms.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The resubmitted version of the manuscript adequately addressed several initial comments made by reviewing editors, including a more detailed analysis of the results (such as those of bilayer thickness). This version was seen by 2 reviewers. Both reviewers recognize this work as being an important contribution to the field of BK and voltage-dependent ion channels in general. The long trajectories and the rigorous/novel analyses have revealed important insights into the mechanisms of voltage-sensing and electromechanical coupling in the context of a truncated variant of the BK channel. Many of these observations are consistent with structural and functional measurements of the channel, available thus far. The authors also identify a novel partially expanded state of the channel pore that is accessed after gating-charge displacement, which informs the sequence of structural events accompanying voltage-dependent opening of BK.

      However, there are key concerns regarding the use of the truncated channel in the simulations. While many gating features of BK are preserved in the truncated variant, studies have suggested that the coupling of channel pore opening to voltage-sensing domain rearrangement is impaired upon gating-ring deletion. So the inferences made here might only represent a partial view of the mechanism of electromechanical coupling.

      It is also not entirely clear whether the partially expanded pore represents a functionally open, sub-conductance, or another closed state. Although the authors provide evidence that the inner pore is hydrated in this partially open state, in the absence of additional structural/functional restraints, a confident assignment of a functional state to this structural state is difficult. Functional measurements of the truncated channel seem to suggest that not only is their single channel conductance lower than full-length channels, but they also appear to have a voltage-independent step that causes the gates to open. It is unclear whether it is this voltage-independent step that remains to be captured in these MD trajectories. A clean-cut resolution of this conundrum might not be feasible at this time, but it could help to present the various possibilities to the readers.

      We appreciate the positive comments and agree that there will likely be important differences between the mechanistic details of voltage activation between the Core-MT and full-length constructs of BK channels. We also agree that the dilated pore observed in the simulation may not be the fully open state of Core-MT.

      Nonetheless, the notion that the simulation may not have captured the full pore opening transition or the contribution of the CTD should not render the current work “incomplete”, because a complete understanding of BK activation would be an unrealistic goal beyond the scope of this work. We respectfully emphasize that the main insights of the current simulations are the mechanisms of voltage sensing (e.g., the nature of VSD movements, contributions of various charged residues, how small charge movements allow voltage sensing, etc.) as well as the role of the S4-S5-S6 interface in VSD-pore coupling. As noted by the Editor and reviewers, these insights represent important steps towards establishing a more complete understanding of BK activation.

      Below are the specific comments of the two experts who have assessed the work and made specific suggestions to improve the manuscript.

      Reviewer #1 (Recommendations for the authors):

      (1) Although the successful simulation of V-dependent K+ conduction through the BK channel pore and analysis of associated state dependent VSD/pore interactions and coupling analysis is significant, there are two related questions that are relevant to the conclusions and of interest to the BK channel community which I think should be addressed or discussed.

      One key feature of BK channels is their extraordinarily large conductance compared to other K+ selective channels. Do the simulations of K+ conductance provide any insight into this difference? Is the predicted conductance of BK larger than that of other K+ channels studied by similar methods? Is there any difference in the conductance mechanism (e.g., the hard and soft knock-on effects mentioned for BK)?

      The molecular basis of the large conductance of BK channels is indeed an interesting and fundamental question. Unfortunately, this is beyond the scope of this work and the current simulation does not appear to provide any insight into the basis of large conductance. It is interesting to note, though, that the conductance is apparently related to the level of pore dilation and the pore hydration level, as increasing the hydration level from ~30 to ~40 waters in the pore increases the simulated conductance from ~1.5 to 6 pS (page 8). This is consistent with previous atomistic simulations (Gu and de Groot, Nature Communications 2023; ref. 33) showing that the pore hydration level is strongly correlated with observed conductance. As noted in the manuscript, the conductance mechanism through the filter appears highly similar to previous simulations of other K+ channels (Page 8). Given the limited conductance events observed in the current simulations, we will refrain from discussing the possible basis of the large conductance in BK channels except commenting on the role of pore hydration (page 8; also see below in response to #5).
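
      To give a sense of scale for the conductance values quoted above, the following Python sketch converts a count of net ion permeation events into an apparent conductance. It is a back-of-the-envelope illustration under stated assumptions, not the analysis used in the manuscript: each event is taken to carry one elementary charge (K+), the conductance is treated as ohmic (g = I/V, with no reversal-potential correction), and the function name and the example count are hypothetical.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs per monovalent ion


def apparent_conductance_ps(n_events: int, duration_us: float, voltage_mv: float) -> float:
    """Apparent conductance in pS from net permeation events.

    Assumes each event moves one elementary charge across the membrane and
    that g = I / V holds at the applied voltage (an illustrative simplification).
    """
    if duration_us <= 0 or voltage_mv == 0:
        raise ValueError("duration must be positive and voltage non-zero")
    current_a = n_events * ELEMENTARY_CHARGE_C / (duration_us * 1e-6)  # amperes
    conductance_s = current_a / (voltage_mv * 1e-3)                    # siemens
    return conductance_s * 1e12                                        # picosiemens
```

      Under these assumptions, roughly 28 net permeation events per microsecond at 750 mV correspond to about 6 pS, which illustrates why microsecond-scale trajectories yield only a limited number of events from which to estimate conductance.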

      The pore in the MD simulations does not open as wide as the Ca-bound open structure, which (as the authors note) may mean that full opening requires longer than 10 us. I think that is highly likely given that the two 750 mV simulations yielded different degrees of opening and that in BK channels opening is generally much slower than charge movement. Therefore, a question is - do any of the conclusions illustrated in Figures 6, S5, S6 differ if the Ca-bound structure is used as the open state? For example, I expect the interactions between S5 and S6 might at least change to some extent as S6 moves to its final position. In this case, would conclusions about which residues interact, and get stronger or weaker, be the same as in Figures S6 b,c? Providing a comparison may help indicate to what extent the conclusions are dependent on achieving a fully open conformation.

      We appreciate the reviewer’s suggestion and have further analyzed the information flow and coupling pathways using the simulation trajectory initiated from the Ca2+-bound cryo-EM structure (sim 7, Table S1). The new results are shown in two new SI Figures S7 and S8, and new discussion has been added to pages 14-15. Comparing Figures 5 and S7, we find that the dynamic communities, coupling pathways, and information flow are highly similar between simulations of the open and closed states, even though there are significant differences in S5 contacts in the simulated open state vs the Ca2+-bound open state (Figure S8). Interestingly, there are significant differences in S4-S5 packing in the simulated and Ca2+-bound open states (Figure S8 top panel), which likely reflect important differences in VSD/pore interactions during voltage vs Ca2+ activation.

      (2) P4 Significance -"first, successful direct simulation of voltage-activation"

      This statement may need rewording. As noted above Carrasquel-Ursulaez et al.,2022 (reference 39) simulated voltage sensor activation under comparable conditions to the current manuscript (3.9 us simulation at +400 mV), and made some similar conclusions regarding R210, R213 movement, and electric field focusing within the VSD. However, they did not report what happens to the pore or simulate K+ movement. So do the authors here mean something like "first, successful direct simulation of voltage-dependent channel opening"?

      We agree with the reviewer and have revised the statement to “ … the first successful direct simulation of voltage-dependent activation of the big potassium (BK) channel, ..”

      (3) P5 "We compare the membrane thickness at 300 and 750 mV and the results reveal no significant difference in the membrane thickness (Figure S2)" The figure also shows membrane thickness at 0 mV and indicates it is 1.4 Angstroms less than that at 300 or 750 mV. Whether or not this difference is significant should be stated, as the question being addressed is whether the structure is perturbed owing to the use of non-physiological voltages (which would include both 300 and 750 mV).

      We have revised the Figure S2 caption to clarify that one-way ANOVA suggests the difference is not significant.

      (4) P7 "It should be noted that the full-length BK channel in the Ca2+ bound state has an even larger intracellular opening (Figure 2f, green trace), suggesting that additional dilation of the pore may occur at longer timescales."

      As noted above, I agree it is likely that additional pore dilation may occur at longer timescales. However, for completeness, I suppose an alternative hypothesis should be noted, e.g. "...suggesting that additional dilation of the pore may occur at longer timescales, or in response to Ca-binding to the full length channel."

      This is a great suggestion. Revised as suggested.

      (5) Since the authors raise the possibility that they are simulating a subconductance state, some more discussion on this point would be helpful, especially in relation to the hydrophobic gate concept. Although the Magleby group concluded that the cytoplasmic mouth of the (fully open) pore has little impact on single channel conductance, that doesn't rule out that it becomes limiting in a partially open conformation. The simulation in Figure 3A shows an initial hydration of the pore with ~15 waters with little conductance events, suggesting that hydration per se may not suffice to define a fully open state. Indeed, the authors indicate that the simulated open state (w/ ~30-40 waters) has 1/4th the simulated conductance of the open structure (w/ ~60 waters). So is it the degree of hydration that limits conductance? Or is there a threshold of hydration that permits conductance and then other factors that limit conductance until the pore widens further? Addressing these issues might also be relevant to understanding the extraordinarily large conductance of fully open BK compared to other K channels.

      We agree with the reviewer’s proposal that pore hydration seems to be a major factor that can affect conductance. This is also well in line with the previous computational study by Gu and de Groot (2023). We have now added a brief discussion on page 8, stating “Besides the limitation of the current fixed charge force fields in quantitatively predicting channel conductance, we note that the molecular basis for the large conductance of BK channels is actually poorly understood (78). It is noteworthy that the pore hydration level appears to be an important factor in determining the apparent conductance in the simulation, which has also been proposed in a previous atomistic simulation study of the Aplysia BK channel (33).”

      Minor points

      (1) P5 "the fully relaxed pore profile (red trace in Figure S1d, top row) shows substantial differences compared to that of the Ca2+-free Cryo-EM structure of the full-length channel." For clarity, I suggest indicating which is the Ca-free profile - "... Ca2+-free Cryo-EM structure of the full-length channel (black trace)."

      We greatly appreciate the thoughtful suggestion. Revised as suggested.

      (2) P8 "Consistent with previous simulations (78-80), the conductance follows a multi-ion mechanism, where there are at least two K+ ions inside the filter" For clarity, I suggest indicating these are not previous simulations of BK channels (e.g., "previous simulations of other K+ channels ...").

      Revised as suggested. Thank you.

      (3) Figure 2, S1 - grey traces representing individual subunits are very difficult to see (especially if printed). I wonder if they should be made slightly darker. Similar traces in Figure 3 are easier to see.

      The traces in Figure S1 are actually the same thickness as in Figure 3; they appear lighter due to the size of the figure. Figure 2 panels a-c have been updated to improve the resolution.

      (4) Figure 2 - suggest labeling S6 as "S6 313-324" (similar to S4 notation) to indicate it is not the entire segment.

      Figure 2 panel d) has been updated as suggested.

      (5) Figure 2 legend - "Voltage activation of Core-MT BK channels. a-d)..."

      It would be easier to find details corresponding to individual panels if they were referenced individually. For example:

      "a-d) results from a 10-μs simulation under 750 mV (sim2b in Table S1). Each data point represents the average of four subunits for a given snapshot (thin grey lines), and the colored thick lines plot the running average. a) z-displacement of key side chain charged groups from initial positions. The locations of charged groups were taken as those of guanidinium CZ atoms (for Arg) and sidechain carboxyl carbons (for Asp/Glu) b) z-displacement of centers-of-mass of VSD helices from initial positions, c) backbone RMSD of the pore-lining S6 (F307-L325) to the open state, and d) tilt angles of all TM helices. Only residues 313-324 of S6 were included inthe tilt angle calculation, and the values in the open and closed Cryo-EM structures are marked using purple dashed lines. "

      We appreciate the thoughtful suggestion and have revised the caption as suggested.

      (6) Figure S1 - column labels a,b,c, and d should be referenced in the legend.

      The references to column labels have been added to Figure S1 caption.

      (7) References need to be double-checked for duplicates and formatting.

      a) I noticed several duplicate references, but did not do a complete search: Budelli et al 2013 (#68, 100), Horrigan Aldrich 2002 (#22,97), Sun Horrigan 2022 (#40, 86), Jensen et al 2012 (#56,81).

      b) Reference #38 is incorrectly cited with the first name spelled out and the last name abbreviated.

      We appreciate the careful proofreading of the reviewer. The duplicated references were introduced by mistake due to the use of multiple reference libraries. We have gone through the manuscript and removed a total of 5 duplicated references.

      Reviewer #2 (Recommendations for the authors):

      This manuscript has been through a previous level of review. The authors have provided their responses to the previous reviewers, which appear to be satisfactory, and I have no additional comments, beyond the caveats concerning interpretations based on the truncated channel, which are noted above.

      We greatly appreciate the constructive comments and insightful advice. Please see our response above to the Reviewing Editor’s comments for the changes regarding the caveats concerning interpretations of the current simulations.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Deng et al reports single cell expression analysis of developing mouse hearts and examines the requirements for cardiac fibroblasts in heart maturation. The work includes extensive gene expression profiling and bioinformatic analysis. The prenatal fibroblast ablation studies show new information on the requirement of these cells on heart maturation before birth.

      The strengths of the manuscript are the new single cell datasets and comprehensive approach to ablating cardiac fibroblasts in pre and postnatal development in mice. Extensive data are presented on mouse embryo fibroblast diversity and morphology in response to fibroblast ablation. Histological data support localization of major cardiac cell types and effects of fibroblast ablation on cardiac gene expression at different times of development.

      A weakness of the study is that the major conclusions regarding collagen signaling and heart maturation are based on gene expression patterns and are not functionally validated.

      Reviewer #2 (Public review):

      This study aims to elucidate the role of fibroblasts in regulating myocardium and vascular development through signaling to cardiomyocytes and endothelial cells. This focus is significant, given that fibroblasts, cardiomyocytes, and vascular endothelial cells are the three primary cell types in the heart. The authors employed a Pdgfra-CreER-controlled diphtheria toxin A (DTA) system to ablate fibroblasts at various embryonic and postnatal stages, characterizing the resulting cardiac defects, particularly in myocardium and vasculature development. Single-cell RNA sequencing (scRNA-seq) analysis of the ablated hearts identified collagen as a crucial signaling molecule from fibroblasts that influences the development of cardiomyocytes and vascular endothelial cells.

      This is an interesting manuscript; however, there are several major issues, including an over-reliance on the scRNA-seq data, which shows inconsistencies between replicates.

      We thank the reviewer for carefully reading our revised manuscript. All of the questions listed below were raised in the previous round and have been addressed in the current revision. As noted in the “Recommendations for the Authors” section, the reviewer has no additional comments at this time.

      Some of the major issues are described below.

      (1) The CD31 immunostaining data (Figure 3B-G) indicate a reduction in endothelial cell numbers following fibroblast deletion using PdgfraCreER+/-; RosaDTA+/- mice. However, the scRNA-seq data show no percentage change in the endothelial cell population (Figure 4D). Furthermore, while the percentage of Vas_ECs decreased in ablated samples at E16.5, the results at E18.5 were inconsistent, showing an increase in one replicate and a decrease in another, raising concerns about the reliability of the RNA-seq findings.

      (2) Similarly, while the percentage of Ven_CMs increased at E18.5, it exhibited differing trends at E16.5 (Fig. 4E), further highlighting the inconsistency of the scRNA-seq analysis with the other data.

      (3) Furthermore, the authors noted that the ablated samples had slightly higher percentages of cardiomyocytes in the G1 phase compared to controls (Fig. 4H, S11D), which aligns with the enrichment of pathways related to heart development, sarcomere organization, heart tube morphogenesis, and cell proliferation. However, it is unclear how this correlates with heart development, given that the hearts of ablated mice are significantly smaller than those of controls (Figure 3E). Additionally, the heart sections from ablated samples used for CD31/DAPI staining in Figure 3F appear much larger than those of the controls, raising further inconsistencies in the manuscript.

      (4) The manuscript relies heavily on the scRNA-seq dataset, which shows inconsistencies between the two replicates. Furthermore, the morphological and histological analyses do not align with the scRNA-seq findings.

      (5) There is a lack of mechanistic insight into how collagen, as a key signaling molecule from fibroblasts, affects the development of cardiomyocytes and vascular endothelial cells.

      (6) In Figure 1B, Col1a1 expression is observed in the epicardial cells (Figure 1A, E11.5), but this is not represented in the accompanying cartoon.

      (7) Do the PdgfraCreER+/-; RosaDTA+/- mice survive after birth when induced at E15.5, and do they exhibit any cardiac defects?

      Reviewer #3 (Public review):

      Summary:

      The authors investigated fibroblasts' communication with key cell types in developing and neonatal hearts, with a focus on the critical roles of the fibroblast-cardiomyocyte and fibroblast-endothelial cell networks in cardiac morphogenesis. They tried to map the spatial distribution of these cell types and reported the major pathways and signaling molecules driving the communication. They also used a Cre-DTA system to ablate Pdgfra-labeled cells and observed myocardial and endothelial cell defects during development. They screened the pathways and genes using sequencing data from ablated hearts. Lastly, they reported compensatory collagen expression in long-term ablated neonatal hearts. Overall, this study provides important insight into fibroblasts' roles in cardiac development and will be a powerful resource for collagen- and ECM-focused research.

      Strengths:

      The authors used good analytical tools to investigate multiple single-cell sequencing and MULTI-seq datasets. They identified significant pathways and cellular and molecular interactions of fibroblasts. Additionally, they compared some of their analytic findings with human datasets and identified several groups of ECM genes with varying roles in mice.

      Weaknesses:

      This study is based mainly on sequencing data analysis. At the bench, they used a very strident technique to study fibroblast functions, ablating one of the major cell populations of the heart. Also, experimental validation of the downstream pathways identified in their analysis will be required eventually.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Most of my comments have been adequately addressed. Additional comments on new data in the revised manuscript are below.

      (1) In the new Figure S11, it is not really possible to draw major conclusions on mitral valve morphology and maturation since the planes of section do not seem comparable. Observations regarding attachment to the papillary muscle might be dependent on the particular section being evaluated. However, it is useful to see that the valves are not severely affected in the ablated animals.

      We appreciate the reviewer’s comment and agree with the reviewer’s observation. Accordingly, we have updated the manuscript by removing the original conclusion-related statement and instead highlighting that the valves were not severely affected in the ablated animals (page 6).

      (2) In the last supplemental figure S19, it is not possible to determine if results are or are not statistically significant for n=2 as shown for FS and EF for the ablated animals and controls. The text says that there is a trend of improved heart function, but evaluation of additional animals is needed to support this conclusion.

      We thank the reviewer for the comment and agree that a sample size of n = 2 is too small to draw meaningful conclusions. As previously suggested by the reviewer, we have removed this result from the manuscript (page 10).

      Reviewer #2 (Recommendations for the authors):

      The manuscript has greatly improved following the revision, and I have no additional comments to offer.

      Thanks!

      Reviewer #3 (Recommendations for the authors):

      Authors did a good job addressing questions asked at first review. However, I have some minor concerns.

      (1) The paper notes that collagen signaling is observed in FB-VasEC in humans, but not in FB-VenCM, unlike mice. Did the authors analyze predicted ligand-receptor interactions as they did with control and ablated mouse hearts? This could add valuable new insights into how FBs regulate ventricular CMs in the human heart.

      Thank you. We have analyzed the predicted ligand-receptor interactions between Fb and Ven_CM, as well as between Fb and Vas_EC, using human scRNA-seq data. The results are provided as a supplemental figure (Fig. S8C).

      (2) The authors provided data on defects in CD31 expression in several models. Did they observe any other phenotypes associated with a defective endothelial or vascular system, such as blood accumulation in the pericardium or larger/smaller capillaries? Did they also examine the percentage of Cdh5+ cells?

      We thank the reviewer for the questions. We did not observe clear evidence of blood accumulation in the pericardium of the ablated hearts, as shown in Figures 3B, 3E, 6B, and 6F. Additionally, we did not perform Cdh5 staining in either the control or ablated hearts.

      (3) Please mention the sample age of Figure 2A-C.

      These are single-cell mRNA sequencing data from CD1 mice across 18 developmental stages, ranging from E9.5 to P9. We have added this information to the manuscript (page 4).

      (4) Please follow the same style to describe X axis in graphs in Figure 3D (and all similar graphs in manuscript) as followed in 3G.

      Thank you. We assume the reviewer was referring to the descriptions in the relevant figure legends. We have updated the legend for Figure 3D to ensure consistency with the description provided for Figure 3G (page 15).

      (5) It is important to provide echocardiographic M mode images with a comparable number of cardiac cycles in control and ablated (Fig. 6H).

      We thank the reviewer for the comment. As explained in our previous response, the echocardiographic data for both control and mutant mice were collected in conscious animals. The differences in their cardiac cycles reflect variations in heart rate, which represent a disease phenotype and cannot be altered. Therefore, we are unable to provide M-mode images with a similar number of cardiac cycles for control and ablated mice.

      (6) In the long-term neonatal ablation experiments, collagen expression returns to normal. The manuscript attributes this to possible "compensatory expression." Do the authors have any thoughts on how this is regulated? Are other cell types stepping in, or are surviving FBs proliferating?

      We thank the reviewer for the question. As suggested, the compensatory collagen expression could be driven by surviving fibroblasts or other cell types. Since we currently lack evidence to exclude either possibility, we believe both could be contributing factors.

      (7) While collagen is shown to be a dominant signaling molecule, its centrality is inferred primarily from scRNAseq and ligand-receptor predictions. Did authors try any functional rescue experiment (e.g., exogenous collagen supplementation or receptor blockade) to directly validate this pathway's role in vivo?

      We thank the reviewer for the comment. As noted in our previous revision in response to similar questions from the other two reviewers, we agree that these rescue experiments are of interest but are beyond the scope of the current study. We plan to pursue these investigations in future work and share our findings when available.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript "Rho-ROCK liberates sequestered claudin for rapid de novo tight junction formation" by Cho and colleagues investigates de novo tight junction formation during the differentiation of immortalized human HaCaT keratinocytes to granular-like cells, as well as during the epithelial remodeling that occurs upon the apoptosis of individual cells in confluent monolayers of the representative epithelial cell line EpH4. The authors demonstrate the involvement of Rho-ROCK with well-conducted experiments and convincing images. Moreover, they unravel the underlying molecular mechanism, with Rho-ROCK activity activating the transmembrane serine protease Matriptase, which in turn leads to the cleavage of EpCAM and TROP2, respectively, releasing Claudins from EpCAM/TROP2/Claudin complexes at the cell membrane to become available for polymerization and de novo tight junction formation. These functional studies in the two different cell culture systems are complemented by localization studies of the corresponding proteins in the stratified mouse epidermis in vivo.

      In total, these are new and very intriguing findings that add important new insights into the molecular mechanisms of tight junction formation, identifying Matriptase as the "missing link" in the cascade of formerly described regulators. The involvement of TROP2/EpCAM/Claudin has been reported recently (Szabo et al., Biol. Open 2022; Bugge lab), and Matriptase had formerly been described as required for tight junction formation as well, again by the Bugge lab. Yet the functional correlation/epistasis between them, and their relation to Rho signaling, had not been known thus far.

      However, experiments addressing the role of Matriptase require a little more work.

      Strengths:

      Convincing functional studies in two different cell culture systems, complemented by supporting protein localization studies in vivo. The manuscript is clearly written and most data are convincingly demonstrated, with beautiful images and movies.

      Weaknesses:

      The central finding that Rho signaling leads to increased Matriptase activity needs to be more rigorously demonstrated (e.g. western blot specifically detecting the activated version or distinguishing between the full-length/inactive and processed/active version).

      First, we thank the reviewer for their fair evaluation of our manuscript and for providing constructive feedback. Regarding the detection of matriptase activation—which Reviewer 1 identified as a weakness—we fully agree that direct validation is crucial. Therefore, in this revision we have carried out additional experiments using the M69 antibody, which specifically recognizes the activated form of matriptase. Details of these new experiments are provided in our point-by-point responses below.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigate how epithelia maintain intercellular barrier function despite and during cellular rearrangements upon, e.g., apoptotic extrusion in simple epithelia or regenerative turnover in stratified epithelia like the epidermis, a fundamental question in epithelial biology. Previous literature has shown that Rho-mediated local regulation of actomyosin is essential not only for cellular rearrangement itself but also for directly controlling tight junction barrier function. The molecular mechanism, however, remained unclear. Here the authors use extensive fluorescent imaging of fixed and live cells together with genetic and drug-mediated interference to show that Rho activation is required and sufficient to form de novo tight junctional strands at intercellular contacts in epidermal keratinocytes (HaCaT) and mammary epithelial cells. After having confirmed previous literature, they then show that Rho activation activates the transmembrane protease Matriptase, which cleaves EpCAM and TROP2, two claudin-binding transmembrane proteins, to release claudins and enable claudin strand formation and therefore tight junction barrier function.

      Strengths:

      The presented mechanism is shown to be relevant for epithelial barriers being conserved in simple and stratifying epithelial cells and mainly differs due to tissue-specific expression of EpCAM and TROP2. The authors present careful state-of-the-art imaging and logical experiments that convincingly support the statements and conclusion. The manuscript is well-written and easy to follow.

      Weaknesses:

      Whereas the in vitro evidence of the presented mechanism is strongly supported by the data, the in vivo confirmation is mostly based on the predicted distribution of TROP2. Whereas the causality of Rho-mediated Matriptase activation has been nicely demonstrated it remains unclear how Rho activates Matriptase.

      Thank you for your valuable feedback on our manuscript. As Reviewer 2 points out, the precise mechanism by which the Rho/ROCK pathway activates matriptase remains unclear. We have discussed the possible molecular mechanisms in the Discussion section. Elucidating the detailed mechanism of matriptase activation will be the focus of our future work.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Comment 1-1 - Matriptase activation by Rho: The authors show activation of Matriptase in western blots by the simple reduction of (full-length?) protein level in Figures 5 and 7. Most publications however show activated Matriptase either by antibodies detecting specifically the active form (including the publication referenced in this manuscript), or the appearance of the activated form next to the inactive form (based on different molecular weights). Therefore, it is not completely clear whether the treatment with Rho activators (Figure 5) results in an overall decrease of Matriptase, or really in an increase in the activated form. Therefore, the authors should show the actual increase of the active form. As a control, the impact of camostat treatment and overexpression of Hai1 on the active form of Matriptase could be included. It also should be indicated in the figure legend how long cells had been treated with the drugs before being subjected to lysis. Moreover, the western blots need to be quantified.

      We performed a more rigorous analysis using the M69 antibody, which specifically recognizes the activated form of matriptase and has been widely used in previous studies (e.g., Benaud et al., 2001; Hung et al., 2004; Wang et al., 2009). We likewise confirmed a significant increase in M69 signals by both western blotting and immunostaining in samples in which matriptase was activated by acid medium treatment (Figure 5A). Crucially, we also observed matriptase activation with the M69 antibody both in Rho/ROCK activator-treated cells (Figure 5A) and in differentiated granular-layer-like cells (Figures 7A and 7D). These findings strongly support the conclusion that matriptase is activated downstream of the Rho/ROCK pathway.

      Comment 1-2 - Based on their results, the authors conclude that Matriptase cleaves TROP2 in the SG2 layer of the epidermis, which is a little contradictory to former studies, which have shown Matriptase to be most prominently expressed and active in the basal layer and only a little in the spinous layer (e.g., Chen et al., Matriptase regulates proliferation and early, but not terminal, differentiation of human keratinocytes. J Invest Dermatol. 2013). In this light, one could also argue that inhibiting Matriptase "simply" reduces epidermal differentiation. Can other differentiation markers be tested to rule out that the effects on tight junctions are secondary consequences of interference with earlier / more global steps of keratinocyte differentiation?

      As the reviewer noted, previous studies have demonstrated that matriptase is essential for keratinocyte differentiation, and that it cleaves substrates beyond EpCAM and TROP2—any of which could potentially influence the differentiation process. To test this possibility, we chose to monitor maturation of adherens junction (AJ) as an indicator of keratinocyte differentiation into granular-layer cells. Prior work has shown that during differentiation into granular-layer cells, AJs develop and experience increased intercellular mechanical tension, and that this rise in mechanical tension at AJs is critical for subsequent TJ formation (Rübsam et al., 2017). To assess AJ tension, we stained with the α-18 monoclonal antibody, which specifically recognizes the tension-dependent conformational change of α-catenin, a core AJ component. In control cells, differentiation into granular-layer like cells led to a marked increase in α-18 signal at cell–cell adhesion sites. Importantly, when HaCaT cells were treated with Camostat to inhibit matriptase and then induced to differentiate, we observed an equivalent increase in α-18 signal at AJs (Figure 7F). However, we did not detect claudin enrichment at cell-cell contacts under these conditions (Figures 7F and 7H). These results suggest that matriptase inhibition does not impair AJ maturation during granular-layer differentiation, but does profoundly disrupt TJ formation. While we cannot rule out the possibility that matriptase acts more broadly from these results, we judged that a comprehensive substrate survey lies outside the scope of the present manuscript.

      Comment 1-3 - In addition, as in Figure 5, full-length levels of Matriptase in Figure 7A need to be complemented by the active version to demonstrate more convincingly that TROP2 processing coincides with (and is most likely caused by) increased Matriptase activation. In the quantification in 7B, levels actually go up again after 2 and 4 hours. How is that explained, and what would this mean with respect to tight junction formation seen at 24 h of differentiation? The TROP2 cleavage shown in Figure 7A should be quantified.

      This comment is related to Comment 1-1. Using the M69 antibody, which specifically recognizes the activated matriptase, we directly demonstrated that matriptase activation occurs during the differentiation of granular layer-like cells (Figures 7A and 7D). Furthermore, we performed quantitative analysis of TROP2 cleavage and found that, compared with undifferentiated cells, differentiation into granular-layer like cells was accompanied by an increase in the cleaved TROP2 fragments (Figures 7A and 7B).

      Minor points:

      Comment 1-4 - Figure 1B and C: Including orthogonal views would be a nice add-on to appreciate the findings.

      In the revised version, we have added the corresponding orthogonal views to Figure 1B and Figure 1C.

      Comment 1-5 - Figure 2D: last row: indication of orthogonal view.

      We stated that the bottom panels are orthogonal views in the figure legend of Figure 2D.

      Comment 1-6 - Figure 3A: quantification is missing. GST-Rhotekin assay is missing in methods.

      In the revised manuscript, we have added quantitative analysis for Figure 3A. We have also supplemented the Materials and Methods section with detailed information on the GST–Rhotekin assay used to quantify levels of active RhoA.

      Comment 1-7 - Figure 4H: quantification of the Western blot is missing.

      In the revised manuscript, we have added quantitative analysis for Figure 4H as Figure 4I.

      Comment 1-8 - Figure 5 and 6: Quantifications of Western blots are missing.

      In the revised manuscript, we have added quantitative analyses for Figure 5D as Figure 5F and for Figure 6A as Figure 6B.

      Comment 1-9 - Figure 7C: quantification of the Western blot is missing.

      Figure 7C does not present western blotting data. For the other western blotting results, we have added quantitative analyses as suggested by Reviewer 1.

      Comment 1-10 - Figure 8I: Including Hai1 overexpression would be good for a complete picture.

      Following Reviewer 1’s suggestion, we have added staining data for Hai1-overexpressing cells to Figure 8J.

      Comment 1-11 - Line 377: The authors say they found Matriptase always present in lateral membranes. I did not find evidence for this in the manuscript.

      Previous studies have demonstrated that in polarized epithelial cells, matriptase is localized to the basolateral membrane below TJs (Buzza et al., 2010; Wang et al., 2009). We also found that matriptase consistently localizes to the basolateral membrane but more crucially that it becomes activated there during differentiation into granular layer cells. We added these new data as Figures 7C-7E in the revised manuscript. These findings suggest that matriptase activation occurs without a change in its subcellular localization.

      Comment 1-12 - Line 381: should most likely say: and ADAM17 but it is not known whether...

      We corrected the sentence in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      The authors have added a significant number of quantifications verifying their observations, which was a major comment in a previous version of the manuscript and thus I have only a few minor comments which should be addressed.

      Comment 2-1 - It is not required to have scale bars in every image of a panel if the same scale is used.

      Unnecessary scale bars were removed. Specifically, scale bars were removed from Figure 1B, 1C, 1F, 8F, 8G, and 8H.

      Comment 2-2 - Throughout all figures: Please state for non-quantified images whether this is a representative example and for how many technical or biological repeats this is representative. Also for "N" number, state what the N stands for and if this is what the dots in the graph represent. Are these the number of junctions or technical, experimental or biological repeats?

      In the revised manuscript, we have added the number of independent experiments and corresponding “N” values to the Quantification and Statistical Analysis subsection of the Materials and Methods.

      Comment 2-3 - Some Zooms have a scale bar (6d), and some do not (e.g. 5b).

      The scale bar was removed from the magnified image in Figure 6D.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study by Wu et al presents interesting data on bacterial cell organization, a field that is progressing now, mainly due to the advances in microscopy. Based mainly on fluorescence microscopy images, the authors aim to demonstrate that the two structures that account for bacterial motility, the chemotaxis complex and the flagella, colocalize to the same pole in Pseudomonas aeruginosa cells and to expose the regulation underlying their spatial organization and functioning.

      Strengths:

      The subject is of importance.

      Weaknesses:

      The conclusions are too strong for the presented data. The lack of statistical analysis makes this paper incomplete. The novelty of the findings is not clear.

      We have strengthened the data analysis by including appropriate statistical tests to support our conclusions more convincingly. Additionally, we have refined the description of the research background to better emphasize the novelty and significance of our findings. Please see the detailed responses below for further information.

      Major issues:

      (1) The novelty is in question since in the Abstract the authors highlight their main finding, which is that both the chemotaxis complex and the flagella localize to the same pole, as surprising. However, in the Introduction they state that "pathway-related receptors that mediate chemotaxis, as well as the flagellum are localized at the same cell pole (refs. 17, 18)". I am not a Pseudomonas researcher and from my short glance at these references, I could not tell whether they report colocalization of the two structures to the same pole. However, I trust the authors that they know the literature on the localization of the chemotaxis complex and flagella in their organism. See also major issue number 5 on the novelty regarding the involvement of c-di-GMP.

      We thank the reviewer for this valuable comment and appreciate the opportunity to clarify our statements.

      Kazunobu et al. (ref. 18) used scanning electron microscopy to preliminarily characterize the flagellation pattern of Pseudomonas aeruginosa during cell division, showing that existing flagella are located at the old pole. Zehra et al. (ref. 17), through fluorescence microscopy, observed that CheA and CheY proteins in dividing cells are typically also present at the old pole. Based on these observations, we inferred in the Introduction that the chemotaxis complex and flagellum may localize to the same cell pole.

      However, this inference is indirect and lacks direct live-cell evidence of colocalization, leaving its validity to be confirmed. This uncertainty was indeed the starting point and motivation for our study.

      In our work, we simultaneously visualized flagellar filaments and core chemoreceptor proteins at the single-cell level in P. aeruginosa. We characterized the assembly and spatial coordination of the chemotaxis network and flagellar motor throughout the cell cycle, providing direct evidence of their colocalization and coordinated assembly. This represents a significant advance beyond prior indirect observations and supports the novelty of our study.

      Accordingly, we have revised the relevant statements in lines 71-75 of the manuscript to better reflect the current state of the literature and emphasize the novelty of our direct observations.

      (2) Statistics for the microscopy images, on which most conclusions in this manuscript are based, are completely missing. Given that most micrographs present one or very few cells, together with the fact that almost all conclusions depend on whether certain macromolecules are at one or two poles and whether different complexes are in the same pole, proper statistics, based on hundreds of cells in several fields, are absolutely required. Without this information, the results are anecdotal and do not support the conclusions. Due to the importance of statistics for this manuscript, strict statistical tests should be used and reported. Moreover, representative large fields with many cells should be added as supportive information.

      We thank the reviewer for this important comment, which significantly improves the rigor and persuasiveness of our manuscript.

      For the colocalization analyses presented in Fig. 1D and Fig. 2B, we quantified 145 and 101 cells with fluorescently labeled flagella, respectively, and observed consistent colocalization of the chemoreceptor complexes and flagella in all examined cells (now added in the figure legends). Regarding the distribution patterns of chemoreceptors shown in Fig. 3A, we have now included comprehensive statistical analyses for both wild-type and mutant strains. For each strain, more than 300 cells were analyzed across at least three independent microscopic fields, providing robust statistical power (detailed data are presented in Fig. 3C).

      To further strengthen the evidence, statistical tests were applied to confirm the significance and reproducibility of our findings (Fig. 3C). In addition, representative large-field fluorescence images containing numerous cells have been added to the supplementary materials (Fig. S1 and Fig. S3).

      The problem is more pronounced when the authors make strong statements, as in lines 157-158: "The results revealed that the chemoreceptor arrays no longer grow robustly at the cell pole (Figure 2A)". Looking at the seven cells shown in Figure 2A, five of them show polar localization of the chemoreceptors. The question is then: what is the percentage of cells that show precise polar, near-polar, or mid cell localization (the three patterns shown here) in the mutant and in the wild type? Since I know that these three patterns can also be observed in WT cells, what counts is the difference, and whether it is statistically significant.

      We thank the reviewer for raising this important point. Following the reviewer's suggestion, we have now analyzed and categorized the distribution of the chemotaxis complex in both wild-type and flhF mutant strains into three patterns: precise-polar, near-polar, and mid-cell localization. For each strain, more than 200 cells across three independent fields of view were quantified.

      Our statistical analysis shows that in the wild-type strain, approximately 98% of cells exhibit precise polar localization of the chemotaxis complex. In contrast, the ΔflhF mutant displays a clear shift in distribution, with about 5% of cells showing mid-cell localization and 9.5% showing near-polar localization. These differences demonstrate a significant alteration in the spatial pattern upon flhF deletion.

      We have revised the relevant text in lines 166-170 accordingly and included the detailed statistical data in the newly added Fig. S4.
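
      The comparison of localization patterns described above can be illustrated with a simple test on proportions. The sketch below is not the authors' analysis (the response reports t-tests for Fig. 3C); it swaps in a pooled two-proportion z-test as one plausible way to compare the fraction of precise-polar cells between two strains. The function name and the cell counts are hypothetical: the counts are chosen only to match the reported percentages (98% and 85.5% precise-polar), assuming 200 cells per strain, whereas the response states only "more than 200".

```python
import math


def two_proportion_z_test(hits_a, total_a, hits_b, total_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = hits_a / total_a
    p_b = hits_b / total_b
    # Pooled proportion under the null hypothesis of equal proportions
    pooled = (hits_a + hits_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / total_a + 1.0 / total_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical counts matching the reported percentages (assumed n = 200 each):
# wild type: 98% precise-polar; flhF deletion: 100% - 5% - 9.5% = 85.5%
z, p = two_proportion_z_test(196, 200, 171, 200)
print(f"z = {z:.2f}, two-sided p = {p:.2e}")
```

      With real data, the per-strain counts from Fig. S4 would replace the illustrative numbers, and a chi-square test across all three categories (precise-polar, near-polar, mid-cell) would be a natural alternative.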

      Even for the graphs shown in Figures 3C and 3D, where the proportion of cells with obvious chemoreceptor arrays and absolute fluorescence brightness of the chemosensory array are shown, respectively, the questions that arise are: for how many individual cells these values hold and what is the significance of the difference between each two strains?

      The number of cells analyzed for each strain is indicated in the original manuscript: 372 wild-type cells (line 123), 221 ΔflhF cells (line 172), 234 ΔfliG cells (line 197), 323 ΔfliF cells (line 200), 672 ΔflhFΔfliF cells (line 202), and 242 ΔmotAΔmotCD cells (line 207). For each strain, data were collected from three independent fields of view. We have now also provided the number of cells in the Fig. 3 legend.

      We have now performed statistical comparisons using t-tests between strains. Notably, the measured values in Fig. 3C exhibit a clear, monotonic decrease with successive gene knockouts, supporting the robustness of the observed trend.

      Regarding the absolute fluorescence intensity shown in the original Fig. 3D, the mutants did not display consistent directional changes compared to the wild type. Reliable comparison of absolute fluorescence intensity requires consistent fluorescent protein maturation levels across strains. Given the likely variability in maturation levels between strains, we concluded that this data may not accurately reflect true differences in protein concentrations. Therefore, we have removed the fluorescence intensity graph from the revised manuscript to avoid potential misinterpretation.

      (3) The authors conclude that "Motor structural integrity is a prerequisite for chemoreceptor self-assembly" based on the reduction in cells with chemoreceptor clusters in mutants deleted for flagellar genes, despite the proper polar localization of the chemotaxis protein CheY. They show that the level of CheY in the WT and the mutant strains is similar, based on Western blot, which in my opinion is over-exposed. "To ascertain whether it is motor integrity rather than functionality that influences the efficiency of chemosensory array assembly", they constructed a mutant deleted for the flagella stator and found that the motor is stalled while CheY behaves like in WT cells. The authors further "quantified the proportion of cells with receptor clusters and the absolute fluorescence intensity of individual clusters (Figures 3C-D)". While Figure 3C suggests that, indeed, the flagella mutants show fewer cells with a chemotaxis complex, Figure 3D suggests that the differences in fluorescence intensity are not statistically significant. Since it is obvious that the regulation of both structures' production and localization is codependent, I think that it takes more than a Western blot to make such a decision.

      We thank the reviewer for the suggestions. To further clarify that the assembly of flagellar motors and chemoreceptor clusters occurs in an orderly manner rather than being merely codependent, we performed additional experiments. Specifically, we constructed a ΔcheA mutant strain, in which chemoreceptor clusters fail to assemble. Using in vivo fluorescent labeling of flagellar filaments, we observed that the proportion of cells with flagellar filaments in the ΔcheA strain was comparable to that of the wild type (Fig. S5).

      In contrast, mutants lacking complete motor structures, such as ΔfliF and ΔfliG, showed a significant reduction in the proportion of cells with obvious receptor clusters (Fig. 3C). Based on these results, we conclude that the structural integrity of the flagellar motor is, to a certain extent, a prerequisite for the self-assembly of chemoreceptor clusters.

      Accordingly, we have revised the relevant statement in lines 213-217 of the manuscript to reflect this clarification.

      (4) I wonder why the authors chose to label CheY, which is the only component of the chemotaxis complex that shuttles back and forth to the base of the flagella. In any case, I think that they should strengthen their results by repeating some key experiments with labeled CheW or CheA.

      We thank the reviewer for this valuable suggestion. In our study, we initially focused on the positional relationship between chemoreceptor clusters and flagella, then investigated factors influencing cluster distribution and assembly efficiency. The physiological significance of motor and cluster co-localization was ultimately proposed with CheY as the starting point.

      Previous work by Harwood's group demonstrated that both CheY-YFP and CheA-GFP localize to the old poles of dividing Pseudomonas aeruginosa cells. Since our physiological hypothesis centers on CheY, we chose to label CheY-EYFP in our experiments.

      To further strengthen our conclusions, we constructed a plasmid expressing CheA-CFP and introduced it into the cheY-eyfp strain via electroporation. Fluorescence imaging revealed a high degree of spatial overlap between CheA-CFP and CheY-EYFP (Fig. S2), confirming that CheY-EYFP accurately marks the location of the chemoreceptor complex.

      We have revised the manuscript accordingly (lines 119-123) and added these data as Fig. S2.

      (5) The last section of the results is very problematic, regarding the rationale, the conclusions, and the novelty. As far as the rationale is concerned, I do not understand why the authors assume that "a spatial separation between the chemoreceptors and flagellar motors should not significantly impact the temporal comparison in bacterial chemotaxis". Is there any proof for that?

      We apologize for the lack of clarity in our original explanation. The rationale behind the statement was initially supported by comparing the timescales of CheY-P diffusion and temporal comparison in chemotaxis. Specifically, the diffusion time for CheY-P to traverse the entire length of a bacterial cell is approximately 100 ms (refs 39&40), whereas the timescale for bacterial chemotaxis temporal comparison is on the order of seconds (ref 41).

      To clarify and strengthen this argument, we have expanded the discussion as follows:

      The diffusion coefficient of CheY in bacterial cells is about 10 µm²/s, which corresponds to an estimated end-to-end diffusion time on the order of 100 ms (refs 40&41). If the chemotaxis complexes were randomly distributed rather than localized, diffusion times would be even shorter. In contrast, the timescale for the chemotaxis temporal comparison is on the order of seconds (ref. 42). Additionally, a study by Fukuoka and colleagues reported that intracellular chemotaxis signal transduction requires approximately 240 ms beyond CheY or CheY-P diffusion time (ref. 41). Moreover, the intervals of counterclockwise (CCW) and clockwise (CW) rotation of the P. aeruginosa flagellar motor under normal conditions are 1-2 seconds, as determined by tethered cell or bead assays (refs. 30&43).

      Taken together, these observations indicate that for P. aeruginosa, which moves via a run-reverse mode, the potential 100 ms reduction in response time due to co-localization of the chemotaxis complex and motor has a limited effect on overall chemotaxis timing.

      We have revised the corresponding text accordingly (lines 238-245) to better explain this rationale.
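
      The timescale argument above can be checked with a back-of-the-envelope calculation. The following is a minimal sketch, not taken from the manuscript: it assumes the one-dimensional diffusion relation t ≈ L²/(2D) and a cell length of 1.5 µm, which is an illustrative value not given in the response. The diffusion coefficient (10 µm²/s) and the 1-2 s motor switching interval are the values quoted above; the function name is hypothetical.

```python
def diffusion_time_1d(length_um, diff_coeff_um2_per_s):
    """Characteristic 1D diffusion time in seconds: t = L^2 / (2 * D)."""
    return length_um ** 2 / (2.0 * diff_coeff_um2_per_s)


D_CHEY = 10.0            # µm²/s, value quoted in the response
CELL_LENGTH_UM = 1.5     # µm, assumed for illustration only
SWITCH_INTERVAL_S = 1.0  # s, lower end of the quoted 1-2 s CCW/CW interval

t_diff = diffusion_time_1d(CELL_LENGTH_UM, D_CHEY)
fraction = t_diff / SWITCH_INTERVAL_S

print(f"end-to-end diffusion time: {t_diff * 1000:.0f} ms")
print(f"fraction of the switching interval: {fraction:.2f}")
```

      Under these assumptions the diffusion time comes out near 100 ms, roughly a tenth of the shortest switching interval, which is consistent with the order-of-magnitude statement in the response.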

      More surprising for me was to read that "The signal transduction pathways in E. coli are relatively simple, and the chemotaxis response regulator CheY-P affects only the regulation of motor switching". There are degrees of complexity among signal transduction pathways in E. coli, but the chemotaxis seems to be ranked at the top. CheY is part of the adaptation. Perfect adaptation, as many other issues related to the chemotaxis pathway, which include the wide dynamic range, the robustness, the sensitivity, and the signal amplification (gain), are still largely unexplained. Hence, such assumptions are not justified.

      We apologize for the confusion and imprecision in our original statements. Our intention was to convey that the chemotaxis pathway in E. coli is relatively simple compared to the more complex chemosensory systems in P. aeruginosa. We did not mean to generalize this simplicity to all signal transduction pathways in E. coli.

      We acknowledge that E. coli chemotaxis is a highly sophisticated system, involving processes such as perfect adaptation, wide dynamic range, robustness, sensitivity, and signal amplification, many aspects of which remain incompletely understood. CheY indeed plays a crucial role in adaptation and motor switching regulation.

      Accordingly, we have revised the original text (lines 249-255) to avoid any misunderstanding.

      More perplexing is the novelty of the authors' documentation of the effect of the chemotaxis proteins on the c-di-GMP level. In 2013, Kulasekara et al. published a paper in eLife entitled "c-di-GMP heterogeneity is generated by the chemotaxis machinery to regulate flagellar motility". In the same year, Kulasekara published a paper entitled "Insight into a Mechanism Generating Cyclic di-GMP Heterogeneity in Pseudomonas aeruginosa". The authors did not cite these works and I wonder why.

      We apologize for having been unaware of these important references and thank the reviewer for bringing them to our attention. We have now cited the eLife paper and the PhD thesis titled "Insight into a Mechanism Generating Cyclic di-GMP Heterogeneity in Pseudomonas aeruginosa" by Kulasekara et al.

      Regarding novelty, there are key differences between our findings and those reported by Kulasekara et al. While they proposed that CheA influences c-di-GMP heterogeneity through interaction with a specific phosphodiesterase (PDE), our results demonstrate that overexpression of CheY leads to an increase in intracellular c-di-GMP levels.

      We have revised the original text accordingly (lines 358-362) to clarify these distinctions.

      (6) Throughout the manuscript, the authors refer to foci of fluorescent CheY as "chemoreceptor arrays". If anything, these foci signify the chemotaxis complex, not the membrane-traversing chemoreceptors.

      We thank the reviewer for this clarification. We have revised the manuscript accordingly to refer to the fluorescent CheY foci as representing the chemotaxis complex rather than the chemoreceptor arrays.

      Conclusions:

      The manuscript addresses an interesting subject and contains interesting, but incomplete, data.

      Reviewer #2 (Public Review):

      Summary:

      Here, the authors studied the molecular mechanisms by which the chemoreceptor cluster and flagella motor of Pseudomonas aeruginosa (PA) are spatially organized in the cell. They argue that FlhF is involved in localizing the receptors-motor to the cell pole, and even without FlhF, the two are colocalized. FlhF is known to cause the motor to localize to the pole in a different bacterial species, Vibrio cholerae, but it is not involved in receptor localization in that bacterium. Finally, the authors argue that the functional reason for this colocalization is to insulate chemotactic signaling from other signaling pathways, such as cyclic-di-GMP signaling.

      Strengths:

      The experiments and data look to be high-quality.

      Weaknesses:

      However, the interpretations and conclusions drawn from the experimental observations are not fully justified in my opinion.

      I see two main issues with the evidence provided for the authors' claims.

      (1) Assumptions about receptor localization:

      The authors rely on YFP-tagged CheY to identify the location of the receptor cluster, but CheY is a diffusible cytoplasmic protein. In E. coli, CheY has been shown to localize at the receptor cluster, but the evidence for this in PA is less strong. The authors refer to a paper by Guvener et al 2006, which showed that CheY localizes to a cell pole, and CheA (a receptor cluster protein) also localizes to a pole, but my understanding is that colocalization of CheY and CheA was not shown. My concern is that CheY could instead localize to the motor in PA, say by binding FliM. This "null model" would explain the authors' observations, without colocalization of the receptors and motor. Verifying that CheY and CheA are colocalized in PA would be a very helpful experiment to address this weakness.

      We thank the reviewer for this valuable suggestion. We agree that verifying the colocalization of CheY and CheA would strengthen our conclusions. To address this, we constructed a plasmid expressing CheA-CFP and introduced it into the CheY-EYFP strain by electroporation. Fluorescence imaging revealed a high degree of spatial overlap between CheA-CFP and CheY-EYFP signals, indicating that CheY-EYFP indeed marks the location of the chemoreceptor complex rather than the flagellar motor.

      We have revised the manuscript accordingly (lines 118-123) and included these results in the new Fig. S2.

      (2) Argument for the functional importance of receptor-motor colocalization at the pole:

      The authors argue that colocalization of the receptors and motors at the pole is important because it could keep phosphorylated CheY, CheY-p, restricted to a small region of the cell, preventing crosstalk with other signaling pathways. Their evidence for this is that overexpressing CheY leads to higher intracellular cdG levels and cell aggregation. Say that the receptors and motors are colocalized at the pole. In E. coli, CheY-p rapidly diffuses through the cell. What would prevent this from occurring in PA, even with colocalization?

      We appreciate the reviewer's insightful question. The colocalization of both the signaling source (the kinase) and sink (the phosphatase) at the chemoreceptor complex at the cell pole results in a rapid decay of CheY-P concentration within approximately 0.2 µm from the cell pole, leading to a nearly uniform distribution elsewhere in the cell, as demonstrated by Vaknin and Berg (ref. 46). This spatial arrangement effectively confines high CheY-P levels to the pole region. When the motor is also localized at the cell pole, this reduces the need for elevated CheY-P concentrations throughout the cytoplasm, thereby minimizing potential crosstalk with other signaling pathways.

      We have revised the manuscript accordingly (lines 280-286) to clarify this point.

      Elevating CheY concentration may increase the concentration of CheY-p in the cell, but might also stress the cells in other unexpected ways. It is not so clear from this experiment that elevated CheY-p throughout the cell is the reason that they aggregate, or that this outcome is avoided by colocalizing the receptors and motor at the same pole. If localization of the receptor array and motor at one pole were important for keeping CheY-p levels low at the opposite pole, then we should expect cells in which the receptors and motor are not at the pole to have higher CheY-p at the opposite pole. According to the authors' argument, it seems like this should cause elevated cdG levels and aggregation in the delta flhF mutants with wild-type levels of CheY. But it does not look like this happened. Instead of varying CheY expression, the authors could test their hypothesis that receptor-motor colocalization at the pole is important for preventing crosstalk by measuring cdG levels in the flhF mutant, in which the motor (and maybe the receptor cluster) are no longer localized in the cell pole.

      We thank the reviewer for raising the important point regarding potential cellular stress caused by elevated CheY concentrations, as well as for the suggestion to test the hypothesis using ΔflhF mutants.

      First, as noted above, CheY-P concentration rapidly decreases away from the receptor complex. While deletion of flhF alters the position of the receptor complex, thereby shifting the region of high CheY-P concentration, it does not increase CheY-P levels elsewhere in the cell. Importantly, in the ΔflhF strain, the receptor complex and the motor still colocalize, so this mutant may not effectively test the role of receptor-motor colocalization in preventing crosstalk as suggested.

      Regarding the possibility that elevated CheY levels stress the cells independently of CheY-P signaling, prior work in E. coli by Cluzel et al. (ref. 11) showed that overexpressing CheY several-fold did not cause phenotypic changes, indicating that simple CheY overexpression alone may not be generally stressful. Furthermore, our data indicate that the increase in c-di-GMP levels and subsequent cell aggregation upon CheY overexpression is not an all-or-none switch but occurs progressively as CheY concentration rises.

      To further confirm that CheY overexpression promotes aggregation through increased c-di-GMP levels, we performed additional experiments co-overexpressing CheY and a phosphodiesterase (PDE) from E. coli to reduce intracellular c-di-GMP. These experiments showed that PDE expression mitigates cell aggregation caused by CheY overexpression (Fig. S8).

      We have revised the manuscript accordingly (lines 290-294) and added these new results in Fig. S8.

      Reviewer #3 (Public Review):

      Summary:

      The authors investigated the assembly and polar localization of the chemosensory cluster in P. aeruginosa. They discovered that a certain protein (FlhF) is required for the polar localization of the chemosensory cluster while a fully-assembled motor is necessary for the assembly of the cluster. They found that flagella and chemosensory clusters always co-localize in the cell; either at the cell pole in wild-type cells or randomly-located in the cell in FlhF mutant cells. They hypothesize that this co-localization is required to keep the level of another protein (CheY-P), which controls motor switching, at low levels as the presence of high levels of this protein (if the flagella and chemosensory clusters were not co-localized) is associated with high-levels of c-di-GMP and cell aggregations.

      Strengths:

      The manuscript is clearly written and straightforward. The authors applied multiple techniques to study the bacterial motility system including fluorescence light microscopy and gene editing. In general, the work enhances our understanding of the subtlety of interaction between the chemosensory cluster and the flagellar motor to regulate cell motility.

      Weaknesses:

      The major weakness in this paper is that the authors never discussed how the flagellar gene expression is controlled in P. aeruginosa. For example, in E. coli there is a transcriptional hierarchy for the flagellar genes (early, middle, and late genes, see Chilcott and Hughes, 2000). Similarly, Campylobacter and Helicobacter have a different regulatory cascade for their flagellar genes (See Lertsethtakarn, Ottemann, and Hendrixson, 2011). How does the expression of flagellar genes in P. aeruginosa compare to other species? How many classes are there for these genes? Is there a hierarchy in their expression and how does this affect the results of the FliF and FliG mutants? In other words, if FliF and FliG are in class I (as in E. coli) then their absence might affect the expression of other later flagellar genes in subsequent classes (i.e., chemosensory genes). Also, in both FliF and FliG mutants no assembly intermediates of the flagellar motor are present in the cell as FliG is required for the assembly of FliF (see Hiroyuki Terashima et al. 2020, Kaplan et al. 2019, Kaplan et al. 2022). It could be argued that when the motor is not assembled then this will affect the expression of the other genes (e.g., those of the chemosensory cluster) which might play a role in the decreased level of chemosensory clusters the authors find in these mutants.

      We thank the reviewer for the insightful comments. P. aeruginosa possesses a four-tiered transcriptional regulatory hierarchy controlling flagellar biogenesis. Within this system, fliF and fliG belong to class II genes and are regulated by the master regulator FleQ. In contrast, chemotaxis-related genes such as cheA and cheW are regulated by intracellular free FliA, and currently, there is no evidence that FliA activity is influenced by proteins like FliG.

      To verify that the expression of core chemotaxis proteins was not affected by deletion of fliG, we performed Western blot analyses to compare CheY levels in wild-type, ΔfliF, and ΔfliG strains. We observed no significant differences, indicating that the reduced presence of receptor clusters in these mutants is not due to altered expression of chemotaxis proteins.

      Accordingly, we have revised the manuscript (lines 341-348) and updated Fig. 3B to reflect these findings.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      The reviewers comment on several important aspects that should be addressed, namely: the lack of statistical analysis; the need for clarifications regarding assumptions made regarding receptor localization; the functional importance of receptor-motor colocalization; and the need for an elaborate discussion of flagellar gene expression. Also, two reviewers pointed out the need to prove the co-localization of CheY and CheA; this is important since CheY is dynamic, shuttling back and forth from the chemotaxis complex to the base of the flagella, whereas CheA (or CheW or, even better, the receptors) is considered less dynamic and an integral part of the chemotaxis complex.

      Reviewer #1 (Recommendations For The Authors):

      Minor points:

      Line 43: "ubiquitous" - I would choose another word.

      We changed "ubiquitous" to "widespread".

      Line 49: "order" - change to organize.

      We changed "order" to "organize".

      Line 52: "To grow and colonize within the host, bacteria have evolved a mechanism for migrating...". Motility "towards more favorable environments" is an important survival strategy of bacteria in various ecological niches, not only within the host.

      We revised it to "grow and colonize in various ecological niches".

      Line 72: Define F6 in "F6 pathway-related receptors".

      The proteins encoded by chemotaxis-related genes collectively constitute the F6 pathway, which we have now explained in the manuscript text.

      Line 72-73: Do references 17 &18 really report colocalization of the chemotaxis receptor and flagella to the same pole? If these or other reports document such colocalization, then the sentence in the Abstract "Surprisingly, we found that both are located at the same cell pole..." is not correct.

      Kazunobu et al. (ref. 18) used scanning electron microscopy to preliminarily characterize the flagellation pattern of Pseudomonas aeruginosa during cell division, showing that existing flagella are located at the old pole. Zehra et al. (ref. 17), through fluorescence microscopy, observed that CheA and CheY proteins in dividing cells are typically also present at the old pole. Based on these observations, we inferred in the Introduction that the chemotaxis complex and flagellum may localize to the same cell pole.

      However, this inference is indirect and lacks direct live-cell evidence of colocalization, leaving its validity to be confirmed. This uncertainty was indeed the starting point and motivation for our study.

      In our work, we simultaneously visualized flagellar filaments and core chemoreceptor proteins at the single-cell level in P. aeruginosa. We characterized the assembly and spatial coordination of the chemotaxis network and flagellar motor throughout the cell cycle, providing direct evidence of their colocalization and coordinated assembly. This represents a significant advance beyond prior indirect observations and supports the novelty of our study.

      Accordingly, we have revised the relevant statements in lines 71-75 of the manuscript to better reflect the current state of the literature and emphasize the novelty of our direct observations.

      Line 108: "CheY has been shown to colocalize with chemoreceptors". The authors rely here (reference 29) and in other places on findings in E. coli. However, in the Introduction, they describe the many differences between the motility systems of P. aeruginosa and E. coli, e.g., the number of chemosensory systems and their spatial distribution (E. coli is a peritrichous bacterium, as opposed to the monotrichous bacterium P. aeruginosa). There seem to be proofs for colocalization of the Che and MCP proteins in P. aeruginosa, which should be cited here.

      Thank you for pointing this out. Harwood's group reported that a cheY-YFP fusion strain exhibited bright fluorescent spots at the cell pole, which disappeared upon knockout of cheA or cheW, genes encoding structural proteins of the chemotaxis complex. This strongly suggests colocalization of CheY with MCP proteins in P. aeruginosa. We have now cited this study as reference 17 in the manuscript.

      Figure 1B: Please replace the order of the schematic presentations, so that the cheY-egfp fusion, which is described first in the text, is at the top.

      We have modified the order of related images in Fig. 1B.

      Line 127: "by introducing cysteine mutations". Replace either by "by introducing cysteines" or by "by substituting several residues with cysteines".

      We changed the relevant statement to "by introducing cysteines".

      Line 144-145: "Given that the physiological and physical environments of both cell poles are nearly identical.". I think that also the physical, but certainly the physiological environment of the two poles is not identical. First, one is an old pole, and the other a new pole. Second, many proteins and RNAs were detected mainly or only in one of the poles of rod-shaped Gram-negative bacteria that are regarded as symmetrically dividing. Although my intuition is that the authors are correct in assuming that "it is unlikely that the unipolar distribution of the chemoreceptor array can be attributed to passive regulatory factors", relating it to the (false) identity between the poles is incorrect.

      We thank the reviewer for this important correction. We agree that the physiological environments of the two poles are not identical, given that one is the old pole and the other the new pole, and that many proteins and RNAs show polar localization in rod-shaped Gram-negative bacteria. Accordingly, we have revised the original text (lines 150-152) to read:

      “Despite potential differences in the physical and especially physiological environments at the two cell poles, it is unlikely that the unipolar distribution of the chemotaxis complex can be attributed to passive regulatory factors.”

      Lines 151-154: "Considering the consistent colocalization pattern between chemosensory arrays and flagellar motors in P. aeruginosa". Does the word consistent relate to different reports on such colocalization or to the results in Figure 1D? In case it is the latter, then what is the word consistent based on? All together only 7 cells are presented in the 5 micrographs that compose Figure 1D (back to statistics...).

      We thank the reviewer for raising this point. To clarify, the word "consistent" refers to the observation of colocalization shown in Figure 1D & Figure S3. As noted in the revised figure legend for Figure 1D, a total of 145 cells with labeled flagella were analyzed, all exhibiting consistent colocalization between flagella and chemosensory arrays. Additionally, we have included a new image showing a large field of co-localization in the wild-type strain as Figure S3 to better illustrate this consistency.

      Figure 2A: Omit "Subcellular localization of" from the beginning of the caption.

      We removed the relevant expression from the caption.

      Reviewer #2 (Recommendations For The Authors):

      I strongly recommend checking that CheY localizes to the receptor cluster in PA. This could be done by tagging cheA with a different fluorophore and demonstrating their colocalization. It would also be helpful to check that they are colocalized in the delta flhF mutant.

      We thank the reviewer for this valuable suggestion. We constructed a plasmid expressing CheA-CFP and introduced it into the CheY-EYFP strain by electroporation. Fluorescence imaging revealed a high degree of spatial overlap between CheA-CFP and CheY-EYFP signals, indicating that CheY-EYFP indeed marks the location of the chemoreceptor complex.

      We have revised the manuscript accordingly (lines 118-123) and included these results in the new Fig. S2.

      The part describing the experiments under- and over-expressing CheY seemed too unrelated to receptor-motor colocalization. I think the authors should think about a more direct way of testing whether colocalization of the motor and receptors is important for preventing signaling crosstalk. One way would be to measure cdG levels in WT and in delta flhF mutants and see if there is a significant difference.

      We thank the reviewer for raising the important point regarding potential cellular stress caused by elevated CheY concentrations, as well as for the suggestion to test the hypothesis using flhF mutants.

      First, as noted in the response to your 2nd comment in Public Review, CheY-P concentration rapidly decreases away from the receptor complex. While deletion of flhF alters the position of the receptor complex, thereby shifting the region of high CheY-P concentration, it does not increase CheY-P levels elsewhere in the cell. Importantly, in the ΔflhF strain, the receptor complex and the motor still colocalize, so this mutant may not effectively test the role of receptor-motor colocalization in preventing crosstalk as suggested.

      Regarding the possibility that elevated CheY levels stress the cells independently of CheY-P signaling, prior work in E. coli by Cluzel et al. (ref. 11) showed that overexpressing CheY several-fold did not cause phenotypic changes, indicating that simple CheY overexpression alone may not be generally stressful. Furthermore, our data indicate that the increase in c-di-GMP levels and subsequent cell aggregation upon CheY overexpression is not an all-or-none switch but occurs progressively as CheY concentration rises.

      To further confirm that CheY overexpression promotes aggregation through increased c-di-GMP levels, we performed additional experiments co-overexpressing CheY and a phosphodiesterase (PDE) from E. coli to reduce intracellular c-di-GMP. These experiments showed that PDE expression mitigates cell aggregation caused by CheY overexpression (Fig. S8).

      We have revised the manuscript accordingly (lines 290-294) and added these new results in Fig. S8.

      Reviewer #3 (Recommendations For The Authors):

      (1) Can the authors elaborate more on the hierarchy of flagellar gene expression in P. aeruginosa and how this relates to their work?

      We thank the reviewer for the suggestion. We have now described the hierarchy of flagellar gene expression in P. aeruginosa in lines 341-348.

      (2) I would suggest that the authors check other flagellar mutants (than FliF and FliG) where the motor is partially assembled (e.g., any of the rod proteins or the P-ring protein), together with FlhF mutant, to see how a partially assembled motor affects the assembly of the chemosensory cluster.

      We thank the reviewer for this valuable suggestion. The P ring, primarily composed of FlgI, acts as a bushing for the peptidoglycan layer, and its absence leads to partial motor assembly. We constructed a ΔflgI mutant and observed that the proportion of cells exhibiting distinct chemotactic complexes was similar to that of the wild-type strain, suggesting that the assembly of the receptor complex is likely influenced mainly by the C-ring and MS-ring structures rather than by the P ring. We have revised the original text accordingly (lines 217-220) and added the corresponding data as Figure S6.

      (3) I would suggest that the authors check the levels of CheY in cells induced with different concentrations of arabinose (i.e., using western blotting just like they did in Figure 3B).

      We have assessed the levels of CheY in cells induced with different concentrations of arabinose using western blotting, as suggested. The results have been incorporated into the manuscript (lines 274-275) and are presented in Figure S7.

      (4) To my eyes, most of the foci in FliF-FlhF mutant in Figure 3A are located at the pole (which is unlike the FlhF mutant in Figure 2). Is this correct? I would suggest that the authors also investigate this to see where the chemosensory cluster is located.

      We thank the reviewer for pointing this out. The distribution of the chemotaxis complex in the ΔflhFΔfliF strain was investigated and is shown in Fig. S4. Indeed, most of the chemoreceptor foci in this mutant are located at the pole. This suggests that, in the absence of both FlhF and an assembled motor, the position of the receptor complex may be largely influenced by passive factors such as membrane curvature. This interesting possibility warrants further investigation in future studies.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      In this work, the authors recorded the dynamics of 5-HT with fiber photometry from CA1 in one hemisphere and LFP from CA1 in the other hemisphere. They observed an ultra-slow oscillation in the 5-HT signal during both wakefulness and NREM sleep. The authors have studied different phases of the ultra-slow oscillation to examine the potential difference in the occurrence of some behavioral state-related physiological phenomena (hippocampal ripples, EMG, and inter-area coherence).

      Strengths

      The relation between the falling/rising phase of the ultra-slow oscillation and the ripples is sufficiently shown. There are some minor concerns about the observed relations that should be addressed with some further analysis.

      Systematic observations have started to establish a strong relation between the dynamics of neural activity across the brain and measures of behavioral arousal. Such relations span a wide range of temporal scales that are heavily inter-related. Ultra-slow time-scales are specifically under-studied due to technical limitations and neuromodulatory systems are the strongest mechanistic candidates for controlling/modulating the neural dynamics at these time-scales. The hypothesis of the relation between a specific time-scale and one certain neuromodulator (5-HT in this manuscript) could have a significant impact on the understanding of the hierarchy in the temporal scales of neural activity.

      Weaknesses:

      One major caveat of the study is that different neuromodulators are strongly correlated across all time scales and related to this, the authors need to discuss this point further and provide more evidence from the literature (if any) that suggests similar ultra-slow oscillations are weaker or lack from similar signals recorded for other neuromodulators such as Ach and NA.

      The reviewer is correct to point out that the levels of different neuromodulators are often correlated. For example, most monoaminergic neurons, including serotonergic neurons of the raphe nuclei, show similar firing rates across behavioral states, firing most during wake behavior, less during NREM, and ceasing firing during ‘paradoxical sleep’ or REM (Eban-Rothschild et al. 2018). Notably, other neuromodulators, such as acetylcholine (ACh), show the opposite pattern across states, with highest levels observed during REM, an intermediate level during wake behavior, and the lowest level during NREM (Vazquez et al. 2001). Despite these differences, ultraslow oscillations of both monoaminergic and non-monoaminergic neuromodulators have been described, albeit only during NREM sleep (Zhang et al. 2021, Zhang et al. 2024, Osorio-Forero et al. 2021, Kjaerby et al. 2022). How ultraslow oscillations of different neuromodulators are related has only recently been explored (Zhang et al. 2024). In that study, dual recording of oxytocin (Oxt) and ACh with GRAB sensors showed that the levels of the two neuromodulators were indeed correlated at ultraslow frequencies with a 2 s temporal shift. Furthermore, this shift could be explained by a hippocampal-to-lateral septum intermediate pathway, in which the level of ACh causally impacts hippocampal activity, which then in turn controls Oxt levels. Given the known temporal relationship between ripples, ACh and Oxt, and now with our work, between ripples and 5-HT, one could infer the relative timing of ultraslow oscillations of ACh, Oxt and 5-HT. While dual recordings of norepinephrine (NE) and 5-HT have not been performed, a similar correlation with temporal shift could be hypothesized given the parallel relationships between NE and spindles (Osorio-Forero et al. 2021), and 5-HT and ripples, with the known temporal delay between ripples and spindles (Staresina et al. 2023). The fact that the locus coeruleus receives particularly dense projections from the dorsal raphe nucleus (Kim et al. 2004) further suggests that 5-HT ultraslow oscillations could drive NE oscillations. How exactly ultraslow oscillations of serotonin are related to ultraslow oscillations of different neuromodulators in different brain regions remains to be studied.

      We have further addressed this question and how it relates to the issue of causality in the Discussion section of the manuscript (p. 13):

      “In addition to the difficulties involved with typical causal interventions already mentioned, the fact that the levels of different neuromodulators are interrelated and affected by ongoing brain activity makes it very hard to pinpoint ultraslow oscillations of one specific neuromodulator as controlling specific activity patterns, such as ripple timing. While a recent paper purported to show a causative effect of norepinephrine levels on ultraslow oscillations of sigma band power, the fact that optogenetic inhibition of locus coeruleus (LC) cells, but also excitation, only caused a minor reduction of the ultraslow sigma power oscillation suggests that other factors also contribute (Osorio-Forero et al., 2021). Generally, it is thought that many neuromodulators together determine brain states in a combinatorial manner, and it is probable that the 5-HT oscillations we measure, like the similar oscillations in NE, are one factor among many.

      Nevertheless, given the known effects of 5-HT on neurons, it is not unlikely that the 5-HT fluctuations we describe have some impact on the timing of ripples, MAs, hippocampal-cortical coherence, or EMG signals that correlate with either the rising or descending phase. In fact, causal effects of 5-HT on ripple incidence (Wang et al. 2015, ul Haq et al. 2016 and Shiozaki et al. 2023), MA frequency (Thomas et al. 2022), sensory gating (Lee et al. 2020), which is subserved by inter-areal coherence (Fisher et al. 2020), and movement (Takahashi et al. 2000, Alvarez et al. 2022, Jacobs et al. 1991 and Luchetti et al. 2020) have all been shown. Our added findings that serotonin affects ripple incidence in hippocampal slices in a dose-dependent manner (Figure S1) further suggest that the relationship between ultraslow 5-HT oscillations and ripples we report may indeed result, at least in part, from a direct effect of serotonin on the hippocampal network.

      Whether these ‘causal’ relationships between 5-HT and the different activity measures we describe can be used to support a causal link between ultraslow 5-HT oscillations and the correlated activity we report remains an open question. To that point, some studies have described changes in ultraslow oscillations due to manipulation of serotonin signaling. Specifically, reduction of 5-HT1a receptors in the dentate gyrus was recently shown to reduce the power of ultraslow oscillations of calcium activity in the same region (Turi et al. 2024). Furthermore, psilocin, which largely acts on the 5-HT2a receptor, decreased NREM episode length from around 100 s to around 60 s, and increased the frequency of brief awakenings (Thomas et al. 2022). While ultraslow oscillations were not explicitly measured in this study, the change in the rhythmic pattern of NREM sleep episodes and brief awakenings, or microarousals, suggests an effect of psilocin on ultraslow oscillations during NREM. Although these studies do not necessarily point to an exclusive role for 5-HT in controlling ultraslow oscillations of different brain activity patterns, they show that changes in 5-HT can contribute to changes in brain activity at ultraslow frequencies.”

      A major question that has been left out from the study and discussion is how the same level of serotonin before and after the peak could be differentially related to the opposite observed phenomenon. What are the possible parallel mechanisms for distinguishing between the rising and falling phases? Any neurophysiological evidence for sensing the direction of change in serotonin concentration (or any other neuromodulator), and is there any physiological functionality for such mechanisms?

      We have added a paragraph in the discussion to address how this differentiation of the 5-HT signal may be carried out (Discussion, paragraph #3, p. 10):

      “In order for the ultraslow oscillation phase to segregate brain activity, as we have observed, the hippocampal network must somehow be able to sense the direction of change of serotonin levels. While single-cell mechanisms related to membrane potential dynamics are typically too fast to explain this calculation, a theoretical work has suggested that feedback circuits can enable such temporal differentiation, also on the slower timescales we observe (Tripp and Eliasmith, 2010). Beyond the direction of change in serotonin levels, temporal differentiation could also enable the hippocampal network to discern the steeper rising slope versus the flatter descending slope that we observe in the ultraslow 5-HT oscillations (Figure S2), which may also be functionally relevant (Cole and Voytek, 2017). The distinction between the rising and falling phase of ultraslow oscillations is furthermore clearly discernible at the level of unit responses, with many units showing preferences for either half of the ultraslow period (Figure S6). Another factor that could help distinguish the rising from the falling phase is the level of other neuromodulators, as it is likely the combination of many neuromodulators at any given time that defines a behavioral substate. Given the finding that ACh and Oxt exhibit ultraslow oscillations with a temporal shift (Zhang et al. 2024), one could posit that distinct combinations of different levels of neuromodulators could segregate the rising from the falling phase via differential effects of the combination of neuromodulators on the hippocampal network.”

      Functionally, the ability to distinguish between the rising and falling phases of an oscillatory cycle is a form of phase coding. A well-known example of this can be seen in hippocampal place cells, which fire relative to the ongoing theta oscillations. The key advantage of phase coding is that it introduces an additional dimension, i.e. phase of firing, beyond the simple rate of neural firing. This allows for the multiplexing of information (Panzeri et al., 2010), enabling the brain to encode more complex patterns of activity. Moreover, phase coding is metabolically more efficient than traditional spike-rate coding (Fries et al., 2007).
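
      As a purely illustrative sketch of the rising/falling distinction discussed above, the following toy example builds an asymmetric oscillation (short rise, long fall) and labels each time step by the sign of its first difference. The waveform, the function names, and the 40/60 split are assumptions chosen for illustration; they are not the analysis pipeline or the values used in the manuscript.

```python
def asymmetric_wave(n_samples, period, rise_samples):
    """Toy oscillation (hypothetical): linear rise over `rise_samples`
    samples, then a linear fall over the remainder of each period."""
    fall_samples = period - rise_samples
    signal = []
    for i in range(n_samples):
        k = i % period
        if k < rise_samples:
            signal.append(k / rise_samples)
        else:
            signal.append(1.0 - (k - rise_samples) / fall_samples)
    return signal


def label_phases(signal):
    """Label each step as 'rising' or 'falling' from the sign of the
    first difference, i.e. a minimal temporal differentiation."""
    return ["rising" if cur > prev else "falling"
            for prev, cur in zip(signal, signal[1:])]


def rise_fall_ratio(labels):
    """Ratio of time spent in the rising phase to time in the falling phase."""
    return labels.count("rising") / labels.count("falling")


# Ten periods of 100 samples, each with a 40-sample rise and a 60-sample fall.
labels = label_phases(asymmetric_wave(1001, 100, 40))
print("rising/falling time ratio:", rise_fall_ratio(labels))
```

      A ratio below 1 in this toy case corresponds to falling phases that last longer than rising phases. On real photometry data, the signal would need to be band-pass filtered before differencing, since noise would otherwise dominate the sign of the first difference.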

      Reviewer #2 (Public review):

      Summary:

      In their study, Cooper et al. investigated the spontaneous fluctuations in extracellular 5-HT release in the CA1 region of the hippocampus using GRAB5-HT3.0. Their findings revealed the presence of ultralow frequency (less than 0.05 Hz) oscillations in 5-HT levels during both NREM sleep and wakefulness. The phase of these 5-HT oscillations was found to be related to the timing of hippocampal ripples, microarousals, electromyogram (EMG) activity, and hippocampal-cortical coherence. In particular, ripples were observed to occur with greater frequency during the descending phase of 5-HT oscillations, and stronger ripples were noted to occur in proximity to the 5-HT peak during NREM. Microarousal and EMG peaks occurred with greater frequency during the ascending phase of 5-HT oscillations. Additionally, the strongest coherence between the hippocampus and cortex was observed during the ascending phase of 5-HT oscillations. These patterns were observed in both NREM sleep and the awake state, with a greater prevalence in NREM. The authors posit that 5-HT oscillations may temporally segregate internal processing (e.g., memory consolidation) and responsiveness to external stimuli in the brain.

      Strengths:

      The findings of this research are novel and intriguing. Slow brain oscillations lasting tens of seconds have been suggested to exist, but to my knowledge they have never been analyzed in such a clear way. Furthermore, although it is likely that ultra-slow neuromodulator oscillations exist, this is the first report of such oscillations, and the greatest strength of this study is that it has clarified this phenomenon both statistically and phenomenologically.

      Weaknesses:

      As with any paper, this one has some limitations. While there is no particular need to pursue them, I will describe ten of them below, including future directions:

      (1) Contralateral recordings: 5-HT levels and electrophysiological recordings were obtained from opposite hemispheres due to technical limitations. Ipsilateral simultaneous recordings may show more direct relationships.

      Although we argue that bilateral symmetry defines both the serotonin system and many hippocampal activity patterns (Methods: Dual fiber photometry and silicon probe recordings), we agree that ipsilateral recordings would be superior to describe the link between serotonin and electrophysiology in the hippocampus. In addition to noting that a recent study has adopted the same contralateral design (Zhang et al. 2024), we add a reference further supporting bilateral hippocampal synchrony, specifically of dentate spikes (Farrell et al. 2024). However, as functional lateralization has been recently proposed to underlie certain hippocampal functions in the rodent (Jordan 2020), future studies should ideally include both imaging and electrophysiology in a single hemisphere to guarantee local correlations rather than assuming inter-hemispheric synchrony. This could be accomplished using an integrated probe with attached optical fibers, as described in Markowitz et al. 2018, which is however technically more challenging and has, to our knowledge, not yet been implemented with fiber photometry recordings with GRAB sensors. Given the required separation of a few hundred micrometers between the probe shanks and the optical fiber cannula, it is important to consider whether the recordings are capturing the same neuronal populations. For example, there is a risk of recording electrical activity from dorsal hippocampal neurons while simultaneously measuring light signals from neurons in the intermediate hippocampus, which are functionally distinct populations (Fanselow and Dong 2009).

      (2) Sample size: The number of mice used in the experiments is relatively small (n=6). Validation with a larger sample size would be desirable.

      While larger sample sizes generally reduce the influence of random variability and minimize the impact of outliers on conclusions, our use of mixed-effects models mitigates these concerns by accounting for both inter-session and inter-mouse variability. With this approach, we explicitly model random effects, such as the variability between individual mice and sessions, alongside fixed effects (such as treatment), which ensures that our results are not driven by random fluctuations in a few individual mice or sessions. Furthermore, the inclusion of random intercepts and slopes in the models allows for the possibility that different animals and/or sessions have different baseline characteristics and respond to the treatment to different degrees. In summary, while validating these findings with a larger sample size would certainly help detect more subtle effects, we are confident in the robustness of the conclusions presented.
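
      For readers less familiar with this class of models, a generic two-level form with random intercepts and slopes can be written as below. This is a textbook sketch under assumed notation, not necessarily the exact specification fitted in the manuscript: the indices (mouse i, session j within mouse, observation k), the single predictor x, and the Gaussian assumptions are all illustrative.

```latex
y_{ijk} = (\beta_0 + u_{0i} + v_{0ij}) + (\beta_1 + u_{1i} + v_{1ij})\, x_{ijk} + \varepsilon_{ijk},
\qquad
(u_{0i}, u_{1i}) \sim \mathcal{N}(0, \Sigma_{\text{mouse}}), \quad
(v_{0ij}, v_{1ij}) \sim \mathcal{N}(0, \Sigma_{\text{session}}), \quad
\varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2)
```

      Here \beta_0 and \beta_1 are the fixed intercept and fixed effect, the u terms let each mouse have its own baseline and its own response magnitude, and the v terms do the same for each session within a mouse.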

      (3) Lack of causality: The observed associations show correlations, not direct causal relationships, between 5-HT oscillations and neural activity patterns.

      We agree that the data we present in this study is largely correlational and generally avoid claims of causality in the manuscript. In the Discussion section, we discuss barriers to interpreting typical causal interventions in vivo, such as optogenetic activation of raphe nuclei: “The two previously mentioned in vivo studies showing reduced ripple incidence…” (paragraph #10, pg. 12), as well as an added section on further causality considerations in the Discussion section of the manuscript (paragraph #12, pg. 13): “In addition to the difficulties involved with…”

      Due to these barriers, as a first step, we wanted to describe how physiological changes in serotonin levels are correlated to changes in the hippocampal activity. Equipped with a deeper understanding of physiological serotonin dynamics, future studies could explore interventions that modulate serotonin in keeping with the natural range of serotonin fluctuations for a given state. On that point, another challenge which we have not mentioned in the manuscript is that modulating serotonin, or any neuromodulator’s levels, has the potential, depending on the degree of modulation, to transition the brain to an entirely different behavioral state. This then complicates interpretation, as one is not sure whether effects observed are due to the changes in the neuromodulator itself, or secondary to changes in state. At the same time, 5-HT activity drives networks which in return can change the release of other neurotransmitters, leading to indirect effects.

      The results of our in vitro experiments suggest that a causal relationship between serotonin and ripples is possible (Figure S1). Though the hippocampal slice preparation is clearly an artificial model, it provides a controlled environment to isolate the effects of serotonin manipulation on the hippocampal formation, without the confounding influence of systemic 5-HT fluctuations in other brain regions. Notably, the dose-dependent effects of serotonin (5-HT) wash-in on ripple incidence observed in vitro closely mirror the inverted-U dose-response curve seen in our in vivo experiments across states, where small increases in serotonin lead to the highest ripple incidence, and both lower and higher levels correspond to reduced ripple activity. This parallel suggests that the gradual washing of serotonin in our in vitro system may mimic the tonic firing changes in serotonergic neurons that occur during state transitions in vivo. These findings underscore the importance of studying how different dynamics of serotonin modulation can differentially affect hippocampal network activity.

      (4) Limited behavioral states: The study focuses primarily on sleep and quiet wakefulness. Investigation of 5-HT oscillations during a wider range of behavioral states (e.g., exploratory behavior, learning tasks) may provide a more complete understanding.

      We agree that future studies should investigate a broader range of behavioral states. For this study, as we were focused on general sleep and wake patterns, our recordings were done in the home cage, and we limited ourselves to the basic behavioral states described in the paper. Future studies should be designed to investigate ultraslow 5-HT oscillations during different behaviors, such as continuous treadmill running. Specifically, a finer segregation of extended wake behaviors by level of arousal could greatly add to our understanding of the role of ultraslow serotonin oscillations.

      (5) Generalizability to other brain regions: The study focuses on the CA1 region of the hippocampus. It's unclear whether similar 5-HT oscillation patterns exist in other brain regions.

      Given the reported ultraslow oscillations of population activity in serotonergic neurons of the dorsal raphe nucleus (Kato et al. 2022) as well as the widespread projections of the serotonergic nuclei, we would expect a broad expression of ultraslow 5-HT oscillations throughout the brain. So far, ultraslow 5-HT oscillations have been described in the basal forebrain, as well as in the dentate gyrus, in addition to what we have shown in CA1 (Deng et al. 2024 and Turi et al. 2024). Furthermore, our results showing that hippocampal-cortical coherence changes according to the phase of hippocampal ultraslow 5-HT oscillations suggests that 5-HT can affect oscillatory activity either indirectly by modulating hippocampal cells projecting to the cortical network or directly by modulating the cortical postsynaptic targets. Given the heterogeneity in projection strength, as well as in pre- and postsynaptic serotonin receptor densities across brain regions (de Filippo & Schmitz, 2024), it would be interesting to see whether local ultraslow 5-HT oscillations are differentially modulated, e.g. in terms of oscillation power. Future studies investigating different brain regions via implantation of multiple optic fibers in different brain areas or using the mesoscopic imaging approach adopted in Deng et al. 2024, will be needed to examine the extent of spatial heterogeneity in this ultraslow oscillation.

      (6) Long-term effects not assessed: Long-term effects of ultra-low 5-HT oscillations (e.g., on memory consolidation or learning) were not assessed.

      While beyond the scope of our current study, we agree that an important next step would involve modulating the ultraslow serotonin oscillation after learning, and then examining potential effects on memory consolidation, presumably via changes in ripple dynamics, though many possibilities could explain potential effects. There, our results suggest it would be important to isolate effects due to the change in ultraslow oscillation features, rather than simply overall levels of 5-HT. To that end, it would be important to test different modulation dynamics, specifically modulating the oscillation strength, around a constant mean 5-HT level by carefully timed optogenetic stimulation/inhibition. Afterwards, showing a clear correlation between the strength of the 5-HT modulation and memory performance would be important for establishing the relationship, as done in Lecci et al. 2017, where more prominent ultraslow oscillations of sigma power in the cortex during sleep, alongside a higher density of spindles, were correlated with better memory consolidation. Given the tight coupling of spindles and ripples during sleep, it is possible that a similar effect on memory consolidation would be observed following changes in ultraslow 5-HT oscillation power.

      (7) Possible species differences: It's uncertain whether the findings in mice apply to other mammals, including humans.

      We agree that the experiments should ultimately be replicated in humans. In the 2017 study by Lecci et al., the authors highlighted the shared functional requirements for sleep across species, despite apparent differences, such as variations in sleep volume. To explore these commonalities, the researchers conducted parallel experiments in both mice and humans, aiming to identify a universal organizing structure. They discovered that the ultraslow oscillation of sigma power serves this role, enabling both species to balance the competing demands of arousability and sleep imperviousness. Based on this finding, it is plausible that ultraslow oscillations of serotonin, which similarly modulate activity according to arousal levels, would serve a comparable function in humans.

      (8) Technical limitations: The temporal resolution and sensitivity of the GRAB5-HT3.0 sensor may not capture faster 5-HT dynamics.

      The kinetics of the GRAB5-HT3.0 sensor used in this study limit the range of serotonin dynamics we can observe. However, the ultraslow oscillations we measure reflect temporal changes on the scale of 20 s and greater, whereas the GRAB sensor we use has sub-second on kinetics and below 2 s off kinetics (Deng et al. 2024). Therefore, the sensor is capable of reporting much faster activity than the ultraslow oscillations we observe, indicating that the ultraslow 5-HT signal accurately reflects the dynamics on this time scale. Furthermore, the presence of ultraslow oscillations in spiking activity, observed in the hippocampal formation (Gonzalo Cogno et al., 2024; Aghajan et al., 2023; Penttonen et al., 1999) and in the dorsal raphe (Mlinar et al., 2016) and not affected by the same temporal smoothing, suggests that the oscillations we record are not likely due to signal aliasing, but instead reflect genuine oscillatory activity. Of course, this does not preclude that other, faster serotonin dynamics are also present in our signal, some of which may be too fast to be observed. For instance, rapid serotonin signaling via the ionotropic 5-HT3a receptors could be missed in our recordings. Additionally, with the fiber photometry approach we adopted, we are limited to capturing spatially broad trends in serotonin levels, potentially overlooking more localized dynamics.

      (9) Interactions with other neuromodulators: The study does not explore interactions with other neuromodulators (e.g., norepinephrine, acetylcholine) or their potential ultraslow oscillations.

      We agree that the interaction between neuromodulators in the context of ultraslow oscillations is an important issue, which we have addressed in our response to reviewer #1 under ‘Weaknesses.’

      (10) Limited exploration of functional significance: While the study suggests a potential role for 5-HT oscillations in memory consolidation and arousal, direct tests of these functional implications are not included.

      We agree and reference our answer to (6) regarding memory consolidation. Regarding arousal, direct tests of arousability to different sensory stimuli during different phases of the ultraslow 5-HT oscillation during sleep would be beneficial, in addition to the indirect measures of arousal we examine in the current study, e.g. degree of movement (icEMG) and long-range coherence. In line with what we have shown, Cazettes et al. (2021) have demonstrated a direct relationship between 5-HT levels and pupil size, an indicator of arousal level, which, like our findings, is consistent across behavioral states.

      Reviewer #3 (Public review):

      Summary:

      The activity of serotonin (5-HT) releasing neurons as well as 5-HT levels in brain structures targeted by serotonergic axons are known to fluctuate substantially across the animal's sleep/wake cycle, with high 5-HT levels during wakefulness (WAKE), intermediate levels during non-REM sleep (NREM) and very low levels during REM sleep. Recent studies have shown that during NREM, the activity of 5HT neurons in raphe nuclei oscillates at very low frequencies (0.01 - 0.05 Hz) and this ultraslow oscillation is negatively coupled to broadband EEG power. However, how exactly this 5-HT oscillation affects neural activity in downstream structures is unclear.

      The present study addresses this gap by replicating the observation of the ultraslow oscillation in the 5-HT system, and further observing that hippocampal sharp wave-ripples (SWRs), biomarkers of offline memory processing, occur preferentially in barrages on the falling phase of the 5-HT oscillation during both wakefulness and NREM sleep. In contrast, the rising phase of the 5-HT oscillation is associated with microarousals during NREM and increased muscular activity during WAKE. Finally, the rising 5-HT phase was also found to be associated with increased synchrony between the hippocampus and neocortex. Overall, the study constitutes a valuable contribution to the field by reporting a close association between rising 5-HT and arousal, as well as between falling 5-HT and offline memory processes.

      Strengths:

      The study makes compelling use of the state-of-the-art methodology to address its aims: the genetically encoded 5-HT sensor used in the study is ideal for capturing the ultraslow 5-HT dynamics and the novel detection method for SWRs outperforms current state-of-the-art algorithms and will be useful to many scientists in the field. Explicit validation of both of these methods is a particular strength of this study.

      The analytical methods used in the article are appropriate and are convincingly applied; the use of a general linear mixed model for statistical analysis is a particularly welcome choice as it guards against pseudoreplication while preserving statistical power.

      Overall, the manuscript makes a strong case for distinct sub-states across WAKE and NREM, associated with different phases of the 5-HT oscillation.

      Weaknesses:

      All of the evidence presented in the study is correlational. While the study mostly avoids claims of causality, it would still benefit from establishing whether the 5-HT oscillation has a direct role in the modulation of SWR rate via e.g. optogenetic activation/inactivation of 5-HT axons. As it stands, the possibility that 5-HT levels and SWRs are modulated by the same upstream mechanism cannot be excluded.

      We agree that causality claims cannot be made with our data, and acknowledge the interest in exploring causal interactions between ultraslow serotonin oscillations and the correlated activity we measure. We address this point in depth in our answer to Reviewer #2, Weaknesses #3.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      One major question in the presented data is the nature of the asymmetrical shape of the targeted slow events. How much does it reflect the 5-HT concentration and how much is this shape affected by the dynamics of the designed 5-HT sensor? This needs to be addressed in more detail referencing the original paper for the used sensor.

      We have added a paragraph in the Results section of the manuscript to address the asymmetric waveform of the ultraslow 5-HT oscillations and whether it could be affected by the asymmetric kinetics of the GRAB sensor we use: “The waveform of these ultraslow 5-HT oscillations…” (Results, paragraph #4, pg. 5). We include an extended answer to the question here:

      Indeed, the GRAB5-HT3.0 sensor we use in the study shows activation response kinetics which are faster than their deactivation time, with time constants at 0.25 s and 1.39 s, respectively (Deng et al. 2024). Likewise, the slope of the rising phase of the ultraslow serotonin oscillation we measure is faster than the slope of the falling phase, and the ratio of time spent in the rising phase versus the falling phase is less than 1, indicating longer falling phases (Figure S2). Although we cannot completely rule out that the asymmetric shape of the ultraslow serotonin oscillations we record is affected by this asymmetry in the 5-HT sensor kinetics, we believe this is unlikely, as the 5-HT signal clearly contains reductions in 5-HT levels that are much faster than the descending phase of the ultraslow oscillation. Although it is difficult to directly compare the different-sized signals, the reported timescales of off kinetics, on the order of a few seconds (Deng et al. 2024), are far below the tens of seconds timescale of the ultraslow oscillation. Furthermore, the finding that some dorsal raphe neurons modulate their firing rate at ultraslow frequencies, and moreover that all examples of such ultraslow oscillations shown display clear asymmetry in rising time versus decay, suggests that the asymmetry we observe in our data could be due to neural activity rather than temporal smoothing by the sensor (Mlinar et al. 2016). In this same direction, another study found similar asymmetry in extracellular 5-HT levels measured with fast scan cyclic voltammetry (FSCV), a technique with greater temporal resolution (sampling rate of 10 Hz) than GRAB sensors, after single pulse stimulation (Bunin and Wightman 1998). In this study, 5-HT was shown to be released extrasynaptically, making the longer clearing time compared to the release time intuitive. 
      Finally, the observation that the onsets and offsets of ripple clusters, recorded with a sampling rate of 20 kHz, are precisely aligned with the peaks and troughs of ultraslow serotonin oscillations (Figure 1, H1-2, columns 2-3) suggests that the duration of the falling phase is not artificially distorted by the temporal smoothing of the sensor dynamics.

      Regardless of the dynamics of the serotonin concentration, it should be noted that the elicited neuronal effect might have different dynamics compared to the 5-HT concentration, which needs further study. To address this, one can either examine the average of the broadband LFP (not high-pass filtered by the amplifier) or the distribution of simultaneously recorded spiking activity around the peak of the ultra-slow oscillations.

      We have added Figure S6, showing unit activity relative to the phase of ultraslow serotonin oscillations.

      From this analysis, we uncover three groups of units which are largely preserved across states (Figure S6, E vs. F), albeit with a slight temporal shift rightward from NREM to WAKE (Figure S6, C vs. D). Namely, some units spike preferentially during the rising phase, some during the falling phase, and a third group have no clear phase preference. Unit activity during the falling phase is unsurprising, as it is where ripples largely occur, which themselves are associated with spike bursts. During the rising phase, the unit activity we observe could correspond to firing of the hippocampal subpopulation known to be active during NREM interruption states (Jarosiewicz et al. 2002, Miyawaki et al. 2017). While the units’ phase preference was tested based on the category of rising vs. falling phase, as this division described most variation in the data, a few units in the ‘No preference’ group showed heightened activity near the oscillation peak. However, given the very small number of units with this preference, more unit data is needed to describe this group, ideally with high-density recordings. Overall, most units showed a falling vs. rising phase preference, indicating a phase coding of hippocampal activity by 5-HT ultraslow oscillations.

      Related to the previous point, it would be helpful to show the average cycle shape of these oscillations (relative to the phase 0 extracted in Figure 3) and to compare the shape across sessions and between wake and NREM.

      We agree, and to this end we have added Figure S2. From this waveform analysis, we show that the ultraslow serotonin oscillation is asymmetric, with the rising phase having a greater slope, but shorter length, than the falling phase. While this asymmetry is observed in both NREM and WAKE, the difference in slope and in length ratio between the rising and falling phases is greater in NREM (Figure S2B).
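
      The two asymmetry measures mentioned above (slope and relative duration of the rising versus falling phase) can be sketched as follows. This is an illustration only, not the analysis behind Figure S2: the function name `rise_fall_asymmetry`, the per-sample definition of "rising", and the synthetic sawtooth trace are all assumptions, and a real trace would need to be smoothed or band-limited first.

```python
import numpy as np

def rise_fall_asymmetry(signal, fs):
    """Summarize waveform asymmetry of a smooth oscillatory trace.

    Returns (time_ratio, rising_slope, falling_slope), where time_ratio is
    the time spent rising divided by the time spent falling, and the slopes
    are mean absolute rates of change in signal units per second.
    """
    step = np.diff(np.asarray(signal, dtype=float))
    rising = step > 0            # samples where the trace increases
    falling = ~rising            # flat samples are counted as falling here
    time_ratio = rising.sum() / falling.sum()
    rising_slope = step[rising].mean() * fs
    falling_slope = -step[falling].mean() * fs
    return time_ratio, rising_slope, falling_slope

# Synthetic (hypothetical) trace: 10 s linear rise, 30 s linear decay, 10 Hz.
fs = 10.0
cycle = np.concatenate([np.linspace(0, 1, 100, endpoint=False),
                        np.linspace(1, 0, 300, endpoint=False)])
trace = np.tile(cycle, 5)
ratio, up, down = rise_fall_asymmetry(trace, fs)
```

      For this synthetic trace the time ratio is below 1 and the rising slope exceeds the falling slope, which is the qualitative pattern described for the ultraslow 5-HT oscillation.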

      In Figure 3D, there seem to be oscillatory rhythms with faster cycles on top of the targeted oscillations. That would make the phase estimation less accurate. For example, in the left panel, in the second cycle, it is not clear whether there are two faster cycles or one slow cycle as targeted; notably, there are no ripples in the rising phase of the second fast cycle. This might suggest that, regardless of the specific oscillation frequency, ripples are suppressed whenever 5-HT starts to be released, and that once 5-HT is no longer synaptically effective, ripples start to be generated again while the photometry signal wanes as serotonin is cleared. Still, if there is any rhythmicity between bouts of no ripple, it would suggest an ultra-slow regularity in the 5-HT release.

      The reviewer is correct to point out that some faster increases in serotonin, which occur on top of the ultraslow oscillations we measure, seem to be associated with decreased ripple incidence, as in the example referenced. The dominance of ultraslow frequencies in the power spectrum of the 5-HT signal suggests, however, that oscillations faster than the ultraslow oscillations we describe are far less prevalent in the data. While there may be some coupling of ripples and other measures to serotonin oscillations of different frequencies, this may be hard or impossible to detect with phase analysis, given their infrequent occurrence and nonstationary nature. In fact, we show in Figure S3 that the strongest phase modulation of ripples by ultraslow serotonin oscillations is observed in the frequencies we use (0.01-0.06 Hz). Methodologically, phase analysis indeed assumes stationary signals, which are rare if not absent in physiological data (Lo et al. 2009); in general, however, the narrower the frequency band, the better the phase estimation. The narrow frequency band we use provides phase estimates that are largely robust and unaffected by the presence of faster oscillations, as can be seen in the example phase traces shown in Figure 4.
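
      To illustrate why a narrow band yields a phase estimate that ignores faster components, here is a minimal sketch of band-limited phase extraction. It is not the authors' pipeline: the filter used in the study is not described in this response, so an FFT mask stands in for it, and the function name `ultraslow_phase`, the 10 Hz sampling rate, and the synthetic signal are assumptions. Only the 0.01-0.06 Hz band comes from the text.

```python
import numpy as np

def ultraslow_phase(signal, fs, band=(0.01, 0.06)):
    """Instantaneous phase (radians, in (-pi, pi]) of a band-limited trace.

    Keeps only positive-frequency FFT bins inside `band` and doubles them,
    which gives the analytic signal of the band-passed trace. The FFT treats
    the trace as periodic, so real recordings will show edge effects.
    """
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.fft(signal - signal.mean())
    freqs = np.fft.fftfreq(len(signal), d=1.0 / fs)
    keep = (freqs >= band[0]) & (freqs <= band[1])
    analytic = np.fft.ifft(np.where(keep, 2.0 * spectrum, 0.0))
    return np.angle(analytic)

# Synthetic (hypothetical) trace: a 0.03 Hz oscillation plus a faster 0.5 Hz one.
fs = 10.0
t = np.arange(20000) / fs
trace = np.cos(2 * np.pi * 0.03 * t) + 0.5 * np.cos(2 * np.pi * 0.5 * t)
phase = ultraslow_phase(trace, fs)

# With this cosine convention, phase 0 is the peak, negative phases are the
# rising half of the cycle and positive phases are the falling half.
is_rising = phase < 0
```

      In this toy case the 0.5 Hz component lies outside the band and does not affect the recovered phase. Faster fluctuations that fall inside the band would, which is one reason a narrow band is preferable.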

      The hypothesis that the rising phase burst of synaptic serotonin is what silences ripples, and that with the clearing of serotonin from the synapses, ripples recover, is a possible explanation of our findings. However, if this were the case, one could expect the ripple rate to increase over the course of the falling phase of ultraslow 5-HT oscillations, as 5-HT decreases, and peak at the trough. This is at odds with what we observe, namely a fairly uniform distribution of ripples along the falling phase (Figure 3F2,F4). Furthermore, the Mlinar et al. 2016 study describes a subpopulation of raphe neurons whose firing rates themselves oscillate at ultraslow frequencies, rather than on-off bursting at ultraslow frequencies, which would argue against this hypothesis. However, as this study looks at a small number of neurons in slices, further in vivo experiments examining firing rates of median raphe neurons are required to understand how the ultraslow oscillation of extracellular serotonin that we measure is generated as well as how it is related to ripple rates.

      In Figure 3B, it is not clear why IRI is z-scored. It would be informative to have the actual value of IRI. What is the z relative to? Is it the mean value of IRI in each recording session? Is this to reduce the variability across sessions?

      We have now included in Figure 3D a box plot displaying the IRI distributions across different states and sessions. To minimize inter-session variability, data were z-scored within each session for visualization purposes. However, all general linear models were based on raw data, and as a result, the raw differences in IRI are shown in Figure 3C.
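
      Within-session z-scoring of the kind described above can be sketched as follows. The function name `zscore_within_session` and the example values are hypothetical, and the use of the population standard deviation is an assumption; the response does not say which estimator was used.

```python
from collections import defaultdict
from statistics import mean, pstdev

def zscore_within_session(values, sessions):
    """Z-score each value against the mean and SD of its own session.

    Assumes every session has at least two distinct values; a constant
    session would give a zero SD and a division error.
    """
    grouped = defaultdict(list)
    for value, session in zip(values, sessions):
        grouped[session].append(value)
    stats = {s: (mean(v), pstdev(v)) for s, v in grouped.items()}
    return [(value - stats[s][0]) / stats[s][1]
            for value, s in zip(values, sessions)]

# Hypothetical inter-ripple intervals (s) from two sessions with different scales.
iri = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
session = ["a", "a", "a", "b", "b", "b"]
iri_z = zscore_within_session(iri, session)
```

      After this step both sessions are on a common scale, which helps visualization but discards the raw differences between sessions. That is consistent with fitting the statistical models on the raw data instead.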

      Figure 3E, panel labels don't match with the caption

      We are grateful to the reviewer for pointing out this mistake, which we have corrected in the updated version of the manuscript.

      In the text related to Figure 3E, the related analysis can be more clearly described. "phase preference of individual ripples" does not immediately suggest that the occurring phase of each ripple relative to the targeted oscillation is extracted. I suggest performing this analysis individually for each session and summarizing the results across the sessions.

      We have reworded the sentence in Results: 5-HT and ripples to better reflect the analysis performed: “Next, we calculated the ultraslow 5-HT phases at which individual ripples occurred during both NREM and WAKE (3E-F) ...”. Regarding session-level data, we have added Figure S3, which shows session level mean phase vectors, as well as the grand mean across sessions for both NREM and WAKE. Included in this figure are session level means for frequency bands outside of the ultraslow band we used in our study, intended to show that ripples are most strongly timed by the ultraslow band (0.01-0.06 Hz), reflected by the greater amplitude of the mean phase vector for this band.
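
      The session-level mean phase vectors and their grand mean can be sketched as follows. This is an illustration under assumptions, not the code behind Figure S3: the function names and the example phases are hypothetical, and averaging the per-session vectors with equal weight is one of several reasonable choices.

```python
import cmath
import math

def mean_phase_vector(phases_deg):
    """Return (length, angle_deg) of the mean resultant vector of the phases.

    Length near 1 means ripples cluster at one phase; near 0 means no preference.
    """
    vec = sum(cmath.exp(1j * math.radians(p)) for p in phases_deg) / len(phases_deg)
    return abs(vec), math.degrees(cmath.phase(vec))

def grand_mean_vector(sessions):
    """Average the per-session mean vectors so that each session counts once."""
    vecs = []
    for phases in sessions:
        length, angle = mean_phase_vector(phases)
        vecs.append(cmath.rect(length, math.radians(angle)))
    grand = sum(vecs) / len(vecs)
    return abs(grand), math.degrees(cmath.phase(grand))

# Hypothetical ripple phases (degrees, -180 to 180) for two sessions.
session_a = [80, 90, 100]
session_b = [70, 90, 110]
grand_length, grand_angle = grand_mean_vector([session_a, session_b])
```

      Comparing the vector length across frequency bands is then a direct way to ask which band times the ripples most strongly.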

      Figure 3E2, based on the result of ripple-triggered 5-HT in left panels of 2H1-2, one would expect to see a preferred phase closer to 180 (toward the end of the falling phase), it would be helpful to compare and discuss the results of these two analyses.

      The reviewer is correct to point out the apparent discrepancy in where the mean ripple falls with respect to the ongoing serotonin oscillation between the two figures mentioned. We have addressed this point in Results: 5-HT and ripples, paragraph #4: “This result appear to be at odds with…”.

      Regarding the analysis in 3F, please also compare the power distribution of ripples between NREM and wake. This will help to better understand the potential difference behind the observed difference: how much the strong ripples are comparable between wake and NREM. It is also necessary to report the ripple detection failure rate across ripples with different strengths.

      We have added a figure showing analysis done on a subset of the data in which ripples were manually curated in order to evaluate the performance of the ripple detection model (Figure S7) and explanatory text in Methods: Model performance: ‘To ensure that our model …’. In summary, while missed ripples did tend to have lower power than correctly detected ripples, including them did not change the distribution of ripples by the phase of the ultraslow serotonin oscillation (Figure S7C). We would also note that while the phase preference is noisier than what is presented in Figure 3F because this analysis was done with a small subset of all recorded ripples, the fact that ripples occur more clearly on the falling phase is visible for both detected ripples and detected + false negative ripples.

      The mixed-effects model examining the influence of 5-HT ultraslow oscillation phase on ripple power revealed no significant effect of state (p = 0.088). This indicates that whether the data were collected during NREM or wake periods did not significantly impact ripple power and that the lack of a significant effect (in Figure 3G,H) in WAKE is probably not due to a difference in the distribution of ripple power between states.

      4D, y label is z?

      We are grateful to the reviewer for pointing this out. Yes, the y label should be ‘z-score’, as the two traces represent z-scored 5-HT (blue) and z-scored shuffled data (orange). Figure 4D2 and Figure 2H1-2, which show similar data, have been corrected to address this oversight.

      Relating to Figure 4, EMG comparison across phases of the oscillations is insightful. Two related and complementary analyses are to compare the theta and gamma power between the falling and rising phases.

      We have addressed this suggestion in Figure S5 A-C. While low gamma, high gamma and theta power are modulated identically in NREM, with higher power observed during the falling phase than the rising phase, during WAKE, different patterns can be seen. Specifically, low gamma power shows no phase preference, while high gamma shows a peak near the center of the ultraslow 5-HT oscillation. Theta power, as in NREM, is higher during the falling phase of ultraslow 5-HT oscillations. Increased power across many frequency bands was shown to coincide with decreases in DRN population activity during NREM, which matches with what we report here (Kato et al. 2022). In summary, while NREM patterns are consistent in all frequency bands tested, aligning with the pattern of ripple incidence, in WAKE low and high gamma power show different relationships to ultraslow 5-HT phase.

      In the manuscript, we have used the data in both Figure S5 and S6 (unit activity relative to ultraslow 5-HT oscillations) to argue against the idea that our coherence findings result from a lack of activity in the rising phase (see next question), which would have the effect of ‘artificially’ reducing coherence in the falling phase relative to the rising phase. The text can be found in Results: 5-HT and hippocampal-cortical coherence, paragraph #2.

      The results presented in Figure 5 could be puzzling and need to be further discussed: if the ripple band activity is weak during the rising phase, in what circumstances the coherence between cortex and CA1 is specifically very strong in this band?

      As mentioned in the previous answer, we have addressed this concern in Results: 5-HT and hippocampal-cortical coherence, paragraph #2. In summary, it is true that the higher coherence in the rising phase than in the falling phase for the highest frequency band (termed ‘high frequency oscillation’ (HFO), 100-150 Hz) could be unexpected, given that ripples occur largely during the falling phase. A few points could help explain this finding. Firstly, it should be noted that power in the 100-150 Hz band can arise from physiological activity outside of ripples, such as filtered non-rhythmic spike bursts (Liu et al. 2022), whose coherent occurrence in the rising phase could explain the coherence findings. Secondly, coherence is a compound measure which is affected by both phase consistency and amplitude covariation (Srinath and Ray 2014), so coherence cannot be predicted from amplitude alone. Furthermore, HFO power in the cortex is highest near the peak of ultraslow 5-HT oscillations (Figure S5D), as opposed to the falling phase peak in the hippocampus. This shows a lack of covariation in amplitude by phase between the hippocampus and cortex at this frequency band. An alternative explanation of our findings regarding coherence could be that in the rising phase, there is simply little to no activity, which is easier to ‘synchronize’ than bouts of high activity. Hippocampal unit activity in the rising phase (Figure S6) suggests, however, that the higher coherence in the rising phase across frequencies is unlikely to reflect an absence of activity. Additional experiments using high density recordings should be conducted to examine 5-HT ultraslow oscillations and their role in gating activity across brain regions, though these results strongly suggest some role exists.
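
      The point that coherence cannot be read off from amplitude can be illustrated with a small sketch. It is not the coherence analysis used in the study: the estimator below is a simplified Welch-style magnitude-squared coherence with non-overlapping Hann-windowed segments, and the function name, sampling rate, 120 Hz test frequency and synthetic signals are all assumptions. It shows only that coherence is normalized by each channel's power; it does not model the effect of amplitude covariation itself.

```python
import numpy as np

def magnitude_squared_coherence(x, y, nperseg):
    """Welch-style coherence from non-overlapping Hann-windowed segments."""
    nseg = len(x) // nperseg
    window = np.hanning(nperseg)
    xs = np.reshape(x[:nseg * nperseg], (nseg, nperseg)) * window
    ys = np.reshape(y[:nseg * nperseg], (nseg, nperseg)) * window
    fx, fy = np.fft.rfft(xs, axis=1), np.fft.rfft(ys, axis=1)
    sxy = np.mean(fx * np.conj(fy), axis=0)    # cross-spectrum
    sxx = np.mean(np.abs(fx) ** 2, axis=0)     # auto-spectrum of x
    syy = np.mean(np.abs(fy) ** 2, axis=0)     # auto-spectrum of y
    return np.abs(sxy) ** 2 / (sxx * syy)

# Two synthetic channels sharing a 120 Hz component plus independent noise.
rng = np.random.default_rng(0)
fs, n, nperseg = 1000, 20000, 500
t = np.arange(n) / fs
shared = np.sin(2 * np.pi * 120 * t)
x = shared + rng.standard_normal(n)
y = shared + rng.standard_normal(n)

coh_full = magnitude_squared_coherence(x, y, nperseg)
coh_weak = magnitude_squared_coherence(0.01 * x, y, nperseg)  # x made 100x smaller
bin_120 = 120 * nperseg // fs                  # rfft bin index for 120 Hz
```

      Scaling one channel down by a factor of 100 leaves the coherence spectrum unchanged, so a period of weak band power can still show high coherence if the phase relationship between regions is consistent.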

      Reviewer #2 (Recommendations for the authors):

      I would like to offer two comments. I believe that these are not unusual requests, and thus I would like the authors to respond.

      (1) It would be prudent to investigate the possibility that the observed correlation between ultraslow and hippocampal ripples/microarousals is merely superficial and that there are unidentified confounding factors at play. For example, it would be beneficial to provide evidence that administering a serotonin receptor inhibitor results in the disappearance of the slow oscillation of ripples and microarousals, or that the correlation with ultraslow is no longer present. Please note that the former experiments do not require GRAB5-HT3.0 imaging.

      We agree that causality claims cannot be made with our data and acknowledge the interest in exploring causal interactions between ultraslow serotonin oscillations and the correlated activity we measure. We address this point in depth in our answer to Reviewer #2, Weaknesses #3. We would further like to note that, given the large number of serotonin receptors and the lack of selectivity of many serotonin receptor antagonists, a pharmacological approach would be difficult, though the results would certainly be useful. Finally, we highlight the psilocin study, which reported changes in the rhythmic occurrence of microarousals, and therefore likely ultraslow oscillations, after administering a 5-HT2a receptor agonist, suggesting a potential causal effect of 5-HT (via 5-HT2a receptor) on MA occurrence (Thomas et al. 2022).

      (2) The slow frequency appears to be associated with the default mode network as observed in fMRI signals. The neural basis of the default mode network remains unclear; therefore, a more detailed examination of this possibility would be beneficial.

      We agree that it would be interesting to investigate the role of 5-HT in the neural basis of the DMN.

      The DMN as described in humans (Raichle et al. 2001) and rodents (Lu et al. 2012) may indeed include some parts of the hippocampus, and perhaps some of our neocortical recordings could also be considered part of the DMN. The fact that the activity across the inter-connected brain structures of the DMN is correlated at ultraslow time scales (Gutierrez-Barragan et al. 2019, Mantini et al. 2007), as well as serotonin’s ability to modulate the DMN (Helmbold et al. 2016), is intriguing. Further studies simultaneously recording DMN activity via fMRI and electrical activity via silicon probes, as done in Logothetis et al. 2001, could further elucidate a potential link between ultraslow oscillations and the DMN, with serotonergic modulation as a means to understand any potential contribution of serotonin.

      Reviewer #3 (Recommendations for the authors):

      (1) The impact of the study would benefit from an experiment causally testing the effect of hippocampal 5-HT levels on hippocampal physiology, e.g. using optogenetic manipulations.

      We agree that causality claims cannot be made with our data and acknowledge the interest in exploring causal interactions between ultraslow serotonin oscillations and the correlated activity we measure. We address this point in depth in our answer to Reviewer #2, Weaknesses #3.

      (2) Data presentation: the figures are of poor resolution, making some diagram details and, more importantly, some example traces (e.g. Figure 1A, right) impossible to see. This should be corrected by either increasing figure resolution or making important figure elements large enough to be readable.

      We apologize for the poor resolution and have corrected it in the updated version of the manuscript.

      (3) Differences in some figure panels are not statistically assessed: Figure 1H (differences in spectrum peak power), Figure 3E1 & Figure 3E3 (directional bias of the circular distributions), Figure 4C (difference from 0 mean).

      We acknowledge this oversight and have added statistical tests for all three figures, as well as further information regarding the models used in Methods: Statistics.

      (4) Lines 279-280: the claim that the study shows "organization of activity by ultraslow oscillations of 5-HT" implies a causal role of 5-HT in organizing hippocampal activity. I suggest that this statement be toned down to reflect the correlational nature of the presented evidence.

      We have rephrased the sentence in question to the following: “In our study, including both NREM and WAKE periods allowed us to additionally show that the temporal organization of activity relative to ultraslow 5-HT oscillations operates according to the same principles in both states...”, which we believe better reflects the temporal correlation we describe.

      (5) While the study claims to use the EMG (i.e. electromyograph) signal, it does not describe any electrodes placed inside the muscle in the methods section. The SleepScoreMaster toolbox used in the study estimates the EMG using high-frequency activity correlated across recording channels, so I assume this is how this signal was obtained. While such activity may well reflect muscular noise to some degree, it is an indirect measure as the electrodes are not in the muscle. Since the EMG signal is central to the message of the manuscript, the method for calculating it should be described in the methods section and it should be explicitly labelled as an indirect measure in the main text, e.g. by referring to this signal as pseudo-EMG.

      We agree and have added explanatory text to the State Scoring subsection in Methods. Given that the EMG we refer to is derived from intracranial data, and not from traditional EMG probes, we now refer to the EMG as intracranial EMG, or icEMG for short, throughout the main text.

      (6) Is ripple frequency or ripple duration different across the rising and falling phases of the ultraslow oscillation?

      We have now investigated this suggestion in Figure S4, where we show that ripple frequency is higher in the falling phase than rising phase, while ripple duration appears to show no phase preference.

      (7) Lines 315-317: I am not sure why the manuscript refers to the coupling between EMG and 5-HT levels as 'puzzling' given that, as stated, the locomotion-inducing effects of 5-HT are well documented. While the fact that even non-locomotory motor activity may be associated with 5-HT rise is certainly interesting (although not sure if 'puzzling'), the manuscript does not directly compare the association of 5-HT levels with locomotory and non-locomotory EMG spikes. Thus, I think this discussion point is not fully warranted.

      We agree and have rephrased the discussion point in question to reflect that the EMG link to serotonin oscillations is not necessarily surprising, given both the literature linking 5-HT and spontaneous movement in the hippocampus, as well as the involvement of 5-HT in repetitive movements, where the role for a regularly-occurring oscillation is perhaps more intuitive.

      (8) Line 441: Reference #67 does not describe the use of fiber photometry.

      The reviewer is correct to point out this typo, which has now been corrected. The reference in question should be 64, where fiber photometry experiments are described. For further clarity, we have changed our referencing scheme to include authors and years in in-text references.

      (9) In Figures 3E1-3, the phase has different bounds than in the other Figures in the manuscript (0:360 vs -180:180), this should be corrected for consistency.

      We agree and have made changes so that all figures have a phase range of -180 to 180°.

      References

      (1) Z. M. Aghajan, G. Kreiman, I. Fried, Minute-scale periodicity of neuronal firing in the human entorhinal cortex. Cell Rep 42, 113271 (2023).

      (2) M. A. Bunin, R. M. Wightman, Quantitative Evaluation of 5-Hydroxytryptamine (Serotonin) Neuronal Release and Uptake: An Investigation of Extrasynaptic Transmission. J. Neurosci. 18, 4854–4860 (1998).

      (3) F. Cazettes, D. Reato, J. P. Morais, A. Renart, Z. F. Mainen, Phasic Activation of Dorsal Raphe Serotonergic Neurons Increases Pupil Size. Curr Biol 31, 192-197.e4 (2021).

      (4) S. R. Cole, B. Voytek, Brain Oscillations and the Importance of Waveform Shape. Trends Cogn Sci 21, 137–149 (2017).

      (5) F. Deng, et al., Improved green and red GRAB sensors for monitoring spatiotemporal serotonin release in vivo. Nat Methods 21, 692–702 (2024).

      (6) C. Dong, et al., Psychedelic-inspired drug discovery using an engineered biosensor. Cell 184, 2779-2792.e18 (2021).

      (7) A. Eban-Rothschild, L. Appelbaum, L. de Lecea, Neuronal Mechanisms for Sleep/Wake Regulation and Modulatory Drive. Neuropsychopharmacol. 43, 937–952 (2018).

      (8) M. S. Fanselow, H.-W. Dong, Are the dorsal and ventral hippocampus functionally distinct structures? Neuron 65, 7–19 (2010).

      (9) J. S. Farrell, E. Hwaun, B. Dudok, I. Soltesz, Neural and behavioural state switching during hippocampal dentate spikes. Nature 1–6 (2024). https://doi.org/10.1038/s41586-024-07192-8.

      (10) R. De Filippo, D. Schmitz, Transcriptomic mapping of the 5-HT receptor landscape. Patterns 5, 101048 (2024).

      (11) M. J. Fisher, et al., Neural mechanisms of sensory gating: Insights from human and animal studies. NeuroImage 207, 116374 (2020).

      (12) P. Fries, D. Nikolić, W. Singer, The gamma cycle. Trends in Neurosciences 30, 309–316 (2007).

      (13) S. Gonzalo Cogno, et al., Minute-scale oscillatory sequences in medial entorhinal cortex. Nature 625, 338–344 (2024).

      (14) D. Gutierrez-Barragan, M. A. Basson, S. Panzeri, A. Gozzi, Infraslow State Fluctuations Govern Spontaneous fMRI Network Dynamics. Current Biology 29, 2295-2306.e5 (2019).

      (15) K. Helmbold, et al., Serotonergic modulation of resting state default mode network connectivity in healthy women. Amino Acids 48, 1109–1120 (2016).

      (16) B. Jarosiewicz, B. L. McNaughton, W. E. Skaggs, Hippocampal Population Activity during the Small-Amplitude Irregular Activity State in the Rat. J. Neurosci. 22, 1373–1384 (2002).

      (17) J. T. Jordan, The rodent hippocampus as a bilateral structure: A review of hemispheric lateralization. Hippocampus 30, 278–292 (2020).

      (18) T. Kato, et al., Oscillatory Population-Level Activity of Dorsal Raphe Serotonergic Neurons Is Inscribed in Sleep Structure. J. Neurosci. 42, 7244–7255 (2022).

      (19) M. A. Kim, H. S. Lee, B. Y. Lee, B. D. Waterhouse, Reciprocal connections between subdivisions of the dorsal raphe and the nuclear core of the locus coeruleus in the rat. Brain Research 1026, 56–67 (2004).

      (20) C. Kjaerby, et al., Memory-enhancing properties of sleep depend on the oscillatory amplitude of norepinephrine. Nat Neurosci 25, 1059–1070 (2022).

      (21) S. Lecci, et al., Coordinated infraslow neural and cardiac oscillations mark fragility and offline periods in mammalian sleep. Sci Adv 3, e1602026 (2017).

      (22) A. A. Liu, et al., A consensus statement on detection of hippocampal sharp wave ripples and differentiation from other fast oscillations. Nat Commun 13, 6000 (2022).

      (23) M.-T. Lo, P.-H. Tsai, P.-F. Lin, C. Lin, Y. L. Hsin, The nonlinear and nonstationary properties in eeg signals: probing the complex fluctuations by hilbert–huang transform. Adv. Adapt. Data Anal. 01, 461–482 (2009).

      (24) N. K. Logothetis, J. Pauls, M. Augath, T. Trinath, A. Oeltermann, Neurophysiological investigation of the basis of the fMRI signal. Nature 412, 150–157 (2001).

      (25) H. Lu, et al., Rat brains also have a default mode network. Proc Natl Acad Sci U S A 109, 3979–3984 (2012).

      (26) D. Mantini, M. G. Perrucci, C. Del Gratta, G. L. Romani, M. Corbetta, Electrophysiological signatures of resting state networks in the human brain. Proc Natl Acad Sci U S A 104, 13170–13175 (2007).

      (27) J. E. Markowitz, et al., The striatum organizes 3D behavior via moment-to-moment action selection. Cell 174, 44-58.e17 (2018).

      (28) H. Miyawaki, Y. N. Billeh, K. Diba, Low Activity Microstates During Sleep. Sleep 40, zsx066 (2017).

      (29) B. Mlinar, A. Montalbano, L. Piszczek, C. Gross, R. Corradetti, Firing Properties of Genetically Identified Dorsal Raphe Serotonergic Neurons in Brain Slices. Front Cell Neurosci 10, 195 (2016).

      (30) A. Osorio-Forero, et al., Noradrenergic circuit control of non-REM sleep substates. Current Biology 31, 5009-5023.e7 (2021).

      (31) S. Panzeri, N. Brunel, N. K. Logothetis, C. Kayser, Sensory neural codes using multiplexed temporal scales. Trends in Neurosciences 33, 111–120 (2010).

      (32) M. E. Raichle, et al., A default mode of brain function. Proc Natl Acad Sci U S A 98, 676–682 (2001).

      (33) R. Srinath, S. Ray, Effect of amplitude correlations on coherence in the local field potential. J Neurophysiol 112, 741–751 (2014).

      (34) B. P. Staresina, J. Niediek, V. Borger, R. Surges, F. Mormann, How coupled slow oscillations, spindles and ripples coordinate neuronal processing and communication during human sleep. Nat Neurosci 26, 1429–1437 (2023).

      (35) C. W. Thomas, et al., Psilocin acutely alters sleep-wake architecture and cortical brain activity in laboratory mice. Transl Psychiatry 12, 77 (2022).

      (36) G. F. Turi, et al., Serotonin modulates infraslow oscillation in the dentate gyrus during Non-REM sleep. eLife 13 (2025).

      (37) J. Vazquez, H. A. Baghdoyan, Basal forebrain acetylcholine release during REM sleep is significantly greater than during waking. American Journal of Physiology-Regulatory, Integrative and Comparative Physiology 280, R598–R601 (2001).

      (38) J. Wan, et al., A genetically encoded sensor for measuring serotonin dynamics. Nat Neurosci 24, 746–752 (2021).

      (39) Y. Zhang, et al., Cholinergic suppression of hippocampal sharp-wave ripples impairs working memory. Proc. Natl. Acad. Sci. U.S.A. 118, e2016432118 (2021).

      (40) Y. Zhang, et al., Interaction of acetylcholine and oxytocin neuromodulation in the hippocampus. Neuron (2024).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 1:

      We would like to thank Reviewer 1 for recognising the importance of our findings on the heterogeneity in bacterial responses to tachyplesin.

      (1) A double deletion of acrA and tolC (two out of the three components of the major constitutive RND efflux pump) reduces the appearance of the low accumulator phenotype, but interestingly, the single deletions have no effect, and a well-characterised inhibitor of RND efflux pumps also has no effect. The authors identify a two-component system, qseCB, that appears necessary for the appearance of low accumulators, but this system has pleiotropic effects on many cellular systems, with only tenuous connections to efflux. The selected pharmacological agents that could prevent the appearance of low accumulators do not offer clear insight into the mechanism by which low accumulators arise, because they have diverse modes of action.

      We have added that “QseBC, was previously inferred to mediate resistance to a tachyplesin analogue by upregulating efflux genes based on transcriptomic analysis and hyper susceptibility of ΔqseBΔqseC mutants[113]”. However, we have also acknowledged that “it is conceivable that the deletion of QseBC has pleiotropic effects on other cellular mechanisms involved in tachyplesin accumulation” and that “it is also conceivable that sertraline prevented the formation of the low accumulator phenotype via efflux independent mechanisms”.

      These amendments are reported on lines 525-527, 532-534 and 539-541 of our revised manuscript.

      (2) The transcriptomics data collected for low and high accumulator sub-populations are interesting, but in my opinion, the conclusions that can be drawn from these data remain overstated. It is not possible to make any claims about the total amount of "protein synthesis, energy production, and gene expression" on the basis of RNA-Seq data. The reads from each sample are normalised, so there is no information about the total amount of transcript. Many elements of total cellular activity are post-transcriptionally regulated, so it is impossible to assess from transcriptomics alone. Finally, the transcriptomic data are analysed in aggregated clusters of genes that are enriched for biological processes, for example: "Cluster 2 included processes involved in protein synthesis, energy production, and gene expression that were downregulated to a greater extent in low accumulators than high accumulators". However, this obscures the fact that these clusters include genes that are generally inhibitory of the process named, as well as genes that facilitate the process.

      We have now acknowledged “that our data do not take into account post-transcriptional modifications that represent a second control point to survive external stressors.”

      These amendments are reported on lines 534-535 of our revised manuscript.

      The raw transcript counts can be found in Figure 3 – Source Data; we added these data to the previous version of our manuscript, as requested by this reviewer.

      We would also like to clarify that we have analysed our transcriptomic data via both clustering (i.e. Figure 3) and direct comparison of genes of interest (Table S1) and transcription factors (i.e. genes that are generally inhibitory of the process named, as well as genes that facilitate the process, Figure S12).

      Finally, we would like to point out that in our revised manuscript (both this and its previous version) we state “Cluster 2 included processes involved in protein synthesis, energy production, and gene expression that were downregulated to a greater extent in low accumulators than high accumulators”. We do not think this is an overstatement, because we do not use these data to draw conclusions about the total amount of "protein synthesis, energy production, and gene expression".

      (3) The authors have added an experiment to attempt to assess overall metabolic activity in the low accumulator and high accumulator populations, which is a welcome addition. They apply the redox dye resazurin and observe lower resorufin (reduced form) fluorescence in the low accumulator population, which they take to indicate a lower respiration rate. This seems possible, however, an important caveat is that they have shown the low accumulator population to retain substantially lower amounts of multiple different fluorescent molecules (tachyplesin-NBD, propidium iodide, ethidium bromide) intracellularly compared to the high accumulator population. It seems possible that the low accumulator population is also capable of removing resazurin or resorufin from the intracellular space, regardless of metabolic rate. Indeed, it has previously been shown that efflux by RND efflux pumps influences resazurin reduction to resorufin in both P. aeruginosa and E. coli. By measuring only the retained redox dye using flow cytometry, the results may be confounded by the demonstrated ability of the low accumulator population to remove various fluorescent dyes. More work is needed to strongly support broad conclusions about the physiological states of the low and high accumulator populations. The phenomenon of the emergence of low accumulators, which are phenotypically tolerant to the antimicrobial peptide tachyplesin, is interesting and important even if there is still work to be done to understand the mechanism by which it occurs.

      We have now clarified that these assays were performed in the presence of 50 μM CCCP and that “CCCP was included to minimise differences in efflux activity and preserve resorufin retention between low and high accumulators, though some variability in efflux may still persist.” We have now added this information on lines 401-406. This information was only present in the caption of Figure S16 of our previous version of this manuscript.

      We agree with the reviewers that more work needs to be done to fully understand this new phenomenon and we had already acknowledged in our previous version of this manuscript that other mechanisms could play a role in this new phenomenon, see lines 489-517 of the current manuscript.

      Reviewer 2:

      We would like to thank the reviewer for recognising that all their previous comments have now been satisfactorily addressed.

      (1) Some mechanistic questions regarding tachyplesin-accumulation and survival remain. One general shortcoming of the setup of the transcriptomics experiment is that the tachyplesin-NBD probe itself has antibiotic efficacy and induces phenotypes (and eventually cell death) in the ´high accumulator´ cells. As the authors state themselves, this makes it challenging to interpret whether any differences seen between the two groups are causative for the observed accumulation pattern of if they are a consequence of differential accumulation and downstream phenotypic effects.

      We agree with the reviewer and we had explicitly acknowledged this possibility on lines 281-285 (of the previous and current version of this manuscript).

      (2) The statement ´ Moreover, we found that the fluorescence of low accumulators decreased over time when bacteria were treated with 20 μg mL´ is, in my opinion, not supported by the data shown in Figure S4C. That figure shows that the abundance of ´low accumulator´ cells decreases over time. Following the rationale that proteinase K treatment may cleave surface associated/ extracellular tachyplesin-NBD, this should lead to a shift of ´low accumulator´ population to the left, indicating reduced fluorescence intensity per cell. This is not the case; the population just disappears. However, after 120 min of treatment more cells appear in the ´high accumulator´ state. This result is somewhat puzzling.

      We agree with the reviewer that our previous discussion of these data could have been misleading. We have now reworded this part of the text as follows: “We found that the fluorescence of high accumulators did not decrease over time when tachyplesin-NBD was removed from the extracellular environment and bacteria were treated with 20 μg mL<sup>-1</sup> (0.7 μM) proteinase K, a widely-occurring serine protease that can cleave the peptide bonds of AMPs [43–45] (Figure S4B and C). These data suggest that tachyplesin-NBD primarily accumulates intracellularly in high accumulators.”

      It is conceivable that extended exposure to proteinase K (i.e. we see a decrease in the abundance of low accumulators after 90 min treatment with proteinase K) increased the permeability to tachyplesin-NBD of low accumulators allowing tachyplesin-NBD to move from either the extracellular space or the membrane to the cell interior. However, we do not have data to prove this point.

      Therefore, we have now removed our claim that the data obtained using proteinase K suggest that tachyplesin-NBD accumulates primarily in the membranes of low accumulators. We believe that our two separate microscopy analyses provide more direct, stronger and less ambiguous evidence that tachyplesin-NBD accumulates primarily in the membranes of low accumulators.

      (3) The authors used the metabolic dye resazurin to measure the metabolic activity of low vs. high accumulators. I am not entirely convinced that the lower resorufin fluorescence in tachyplesin-NBD accumulators really indicates lower metabolic activity, since a cell's fluorescence levels would also be affected by the cellular uptake and efflux. It appears plausible that the lower resorufin-fluorescence may result from reduced accumulation/increased efflux in the ‘low-tachyplesin NBD´ population.

      We have now clarified that these assays were performed in the presence of 50 μM CCCP and that “CCCP was included to minimise differences in efflux activity and preserve resorufin retention between low and high accumulators, though some variability in efflux may still persist.” We have now added this information on lines 401-406. This information was only present in the caption of Figure S16 of our previous version of this manuscript.

      (4) P8 line 343. The text should refer to Figure. 13B, instead of 14B

      We have now changed the text accordingly on line 337.

      Reviewer 3:

      We would like to thank the reviewer for recognising that we have done a very impressive job in taking care of their comments.

      (1) Despite these advances, the contribution of efflux may require more direct evidence to further dissect whether efflux is necessary, sufficient, or contributory. The facts that the key low efflux mutant still retains a small fraction of survivors and that the inhibitors used may cause other physiological changes leading to higher efflux are still unaccounted for. The lipidomic and vesicle findings, while intriguing, remain descriptive, and direct tests of their functional relevance would further solidify the mechanistic models.

      We agree with the reviewers that more work needs to be done to fully understand this new phenomenon and we had already acknowledged in our previous version of this manuscript that other mechanisms could play a role in this new phenomenon, see lines 489-517 of the current manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study reports the development of a novel organoid system for studying the emergence of autorhythmic gut peristaltic contractions through the interaction between interstitial cells of Cajal and smooth muscle cells. While the utility of the organoids for studying hindgut development is well illustrated by showing, for example, a previously unappreciated potential role for smooth muscle cells in regulating the firing rate of interstitial cells of Cajal, some of the functional analyses are incomplete. There are some concerns about the specificity and penetrance of perturbations and the reproducibility of the phenotypes. With these concerns properly addressed, this paper will be of interest to those studying the development and physiology of the gut.

      We greatly appreciate the constructive comments raised by the Editors and all the Reviewers. We have conducted new pharmacological experiments using nifedipine, an L-type Ca<sup>2+</sup> channel blocker known to act on smooth muscles (new Fig 7). The treatment abrogated not only the oscillation of SMCs but also that of ICCs, further corroborating our model that not only ICC-to-SMC interactions but also signals in the reverse direction, namely SMC-to-ICC feedback, operate to achieve the coordinated/stable rhythm of gut contractile organoids.

      Concerning the specificity and penetrance in pharmacological experiments with gap junction inhibitors, we have carefully re-examined the effects of multiple blockers (CBX and 18b-GA) at different concentrations (new Fig 5D and Fig. S3B). We have newly found that the effect observed with CBX (100 µM), namely abolition of the latency of Ca<sup>2+</sup> peaks between ICCs (preceding) and SMCs (following), is not seen with 18b-GA at any concentration, including 100 µM. This implies that the latency of Ca<sup>2+</sup> peaks between these cells is governed by connexin(s) that are not inhibited by 18b-GA. Such differences in the inhibitory effects of these two drugs were previously reported in multiple model systems, including the gut (Daniel et al., 2007; Parsons & Huizinga, 2015; Schultz et al., 2003).

      Regarding the penetrance of the drugs, we administered the gap junction inhibitor, either CBX (100 µM) or 18b-GA (100 µM), earlier (Day 3) in the course of organoid formation in culture, while cells are still in 2D, to exclude a possible penetrance problem (new Fig. S3C). These treatments have little or no effect on the patterns of organoid contractions, similar to drug administration at Day 7. Because CBX (100 µM) eliminates the latency of Ca<sup>2+</sup> peaks, as already shown in the first version, we believe that this drug successfully penetrates the organoid and exerts its specific effects.

      Unfortunately, owing to very unstable climate conditions, including extreme heat, and sporadic outbreaks of bird flu in Japan since last summer, the poultry farm appears to have faced problems. During the revision experiments, we repeatedly ran into serious trouble with unhealthy eggs/embryos, from last summer until the present. These unfortunate incidents did not allow us to carry out the revision experiments as fully as we originally planned. Nevertheless, we did our very best within a limited time frame, and we believe that the revised version is suitable as a final version of an eLife article.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors developed an organoid system that contains smooth muscle cells (SMCs) and interstitial cells of Cajal (ICCs; pacemaker) but few enteric neurons, and generates rhythmic contractions as seen in the developing gut. The stereotypical arrangements of SMCs and ICCs in the organoid allowed the authors to identify these cell types in the organoid without antibody staining. The authors took advantage of this and used calcium imaging and pharmacology to study how calcium transients develop in this system through the interaction between the two types of cells. The authors first show that calcium transients are synchronized between ICC-ICC, SMC-SMC, and SMC-ICC. They then used gap junction inhibitors to suggest that gap junctions are specifically involved in ICC-to-SMC signaling. Finally, the authors used an inhibitor of myosin II to suggest that feedback from SMC contraction is crucial for the generation of rhythmic activities in ICCs. The authors also show that two organoids become synchronized as they fuse and SMCs mediate this synchronization.

      Strengths:

      The organoid system offers a useful model in which one can study the specific roles of SMCs and ICCs in live samples.

      Thank you very much for the constructive comments.

      Weaknesses:

      Since only one blocker each for gap junction and myosin II was used, the specificities of the effects were unclear.

      We appreciate these comments. We have addressed these weaknesses as described in “Responses to the eLife assessment” (please see above).

      Reviewer #2 (Public Review):

      Summary:

      In this study, Yagasaki et al. describe an organoid system to study the interactions between smooth muscle cells (SMCs) and interstitial cells of Cajal (ICCs). While these interactions are essential for the control of rhythmic intestinal contractility (i.e., peristalsis), they are poorly understood, largely due to the complexity of and access to the in vivo environment and the inability to co-culture these cell types in vitro for long term under physiological conditions. The "gut contractile organoids" described herein are reconstituted from stromal cells of the fetal chicken hindgut that rapidly reorganize into multilayered spheroids containing an outer layer of smooth muscle cells and an inner core of interstitial cells. The authors demonstrate that they contract cyclically and additionally use calcium imaging to show that these contractions occur concomitantly with calcium transients that initiate in the interstitial cell core and are synchronized within the organoid and between ICCs and SMCs. Furthermore, they use several pharmacological inhibitors to show that these contractions are dependent upon non-muscle myosin activity and, surprisingly, independent of gap junction activity. Finally, they develop a 3D hydrogel for the culturing of multiple organoids and found that they synchronize their contractile activities through interconnecting smooth muscle cells, suggesting that this model can be used to study the emergence of pacemaking activities. Overall, this study provides a relatively easy-to-establish organoid system that will be of use in studies examining the emergence of rhythmic peristaltic smooth muscle contractions and how these are regulated by interstitial cell interactions. However, further validation and quantification will be necessary to conclusively determine the cellular composition of the organoids and how reproducible their behaviors are.

      Strengths:

      This work establishes a new self-organizing organoid system that can easily be generated from the muscle layers of the chick fetal hindgut to study the emergence of spontaneous smooth muscle cell contractility. A key strength of this approach is that the organoids seem to contain few cell types (though more validation is needed), namely smooth muscle cells (SMCs) and interstitial cells of Cajal (ICCs). These organoids are amenable to live imaging of calcium dynamics as well as pharmacological perturbations for functional assays, and since they are derived from developing tissues, the emergence of the interactions between cell types can be functionally studied. Thus, the gut contractile organoids represent a reductionist system to study the interactions between SMCs and ICCs in comparison to the more complex in vivo environment, which has made studying these interactions challenging.

      Thank you very much for the constructive comments.

      Weaknesses:

      The study falls short in the sense that it does not provide a rigorous amount of evidence to validate that the gut organoids are made of bona fide smooth muscle cells and ICCs. For example, only two "marker" proteins are used to support the claims of cell identity of SMCs and ICCs. At the same time, certain aspects of the data are not quantified sufficiently to appreciate the variance of organoid rhythmic contractility. For example, most contractility plots show the trace for a single organoid. This leads to a concern for how reproducible certain aspects of the organoid system (e.g. wavelength between contractions/rhythm) might be, or how these evolve uniquely over time in culture. Furthermore, while this study might be able to capture the emergence of ICC-SMC interactions as they relate to muscle contraction and pacemaking, it is unclear how these interactions relate to adult gastrointestinal physiology given that the organoids are derived from fetal cells that might not be fully differentiated or might have distinct functions from the adult. Finally, despite the strength of this system, discoveries made in it will need to be validated in vivo.

      Thank you very much for the comments, which are helpful for improving our MS. In the revised version, we have additionally used an antibody against desmin, known to be a marker for mature SMCs (new Fig 3B). The signal is seen only in the peripheral cells, overlapping with the αSMA staining (lines 169-170).

      Concerning reproducibility, while contractility changes were shown for a representative organoid in the original version, the experiments had been carried out multiple times and gave consistent data, as already mentioned in the text of the first version of the MS. However, we agree with this reviewer that a quantitative assessment is more convincing. We have therefore conducted quantitative assessments of organoid contractions and Ca<sup>2+</sup> transients (new Fig. 2B, new Fig. 4D, new Fig 5D, E, new Fig. 6B, new Fig. 7B, new Fig. 8C, new Fig. S2, S3). Details such as the number of experimental repeats and the number of specimens are carefully described in the revised version (Figure legends).

      In particular, in place of contraction numbers/time, we have plotted “contraction intervals” between two successive peaks (Fig. 2B and others). Following your suggestion, we also tried to perform a periodicity analysis of organoid contractions. Unfortunately, no clear value was obtained, probably because the contractions/Ca<sup>2+</sup> transients are not as “regularly periodical” as the oscillations seen in conventional physics. This led us to perform the peak-interval analysis. The methods used to quantify the contraction intervals are carefully explained in the revised version.
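
      As an illustration of the peak-interval measurement described above, the following minimal sketch computes the intervals between successive peaks in a sampled trace. It is not the analysis code used for the manuscript: the function names, the amplitude threshold, and the 700 ms frame interval in the example call are assumptions made for illustration only.

```python
def find_peaks(trace, threshold):
    """Return indices of local maxima whose value exceeds `threshold`.

    A plateau is reported once, at its first sample.
    """
    peaks = []
    for i in range(1, len(trace) - 1):
        rises = trace[i] > trace[i - 1]
        does_not_rise_further = trace[i] >= trace[i + 1]
        if trace[i] > threshold and rises and does_not_rise_further:
            peaks.append(i)
    return peaks


def peak_intervals(trace, frame_interval_ms, threshold):
    """Return the time (ms) between each pair of successive peaks."""
    peaks = find_peaks(trace, threshold)
    return [(later - earlier) * frame_interval_ms
            for earlier, later in zip(peaks, peaks[1:])]


# Hypothetical trace with three contractions; values are arbitrary units.
example_trace = [0, 1, 5, 1, 0, 0, 1, 6, 1, 0, 0, 0, 1, 5, 0]
intervals_ms = peak_intervals(example_trace, frame_interval_ms=700, threshold=2)
print(intervals_ms)
```

      Unlike a spectral periodicity analysis, this approach does not assume a single dominant frequency, so it still yields a distribution of intervals when the rhythm is irregular.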

      As already mentioned in “Our provisional responses” following the receipt of the Reviewers’ comments, we agree that our organoids derived from embryonic hindgut (E15) might not necessarily recapitulate the full function of cells in the adult. However, it is well accepted in the field of developmental biology that studies with embryonic tissues/cells contribute greatly to unveiling complicated physiological cell functions. Nevertheless, we have carefully revised the text so that the MS does not send misleading messages. We agree that in vivo validation of our gut contractile organoid would be valuable, and this is the next step.

      Reviewer #3 (Public Review):

      Summary:

      The paper presents a novel contractile gut organoid system that allows for in vitro studying of rudimentary peristaltic motions in embryonic tissues by facilitating GCaMP live imaging of Ca<sup>2+</sup> dynamics, while highlighting the importance and sufficiency of ICC and SMC interactions in generating consistent contractions reminiscent of peristalsis. It also argues that ENS at later embryonic stages might not be necessary for coordination of peristalsis.

      Strengths:

      The manuscript by Yagasaki, Takahashi, and colleagues represents an exciting new addition to the toolkit available for studying fundamental questions in the development and physiology of the hindgut. The authors carefully lay out the protocol for generating contractile gut organoids from chick embryonic hindgut, and perform a series of experiments that illustrate the broader utility of these organoids for studying the gut. This reviewer is highly supportive of the manuscript, with only minor requests to improve confidence in the findings and broader impact of the work. These are detailed below.

      Thank you very much for the constructive comments.

      Weaknesses:

      (1) Given that the literature is conflicting on the role of GAP junctions in potentiating communication between interstitial cells of Cajal (ICCs) and smooth muscle cells (SMCs), the experiments involving CBX and 18Beta-GA are well-justified. However, because neither treatment altered contractile frequency or synchronization of Ca++ transients, it would be important to demonstrate that the treatments did indeed inhibit GAP junction function as administered. This would strengthen the conclusion that GAP junctions are not required, and eliminate the alternative explanation that the treatments themselves failed to block GAP junction activity.

      Thank you for these comments; we agree. In the revised version, we have verified the activity of the drugs, CBX and 18b-GA, using dissociated embryonic heart cells in culture, a well-established model for gap junction studies (new Fig. S3D, lines 237-239). As expected, both inhibitors abrogate the rhythmic beating of heart cells, and importantly, beating resumes after wash-out of the drug.

      (2) Given that 5uM blebbistatin increases the frequency of contractions but 10uM completely abolishes contractions, confirming that cell viability is not compromised at the higher concentration would build confidence that the phenotype results from inhibition of myosin activity. One could either assay for cell death, or perform washout experiments to test for recovery of cyclic contractions upon removal of blebbistatin. The latter may provide access to other interesting questions as well. For example, do organoids retain memory of their prior setpoint or arrive at a new firing frequency after washout?

      We greatly appreciate these suggestions and the interesting ideas to explore! In the revised version, we have conducted washout experiments (new Fig. 6B), in which the 10 µM drug is washed out from the culture medium, and found that contractions resume, showing that cell viability is not compromised at the 10 µM concentration (lines 257-259). Intriguingly, the resumed rhythm appears more regular than that before drug administration. Thus, the contraction rhythm of the organoid might be determined by cell-cell interactions at any given time rather than by memory of their prior setpoint. This is an interesting issue that we would like to explore further in the future. These points, although potentially interesting, are not mentioned in the text of the revised version, since it is too early to interpret these observations.

      (3) Regulation of contractile activity was attributed to ICCs, with authors reasoning that Tuj1+ enteric neurons were only present in organoids in very small numbers (~1%).

      However, neuronal function is not strictly dependent on abundance, and some experimental support for the relative importance of ICCs over Tuj1+ cells would strengthen a central assumption of the work that ICCs are the predominant cell type regulating organoid contraction. For example, one could envision forming organoids from embryos in which neural crest cells have been ablated via microdissection or targeted electroporation. Another approach would be ablation of Tuj1+ cells from the formed organoids via tetrodotoxin treatment. The ability of organoids to maintain rhythmic contractile activity in the total absence of Tuj1+ cells would add confidence that the ICCs are indeed the driver of contractility in these organoids.

      We agree. In the revised version, we have conducted TTX administration (new Fig. S2C). No changes in contractility were detected with this treatment, supporting the argument that neural cells/activities are not essential for rhythmic contractions of the organoid (lines 178-181).

      (4) Given the implications of a time lag between Ca++ peaks in ICCs and SMCs, it would be important to quantify this, including standard deviations, rather than showing representative plots from a single sample.

      In the revised version, we have carried out a series of quantitative assessments as mentioned above (please see our responses to the “eLife assessments” at the beginning of this correspondence). The latency between Ca<sup>2+</sup> peaks in ICCs and SMCs is shown in new Fig. 4D, in which the measured values fall in steps of 700 msec because the time-lapse imaging was performed at 700 msec intervals (as already described in the first version).

      In total, 117 peaks from 14 organoids have been assessed (line 218).
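
      To make the 700 msec step size concrete, the sketch below shows why peak-to-peak latencies measured from frame-sampled traces can only take values that are whole multiples of the imaging interval. It is an illustrative sketch, not the authors' analysis pipeline: the peak-detection rule, the threshold, the `max_lag_frames` pairing window, and the example traces are all hypothetical; only the 700 msec frame interval comes from the response above.

```python
FRAME_INTERVAL_MS = 700  # time-lapse imaging interval stated in the response


def find_peaks(trace, threshold):
    """Return indices of local maxima whose value exceeds `threshold`."""
    return [i for i in range(1, len(trace) - 1)
            if trace[i] > threshold
            and trace[i] > trace[i - 1]
            and trace[i] >= trace[i + 1]]


def peak_latencies_ms(icc_trace, smc_trace, threshold, max_lag_frames=3):
    """Pair each ICC peak with the first SMC peak at or after it.

    Only SMC peaks within `max_lag_frames` frames are paired (a hypothetical
    window). Latencies are frame differences times the frame interval, so
    they are always multiples of FRAME_INTERVAL_MS.
    """
    smc_peaks = find_peaks(smc_trace, threshold)
    latencies = []
    for icc_peak in find_peaks(icc_trace, threshold):
        following = [p for p in smc_peaks if 0 <= p - icc_peak <= max_lag_frames]
        if following:
            latencies.append((following[0] - icc_peak) * FRAME_INTERVAL_MS)
    return latencies


# Hypothetical traces in which each SMC peak follows the ICC peak by one frame.
icc = [0, 1, 5, 1, 0, 0, 0, 6, 1, 0, 0, 0]
smc = [0, 0, 1, 5, 1, 0, 0, 1, 6, 1, 0, 0]
print(peak_latencies_ms(icc, smc, threshold=2))
```

      Resolving latencies shorter than one frame would require faster imaging, since any true delay below 700 msec is recorded as either 0 or 700 msec under this sampling.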

      (5) To validate the organoid as a faithful recreation of in vivo conditions, it would be helpful for authors to test some of the more exciting findings on explanted hindgut tissue. One could explant hindguts and test whether blebbistatin treatment silences peristaltic contractions as it does in organoids, or following RCAS-GCAMP infection at earlier stages, one could test the effects of GAP junction inhibitors on Ca++ transients in explanted hindguts. These would potentially serve as useful validation for the gut contractile organoid, and further emphasize the utility of studying these simplified systems for understanding more complex phenomena in vivo.

      Thank you very much for the insightful comments. We would love to explore these issues in the near future. We note that nifedipine was previously reported to silence peristaltic contractions in ex vivo cultured gut (Chevalier et al., 2024; Der et al., 2000).

      (6) Organoid fusion experiments are very interesting. It appears that immediately after fusion, the contraction frequency is markedly reduced. Authors should comment on this, and how it changes over time following fusion. Further, is there a relationship between aggregate size and contractile frequency? There are many interesting points that could be discussed here, even if experimental investigation of these points is left to future work.

      It would indeed be interesting to explore how cell communications affect/determine the contraction rhythm, and our novel organoids should serve as an excellent model to address these fundamental questions. We have observed multiple times that when two organoids fuse, they undergo a “pause” and then resume coordinated contractions as a whole, and we have briefly noted this in the revised version (line 282). It would be interesting to determine what happens during this pause. In addition, our impression is that the larger the organoids grow, the slower their rhythm becomes. We would love to explore this in the near future.

      (7) Minor: As seen in Movie 6 and Figure 6A, 5uM blebbistatin causes a remarkable increase in the frequency of contractions. Given the regular periodicity of these contractions, it is a surprising and potentially interesting finding, but authors do not comment on it. It would be helpful to note this disparity between 5 and 10 uM treatments, if not to speculate on what it means, even if it is beyond the scope of the present study to understand this further.

      We assume that the increase in the frequency of contractions at 5 µM might be due to a shorter refractory period caused by a decreasing magnitude (amplitude) of contraction. We have made a short description in the revised text (line 256-257).

      (8) Minor: While ENS cells are limited in the organoid, it would be helpful to quantify the number of SMCs for comparison in Supplemental Figure S2. In several images, the number of SMCs appears quite limited as well, and the comparison would lend context and a point of reference for the data presented in Figure S2B.

      In the revised version, the number of SMCs has been counted and added to Fig. S2B. Whereas SMCs are more abundant than ICCs in an intact gut, the proportion is reversed in our organoid (lines 181-183). This might be due to the treatments during cell dissociation/plating.

      (9) Minor: additional details in the Figure 8 legend would improve interpretation of these results. For example, what is indicated in orange signal present in panels C, G and H? Is this GCAMP?

      We apologize for this confusion. In the revised version, we have added labeling directly in the photos of new Fig. 9 (old Fig. 8). For C, G and H, the left photo is mRuby3+GCaMP6s, and the right one is GCaMP6s only.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have a few comments for the authors to consider:

      (1) Figure 4C: The authors propose that calcium signals propagate from ICC to SMC based on the results presented in this figure. While it is observed that the peak of the calcium signal in ICC precedes that in SMC, it's worth noting that the onset of the rise in calcium signals occurs simultaneously in ICC and SMC. Doesn't this suggest that they are activated simultaneously? The latency observed for the peaks of calcium signals could reflect different kinetics of the rise in calcium concentration in the two types of cells rather than the order of calcium signal propagation.

      We greatly appreciate these comments. We have re-examined the kinetics of GCaMP signals in ICCs and SMCs, but we did not succeed in determining the rise points precisely. We agree that the rise in calcium signals could be occurring simultaneously. To clarify this issue, analyses with higher temporal resolution are required, for example using GCaMP6f or GCaMP7/8. Nevertheless, the disappearance of the latency of Ca<sup>2+</sup> peaks upon CBX treatment implies a role of gap junctions in ICC-to-SMC signaling. In the revised version, we have replaced the wording “rise” with “peak” where the latency is discussed.

      (2) Figure 5C: The specific elimination of the latency in the calcium signal peaks between ICC and SMC is interesting. However, I am curious about how gap junction inhibitors specifically eliminate the latency between ICC and SMC without affecting other aspects of calcium transients in these cells, such as amplitude and synchronization among ICCs and/or SMCs. Readers of the manuscript would expect some discussion on possible mechanisms underlying this specificity. Additionally, I wonder if the elimination of the latency was observed consistently across all samples examined. The authors should provide information on the frequency and number of samples examined, and whether the elimination occurs when 18-beta-GA is used.

      In the revised version, we have expanded the quantitative analyses. For the effects of CBX on the latency of Ca<sup>2+</sup> peaks, a new graph has been added to new Fig 5, in which 100 µM CBX eliminated the latency. Intriguingly, the latency appears to be attributable to a gap junction that is not inhibited by 18-beta-GA (please see new Fig. S3E). As already mentioned above, the inhibitory activity of both CBX and 18-beta-GA has been verified using dissociated cells of embryonic heart, a popular model for gap junction studies.

      At present, we do not know how gap junction(s) contribute to the latency of Ca<sup>2+</sup> peaks without affecting synchronization among ICCs and/or SMCs (we have not addressed the amplitude of the oscillation in this study). Actually, we were surprised to find that the contribution of GJs is very limited. We do not exclude the importance of GJs, and currently speculate that GJs might be important for the initiation of contraction/oscillation signals, whereas the requirement for GJs diminishes once the ICC-SMC interacting rhythm is established. What we observed in this study might be the synchronization signals AFTER these interactions are established (Day 7 of organoidal culture). Upon the establishment, it is possible that mechanical signaling elicited by smooth muscle contraction becomes prominent as a mediator of the (stable) synchronization, as implicated by experiments with blebbistatin and nifedipine, the latter being newly added to the revised version (new Fig. 7). We have added such speculation, albeit briefly, in the Discussion (line 374-377).

      (3) Figure 6: The significant effects of blebbistatin on calcium dynamics in both ICC and SMC are intriguing. However, since only one blocker is utilized, the specificity of the effects is unclear. If other blockers for muscle contraction are available, they should be employed. Considering that a rise in calcium concentration precedes contraction, calcium transients should persist even if muscle contraction is inhibited. One concern is whether blebbistatin inadvertently rendered the cells unhealthy. The authors should demonstrate at least that contraction and calcium transients recover after removal of the drug. The frequency and number of samples examined should be shown, as requested for Figure 5C above.

      Thank you for these critical comments. The possible harmfulness of the drugs was also raised by other reviewers, and we have therefore conducted wash-out experiments in the revised version (new Fig. 6B). Contractions resume after wash-out, showing that cell viability is not compromised at a concentration of 10 µM. The number of samples examined has been described more explicitly in the revised version. Regarding the blocker of SMC, we have newly carried out pharmacological assays using nifedipine, a blocker of an L-type Ca<sup>2+</sup> channel known to operate in smooth muscle cells (new Fig. 7) (Chevalier et al., 2024; Der et al., 2000). As already explained in the “Responses to eLife assessment”, the treatment abrogated the ICCs’ rhythm and the synchronous Ca<sup>2+</sup> transients between ICCs and SMCs, further corroborating our model that not only ICC-to-SMC interactions but also SMC-to-ICC feedback signals operate to achieve the coordinated/stable rhythm of gut contractile organoids at Day 7 of culture (please also see our responses shown above for Comment (2)).

      Reviewer #2 (Recommendations For The Authors):

      Major:

      (1) The claim that organoids contain functional SMCs and ICCs is insufficient as it currently relies on only c-Kit and aSMA antibodies. This conclusion could be additionally supported by staining with other markers of contractile smooth muscle (e.g. TAGLN and MYH14) and an additional accepted marker of ICCs (e.g. ANO1/TMEM16). Moreover, it should be demonstrated whether these cells are PDGFRA+, as PDGFRA is a known marker of other mesenchymal fibroblast cell types. These experiments would additionally rule out whether these cells were simply less differentiated myofibroblasts. Given that there might not be available antibodies that react with chicken protein versions, the authors could support their conclusions using alternative approaches, such as fluorescent in situ hybridization. A more thorough approach, such as single-cell RNA sequencing to compare the cell composition of the in vitro organoids to the in vivo colon, would fully justify the use of these organoids as a system for studying in vivo cell physiology.

      Following these suggestions, we have newly stained contractile organoids with an anti-desmin antibody, known to be a marker for differentiated SMCs. As shown in new Fig. 3B, desmin-positive cells perfectly overlapped with aSMA staining, indicating that the peripherally enclosing cells are SMCs. Regarding the interior cells, as this Reviewer anticipated, no antibodies against ANO1/TMEM16 are available for avian specimens. The anti-c-Kit antibody used in this study is one that we raised ourselves over several years (Yagasaki et al., 2021), and it was carefully validated in intact guts of chicken embryos by multiple methods including Western blot analyses, immunostaining, and in situ hybridization. We have attempted several times to perform organoidal whole-mount in situ hybridization for expression of PDGFRα, but we have not succeeded so far. In addition, as explained to the Editor, the very unhealthy condition of purchased eggs over the past 7 months did not allow us to continue any further. We are planning to interrogate the cell types residing in the central area of the organoid, the results of which will be reported in a separate paper in the near future.

      (2) The key ICC-SMC relationship and physiological interaction seems to arise developmentally, but the mechanisms of this transition are not well defined (Chevalier 2020). To further support the claim that ICC-SMC interactions can be interrogated in this system, this study would benefit from establishing organoids at distinct developmental stages to (a) show that they have unique contractile profiles, and (b) demonstrate that they evolve over time in vitro toward an ICC-driven mechanism.

      We agree with these comments. We tried to prepare gut contractile organoids derived from different stages of development, and we had the impression that slightly younger hindguts can also be used for the organoid preparations. In addition, not only the hindgut but also the midgut and caecum yield organoids. However, since organoids derived from these “non-E15 hindgut” sources vary substantially in shape, contraction frequency/amplitude, etc., we are currently not ready to report these preliminary observations. Instead, we decided to optimize and elaborate in vitro culture conditions by focusing on the E15 hindgut, which turned out to be the most stable in our hands. Nevertheless, it is tempting to see how organoids evolve over time during gut development.

      (3) This manuscript would be greatly enhanced by a functional examination of the prospective organoid ICCs. For example, the authors could test whether the c-Kit inhibitor Imatinib, which has previously been used to impair ICC differentiation and function in the developing chick gut (Chevalier 2020), has an effect on contractility at different stages.

      Following the paper of (Chevalier 2020), we had already conducted similar experiments with Imatinib in the culture with our organoids, but we did not see detectable effects. In that paper, the midgut of younger embryos was used, whereas we used E15 hindgut to prepare organoids. It would be interesting to see what happens if we add Imatinib earlier during organoid formation, and this is a next step.

      (4) It is claimed that there is a 690s msec delay in SMC spike relative to ICC spike, however, it is unclear where this average is derived from and whether the organoid calcium trace shown in Figure 4C is representative of the data. The latency quantification should be shown across multiple organoids, and again in the case of carbenoxolone treatment, to better understand the variations in treatment.

      We apologize that the first version failed to clearly demonstrate quantitative assessments. In the revised version, we have elaborated quantitative assessments (117 peaks for 14 organoids) (line 216-218). In new Fig. 4D, the measured values are terraced in 700 msec steps since, as already mentioned in the first version, the time-lapse imaging was performed at 700 msec intervals.
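
      The terracing described above follows directly from the sampling: a peak can only be assigned to an imaging frame, so any ICC-to-SMC peak latency is a whole multiple of the frame interval. The sketch below is an illustration only, not the authors' analysis pipeline. The function names, the simple local-maximum rule, and the two toy traces are assumptions introduced here; only the 700 msec interval comes from the response.

```python
FRAME_INTERVAL_MS = 700  # imaging interval stated in the response


def peak_frames(trace):
    """Return frame indices of local maxima in a sampled trace (toy rule)."""
    return [
        i
        for i in range(1, len(trace) - 1)
        if trace[i - 1] < trace[i] >= trace[i + 1]
    ]


def peak_latencies_ms(icc_trace, smc_trace, frame_interval_ms=FRAME_INTERVAL_MS):
    """For each ICC peak, latency to the first SMC peak at or after it."""
    smc_peaks = peak_frames(smc_trace)
    latencies = []
    for icc_peak in peak_frames(icc_trace):
        following = [q for q in smc_peaks if q >= icc_peak]
        if following:
            latencies.append((following[0] - icc_peak) * frame_interval_ms)
    return latencies


# Hypothetical traces (arbitrary fluorescence units), one value per frame.
icc = [0, 1, 5, 2, 0, 0, 1, 6, 2, 0, 0]
smc = [0, 0, 2, 5, 1, 0, 0, 2, 6, 1, 0]

latencies = peak_latencies_ms(icc, smc)
print(latencies)  # every value is a multiple of 700 ms
```

      With frame-based peak picking of this kind, a latency shorter than one frame interval cannot be resolved, which is consistent with the authors' remark that higher-resolution imaging would be needed to compare rise points.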

      (5) As above, a larger issue is that only single traces are shown for each organoid. This makes it challenging to understand the variance in contractile properties across multiple organoids. While contraction frequencies are shown several times, the manuscript would benefit from additional quantifications, such as rhythm (average wavelength between events) in control and perturbed conditions.

      We have substantially elaborated quantitative assessments (please also see our responses to the “Public Review”). In particular, in place of contraction numbers/time, we have plotted “contraction intervals” between two successive peaks (Fig. 2B and others). Actually, we have tried to perform a periodicity analysis of organoid contractions. Unfortunately, no clear value has been obtained, probably because the contractions/Ca<sup>2+</sup> transitions are not as “regularly periodical” as seen in conventional physics. This led us to perform the peak-interval analysis. Methods to quantify the contraction intervals are carefully explained in the revised version.
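
      As a rough illustration of the peak-interval idea (not the authors' actual method, which is described in their revised Methods), the sketch below measures the time between successive peaks of a sampled trace and summarizes the spread. The threshold, the local-maximum rule, the function names, and the toy trace are all assumptions made for this example.

```python
from statistics import mean, pstdev


def find_peaks(trace, threshold):
    """Frame indices of local maxima at or above a threshold (toy rule)."""
    return [
        i
        for i in range(1, len(trace) - 1)
        if trace[i] >= threshold and trace[i - 1] < trace[i] >= trace[i + 1]
    ]


def peak_intervals_ms(trace, frame_interval_ms, threshold):
    """Intervals between successive peaks, in milliseconds."""
    peaks = find_peaks(trace, threshold)
    return [(b - a) * frame_interval_ms for a, b in zip(peaks, peaks[1:])]


# Hypothetical contraction trace, one value per 700 ms frame.
trace = [0, 4, 0, 0, 5, 0, 0, 0, 4, 0, 0, 5, 0]
intervals = peak_intervals_ms(trace, frame_interval_ms=700, threshold=3)

# Summary statistics: mean interval and coefficient of variation.
mean_interval = mean(intervals)
cv = pstdev(intervals) / mean_interval
print(intervals, round(mean_interval, 1), round(cv, 3))
```

      Plotting the distribution of such intervals does not presuppose a single dominant frequency, which makes it a reasonable choice when the rhythm is not strictly periodic.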

      (6) The synchronicity observed between ICCs and SMCs within the organoid is interesting, and should be emphasized by making analyses more quantitative so as to understand how consistent and reproducible this phenomenon is across organoids. Moreover, one of the most exciting parts of the study is the synchronicity established between organoids in the hydrogel system, but it is insufficiently quantified. For example, how rapidly is pacemaking synchronization achieved?

      As we replied above to (5), and described in the responses to the “Public Review”, we have substantially elaborated quantitative assessments in the revised version. Concerning the synchronicity between ICCs and SMCs, our data explicitly show that as long as the organoid undergoes healthy contraction, they perfectly match their rhythm (Fig. 4), making it difficult to display quantitatively. Instead, to demonstrate such synchronicity more convincingly, we have carefully described the number of peaks and the number of independent organoids we analyzed in each of the Figure legends. In the experiments with hydrogels, the time required for two organoids to start/resume synchronous contraction varies greatly. For example, for the experiment shown in new Fig. 9F, it takes 1 to 2 days for cells to crawl out of the organoids and cover the surface of the hydrogel. In the experiments shown in new Fig. 8, two organoids undergo a “pause” before resuming contractions. In the revised version, we have briefly mentioned our observation and speculation that active cell communications take place during this pausing time (line 282-283 in Results and line 437-439 in Discussion). We agree with this reviewer that the pausing time is potentially very interesting. However, it is currently difficult to quantify these phenomena. A more elaborate experimental design might be needed.

      (7) Smooth muscle layers in vivo are well organized into circular and longitudinal layers. To establish physiological relevance, the authors should demonstrate if these organoids have multiple layers (though it looks like just a single outer layer) and if they show supracellular organization across the organoid.

      The immunostaining data suggest that peripherally lining cells are of a single layer, and we assume that they might be aligned in register with contracting direction. However, to clarify these issues, observation with higher resolution would be required.

      (8) To further examine whether the organoids contain true functional ICCs, the authors should test whether their calcium transients are impacted by inhibitors of L-type calcium channels, such as nifedipine and nicardipine. These channels have been demonstrated to be important for SMCs but not ICCs, so one might expect to see continued transients in the core ICCs but a loss of them in SMCs (Lee et al., 1999; PMID: 10444456)

      We appreciate these comments. We have accordingly conducted new experiments with nifedipine. Contrary to the expectation, nifedipine abolishes not only organoidal contractions, but also ICC activities (and the resulting synchronization) (new Fig. 7). These findings actually corroborate our model, already mentioned in the first version, that ICCs receive mechanical feedback from SMC contraction to stably maintain their oscillatory rhythm. We believe that the additional findings with nifedipine have improved the quality of our paper. Concerning the central cells in the organoid, we have additionally used an anti-desmin antibody known to mark differentiated SMCs. Desmin signals perfectly overlap with those of aSMA in the peripheral single layer, supporting the view that the peripheral cells are SMCs and the central cells are ICCs. The anti-c-Kit antibody used in this study is one that we raised ourselves over several years (Yagasaki et al., 2021), and it was carefully validated in intact guts of chicken embryos by multiple methods including Western blot analyses, immunostaining, and in situ hybridization.

      ANO1/TMEM16 are known to stain ICCs in mice. Antibodies against ANO1/TMEM16 available for avian specimens are awaited.

      (9) Despite Tuj1+ enteric neurons only making up a small fraction of the organoids, the authors should still functionally test whether they regulate any aspect of contractility by treating organoids with an inhibitor such as tetrodotoxin to rule out a role for them.

      Thank you for this advice, which was also raised by other reviewers. We have conducted TTX administration (new Fig. S2C). No changes in contractility were detected with this treatment, supporting the argument that neural cells/activities are not essential for rhythmic contractions of the organoid (line 178-181).

      (10) Finally, the manuscript is written to suggest that the focus of the study is to establish a system to interrogate ICC-SMC interactions in gut physiology and peristalsis. However, the organoids designed in this study are derived from the fetal precursors to the adult cell types. Thus, they might not accurately portray the adult cell physiology. I don't believe that this is a downfall, but rather a strength of the study that should be emphasized. That is, the focus could be shifted toward stressing the power of this new system as a reductionist, self-organizing model to examine the developmental emergence of contractile synchronization in the intestine - in particular that arising through ICC-SMC interactions.

      We appreciate this advice. In the revised MS, we are careful to note that our findings do not necessarily portray the physiological functions of the adult gut.

      Minor:

      More technical information could be used in the methods:

      (1) What concentration of Matrigel is used for coating, and what size were the wells that cells were deposited into?

      We have added, “14-mm diameter glass-bottom dishes (Matsunami, D11130H)” and “undiluted Matrigel (Corning, 354248) at 38.5°C for 20 min” (line 471-473).

      (2) How were organoids transferred to the hydrogels? And were the hydrogels coated?

      We have added “Organoids were transferred to the hydrogel using a glass capillary” (line 560-561).

      (3) Tests for significance and p values should be added where appropriate (e.g. Figure S3B).

      We have added these in Figure legend of new Fig. S3.

      Reviewer #3 (Recommendations For The Authors):

      This is an exciting study, and while the majority of our comments are minor suggestions to improve the clarity and impact of findings, it would be important to verify the effective disruption of GAP junction function with CBX or 18Beta-GA treatments before concluding they are not required for coordination of contractility and initiation by ICCs. It is possible that sufficient contextual support exists in the literature for the nature of treatments used, but this may need to be conveyed within the manuscript to allay concerns that the results could be explained by ineffective inhibition of GAP junctions.

      Thank you very much for this advice. In the revised version, we have newly carried out experiments with dissociated embryonic heart cells cultured in vitro, a model widely used for gap junction studies (Fig. S3D). Both CBX and 18b-GA efficiently inhibit contractions of heart cells. We have added the following sentence, “The inhibiting activity of the drugs used here was verified using embryonic heart culture (line 237-239)”.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study aims to create a comprehensive repository about the changes in protein abundance and their modification during oocyte maturation in Xenopus laevis.

      Strengths:

      The results contribute meaningfully to the field.

      Weaknesses:

      The manuscript could have benefitted from more comprehensive analyses and clearer writing. Nonetheless, the key findings are robust and offer a valuable resource for the scientific community.

      We would like to thank the reviewer for his/her positive feedback on our article. The public review points out that "The manuscript could have benefitted from more comprehensive analyses and clearer writing." We have rewritten several sections and provided more detailed explanations of the analysis and interpretation of some data (see below for details). We have also followed all of the reviewer's recommendations, some of which specifically highlighted areas lacking clarity. We would also like to thank the reviewer for pointing out some errors, for which we apologize, and which have now been corrected. We sincerely appreciate the reviewer's thorough work, as it has greatly enhanced the clarity and precision of the manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors analyzed Xenopus oocytes at different stages of meiosis using quantitative phosphoproteomics. Their advanced methods and analyses revealed changes in protein abundances and phosphorylation states to an unprecedented depth and quantitative detail. In the manuscript they provide an excellent interpretation of these findings putting them in the context of past literature in Xenopus as well as in other model systems.

      Strengths:

      High quality data, careful and detailed analysis, outstanding interpretation in the context of the large body of the literature.

      Weaknesses:

      Merely a resource, none of the findings are tested in functional experiments.

      I am very impressed by the quality of the data and the careful and detailed interpretation of the findings. In this form the manuscript will be an excellent resource to the cell division community in general, and it presents a very large number of hypotheses that can be tested in future experiments. Xenopus has been and still is a popular and powerful model system that led to critical discoveries around countless cellular processes, including the spindle, nuclear envelope, translational regulation, just to name a few. This also includes a huge body of literature on the cell cycle describing its phosphoregulation. It is indeed somewhat frustrating to see that these earlier studies using phosphomutants and phospho-antibodies were just scratching the surface. The phosphoproteomics analysis presented here reveals much more extensive and much more dynamic changes in phosphorylation states. Thereby, in my opinion, this manuscript opens a completely new chapter in this line of research, setting the stage for more systematic future studies.

      We thank the reviewer for his/her extremely positive comments. The public review points out that "none of the findings are tested in functional experiments." This is entirely accurate. We focused our work on obtaining the highest quality proteomic and phosphoproteomic data possible, and then sought to highlight these data by connecting them with existing functional data from the literature. This approach has opened up research avenues with enormous, previously unforeseen potential, in a wide range of biological fields (cell cycle, meiosis, oogenesis, embryonic development, cell biology, cellular physiology, signaling, evolution, etc.). We chose not to delay publication by experimentally investigating the narrow area in which we are specialists (meiotic maturation), while our data offer a vast array of research opportunities across various fields. Our goal was, therefore, to present this extensive dataset as a resource for different scientific communities, who can explore their specific biological questions using our data. This is why we submitted our article to the "Repository" section of eLife. Nevertheless, in the context of the comparative analysis of the mouse and Xenopus phosphoproteomes performed at the reviewer’s request, we felt it was important to complement this new section with functional experiments that not only validate the proteomic data but also provide new insights into certain proteins and their regulation by Cdk1 (new paragraph lines 824-860 and new Figure 9).

      We are also grateful to the reviewer for the recommendation to improve the manuscript by including more comparisons between our Xenopus data and those from other systems. We have followed this suggestion (see below), which has significantly enriched the article (new paragraph lines 824-860 and new Figure 9).

      Reviewer #3 (Public review):

      Summary:

      The authors performed time-resolved proteomics and phospho-proteomics in Xenopus oocytes from prophase I through the MII arrest of the unfertilized egg. The data contain protein abundance and phosphorylation sites of a large set of proteins at different stages of oocyte maturation. The large data sets are of high quality. In addition, the authors discussed several key pathways critical for the maturation. The data are very useful not only for researchers working on Xenopus oocytes but also for those studying oocyte biology in other organisms.

      Strengths:

      The data of proteomics and phospho-proteomics in Xenopus oocyte maturation is very useful for future studies to understand molecular networks in oocyte maturation.

      Weaknesses:

      Although the authors offered molecular pathways of the phosphorylation in the translation, protein degradation, cell cycle regulation, and chromosome segregation. The author did not check the validity of the molecular pathways based on their proteomic data by the experimentation.

      We thank the reviewer for his/her positive comments. The public review points out that "The author did not check the validity of the molecular pathways based on their proteomic data by the experimentation." This is entirely accurate. We focused our work on obtaining the highest quality proteomic and phosphoproteomic data possible, and then sought to highlight these data by connecting them with existing functional data from the literature. This approach has opened up research avenues with enormous, previously unforeseen potential, in a wide range of biological fields (cell cycle, meiosis, oogenesis, embryonic development, cell biology, cellular physiology, signaling, evolution, etc.). We chose not to delay publication by experimentally investigating the very narrow area in which we are specialists (meiotic maturation), while our data offer a vast array of research opportunities across various fields. Our goal was, therefore, to present this extensive dataset as a resource for different scientific communities, who can explore their specific biological questions using our data. This is why we submitted our article to the "Repository" section of eLife. Nevertheless, in the context of the comparative analysis of the mouse and Xenopus phosphoproteomes performed at the reviewer’s request, we felt it was important to complement this new section with functional experiments that not only validate the proteomic data but also provide new insights into certain proteins and their regulation by Cdk1 (new paragraph lines 824-860 and new Figure 9).

      We have also followed all of the reviewer's recommendations and thank him/her, as the suggestions have significantly enhanced the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Fig. 1 -> In the Figure legend "mPRβ" is called "mPRb". In the Figure, it is indicated that PKA substrates are always activated by the phosphorylation. As the relevant substrates and the mode-of-action of the Arpp19 phosphorylation are not clear at the moment, this seems to be preliminary. It could for example also be conceivable that PKA phosphorylation inhibits a translation activator. In addition, the PG-dependent translation of RINGO/Speedy should be included in the model.

      We fully agree with the reviewer. PKA substrates can either be activators of the Cdk1 activation pathway, which are inhibited by phosphorylation by PKA, or repressors of the same pathway, which are activated by phosphorylation by PKA. This is now illustrated in the new Fig. 1. In addition, we have also included RINGO/Speedy in the model and in the text (lines 78-79) and corrected "mPRb" in the legend.

      (2) Lane 51-52 -> it is questionable if the meiotic divisions can be called "embryonic processes"

      We agree with the reviewer comment, and we have removed the word “embryonic”.

      (3) Lane 53 and lane 106-107 -> recent data have indicated that transcription already starts during cell cycle 12 and 13 in most cells (e.g. Blitz and Cho: Control of zygotic genome activation in Xenopus (2021))

      We apologize for this mistake. The text has been corrected and the reference added (lines 53 and 107).

      (4) Lane 61-62 -> "MI" and "MII" are given as abbreviation for "first and second meiotic spindle"

      The text has been clarified to explain that MI refers to metaphase I and MII stands for metaphase II (lines 61-64).

      (5) Lane 131-132 -> "single-cell" is mentioned redundantly in this sentence.

      The sentence has been corrected (lines 131-132).

      (6) Fig. 2B -> it is not explained what is plotted as "Average levels" on the x-Axis. Is it the average of expression over all samples or at a given time point? Are the values given as a concentration or are the values normalized? If so, how were they normalized?

      We agree with the reviewer comment that “Average levels” may have been unclear. In the new Fig. 2B, we have re-plotted the graph using the average protein concentration during meiosis, measured as described in the Methods section.

      (7) In Fig. 2-supplement 3E -> from the descriptions it is not entirely clear to me what the difference to the data in Fig. 2B is?

      We thank the reviewer for his/her question regarding the relationship between the data in Fig. 2B and Fig. 2-supplement 3E. We confirm that the raw data visualized in Fig. 2-supplement 3E are the same as those in Fig. 2B. However, in Fig. 2-supplement 3E, the data are color-coded differently to highlight the number of proteins whose concentrations change during meiotic divisions, based on the threshold adopted. The legend of Fig. 2-supplement 3E has been modified to clarify this point.

      (8) Lane 225-226 -> Kifc1 is a minus-end directed motor

      This mistake has been corrected (lines 232-233).

      (9) Lane 271 -> Serbp1, here mentioned to be involved in stabilization of mRNAs, has also been implicated in the regulation of ribosomes (e.g. Leesch et al. 2023). Regarding the overall topic of this manuscript, this could be mentioned as well.

      We agree with the referee that the important role of Serbp1 in the control of ribosome hibernation needs to be mentioned. We have included this point in the revised manuscript together with the reference (lines 277-279).

      (10) Lane 360-363 -> it is mentioned that APPL1 and Akt2 act "to induce meiosis". Furthermore, in the Nader et al. 2020 paper, Akt2 phosphorylation is reported to happen within 30min after PG treatment. In the present work, they only seem to get phosphorylated when Cdk1 is activated. Is there an explanation for this discrepancy?

      Indeed, Nader et al. (2020) indicate that Akt2 is phosphorylated on Ser473 (actually, they should have mentioned Ser474, which is the phosphorylated residue on Akt2; Ser473 corresponds to the numbering of Akt1) between 5 and 30 minutes post-Pg, which supports their hypothesis of an early role for this kinase. However, these conclusions should be taken with caution, considering that their functional experiment using antisense against Akt2 depletes only 25% of the protein, the antibody used to visualize Akt2 phosphorylation also recognizes phosphorylated Akt1 and Akt3, and they did not analyze phosphorylation of the protein after 30 minutes. Therefore, we cannot determine whether the level observed at 30 minutes represents a maximum or if it is just the onset of the phosphorylation that peaks later, possibly after activation of Cdk1, for example.

      Regarding our measurements: we clearly observe phosphorylation of Akt2 following Cdk1 activation on Ser131. We did not detect Akt2 phosphorylation on Ser474, but since our measurements started 1 hour post-Pg, this protein may have returned to a dephosphorylated state on Ser474.

      Therefore, the observations of Nader et al. and ours involve different residues and different phosphorylation kinetics, Nader et al. limiting their analysis to the first 30 minutes, whereas we started at 1 hour.

      We have revised the manuscript text to make these aspects clearer (lines 387-392).

      (11) Fig. 3B -> it could be made clearer in the Figure that all these sites belong to class I

      A title “Class I proteins” has been added in Fig. 3B to clarify it.

      (12) Lane 433-434 -> the authors write that the proteomic data of this study confirm that PATL1 is accumulating during meiotic maturation. However, in Fig. 2B PATL1 is not among the significantly enriched proteins.

      We apologize for this error. Indeed, PATL1 protein is not significantly enriched. The text has been corrected (lines 461-465).

      (13) Fig. 4B -> Zar2 is color-coded to increase in abundance. This is clearly different to published results and what is shown in Fig. 2B of this manuscript.

      Indeed, our dataset shows that the quantity of Zar2 decreases. This does not appear anymore in Figure 2B since Zar2 average concentration cannot be estimated. We made an error in the color coding, which has now been corrected in Figure 4B.

      (14) Lane 442-444 -> it might be worth mentioning that the interaction between CPEB1 and Maskin, and thus probably its role in regulation of translation, could not be reproduced in other studies (Minshall et al.: CPEB interacts with an ovary-specific eIF4E and 4E-T in early Xenopus oocytes (2007) or Duran-Arque et al.: Comparative analyses of vertebrate CPEB proteins define two subfamilies with coordinated yet distinct functions in post-transcriptional gene regulation (2022)).

      This clarification is now mentioned in the text, supported by the two references that have been added (lines 471-477).

      (15) Lane 483-485 -> The meaning of these sentences is not entirely clear to me. What exactly is the similarity with the function of Emi1? What does "...binding of Cyclin B1..." mean (binding to which other protein?). What is the similarity between Emi1 and CPEB1/BTG4, both of which are regulators of mRNA stability/polyadenylation?

      We apologize if these sentences were unclear. Our intention was to emphasize the central role of ubiquitin ligases in regulating multiple events during meiotic divisions. We used SCF<sup>βTrCP</sup>, a well-studied ubiquitin ligase in Xenopus and mouse oocytes during meiosis, as an example. SCF<sup>βTrCP</sup> regulates the degradation of several substrates, including Emi1, Emi2, CPEB1, and Btg4, whose degradation or stabilization is essential for the proper progression of meiosis. Lastly, we highlighted that these regulatory processes, mediated by protein degradation, may be conserved in mitosis, as exemplified by the destruction of Emi1. We have rewritten this paragraph for clarity (lines 513-518).

      (16) Lane 521-522 and 572-573 -> the authors write that Myt1 was not detected in their proteome. However, in Fig. 6A they list "pkmyt1" as a class II protein. On Xenbase, "pkmyt1" is the Cdk1 kinase, "Myt1" is a transcription factor, so the authors might have been looking for the wrong protein.

      We thank the reviewer for this accurate observation. We have modified the text to correct this error (lines 554 and 607).

      (17) Lane 564-565 -> The authors state that Cdk1 activity can be measured by analyzing Cdc27 S428 phosphorylation. However, in vivo the net phosphorylation of a site is always depending on the relevant kinase and phosphatase activities. As S428 is a Cdk1 site, it is not unlikely that it is dephosphorylated by PP2A-B55, which by itself is under the control of Cdk1. Do the authors have direct evidence that the change in phosphorylation of S428 can only be attributed to the changes in Cdk1 activity?

      There is evidence in the literature that Cdc27 is dephosphorylated by PP2A (Torres et al., 2010). In Xenopus oocytes, PP2A activity is high during prophase (Lemonnier et al., 2021) and decreases at the time of Cdk1 activation, mediated by the Greatwall-ENSA/Arpp19 system, remaining low until MII (Labbé et al., 2021). Therefore, the period where fluctuations in Cdk1 activity are difficult to assess, from NEBD to MII, corresponds to a phase of inhibited PP2A activity. As a result, the phosphorylation level of Cdc27 reflects primarily the activity of Cdk1. We have added this clarification in the text (lines 597-600).

      (18) Fig. 7C and 7D -> in 7C, for Nup35/Nup53 there is a phospho-peptide GIMEVRS(60)PPLHSGG. In Fig. 7D phosphorylation of GVMEMRS(59)PLFSGG is analyzed. Is this the same phosphosite/region of Nup35/Nup53? How can there be a slightly different version of the same peptide in one protein? Are these the L- and S-version of Nup35/Nup53? It is also very surprising that the two phosphosites belong to different classes, class III and class II, respectively.

      We thank the reviewer for this observation. The peptides GIMEVRS(60)PPLHSGG and GVMEMRS(59)PLFSGG correspond to the same phosphorylation site in the L and S versions of Xenopus laevis Nup35, respectively. The L version peptide was classified as Class III, while the S version was not assigned to any class due to its high phosphorylation level in prophase, which prevented it from meeting the log<sub>2</sub> fold-change threshold of 1 required by our analysis to detect significant differences.
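
      The effect of the fold-change threshold mentioned above can be illustrated with a minimal sketch. It assumes a symmetric cut-off of |log<sub>2</sub> fold change| ≥ 1 relative to prophase; the function name and the example values are hypothetical and are not taken from the dataset or from the analysis code.

```python
import math

def passes_log2_threshold(reference, value, threshold=1.0):
    """Return True when |log2(value / reference)| reaches the threshold.

    `reference` is the level at the reference stage (e.g. prophase) and
    `value` the level at a later stage. Both must be strictly positive.
    """
    if reference <= 0 or value <= 0:
        raise ValueError("levels must be strictly positive")
    return abs(math.log2(value / reference)) >= threshold

# Hypothetical values: a site that is low at the reference stage can
# quadruple (log2 = 2) and pass the threshold.
low_start = passes_log2_threshold(0.1, 0.4)

# Hypothetical values: a site that is already high at the reference stage
# cannot double, so a 1.5-fold rise (log2 ~ 0.58) stays below the threshold.
high_start = passes_log2_threshold(0.6, 0.9)
```

      Under these assumptions, a site that is already highly phosphorylated at the reference stage has little room to reach a two-fold change, which is the situation described for the S version peptide.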

      (19) Table 1 -> second last column is headed "Whur, 2014"

      The typo has been corrected.

      (20) Fig. 8 -> Why are all the traces starting at t=1h after PG?

      The labeling of the graphs in Fig. 8 has been corrected, and the traces now begin at t0.

      (21) Lane 754 -> Although a minority, there are also some minus-end directed kinesins, e.g. Kifc1

      We agree with the reviewer. We should have mentioned that, in addition to dyneins, some kinesins are minus-end directed motors, especially since one of them, Kifc1, is regulated at the level of its accumulation. We have rephrased the relevant sentences to incorporate this observation (lines 790-793).

      (22) Section "Assembly of microtubule spindles and microtubule dynamics" -> Although this section clearly has a strong focus on phosphorylation, it might be worth mentioning again that many regulators of the microtubule spindle, e.g. Tpx2, are among the upregulated proteins in Fig. 2B/C

      We have already discussed that the protein levels of certain key regulators of the mitotic spindle (Tpx2, PRC1, SSX2IP, Kif11/Eg5 among others) are subject to control during meiotic maturation in a previous chapter “Protein accumulation: the machinery of cell division and DNA replication” (lines 230-239). We agree with the reviewer that this important observation can be mentioned again at the beginning of this chapter on phosphorylation control. We have added a sentence regarding this at the start of the paragraph (lines 774-775).

      Reviewer #2 (Recommendations for the authors):

      While I find the manuscript excellent and detailed already in its current form, I would appreciate including even more comparisons to other systems. In particular, a similar phosphoproteomics experiment has been performed in starfish oocytes undergoing meiosis (Swartz et al, eLife, 2021), and there are several studies on mitosis of diverse mammalian cells. It would be very exciting to see to what extent changes are conserved.

      We thank the reviewer for this recommendation, which we have attempted to follow. We matched our mass spectrometry dataset using the phosphor-occupancy_matlab package, available as part of our code repository (https://github.com/elizabeth-van-itallie) and previously described (Van Itallie et al, 2025). Unfortunately, we were unable to match our dataset with the data from Swartz et al. (2021) on starfish oocytes due to the low sequence conservation. However, we have compared our dataset with the dataset from Sun et al. (2024) on mouse oocyte maturation. We identified a total of 408 conserved phosphorylation sites, which mapped to 320 proteins in Xenopus and 277 in mice (refer to a new paragraph: lines 824-860, new Figure 9, Methods: lines 1011-1032 and 1060-1065, and Appendix 7). The phosphorylation patterns during meiosis showed a significant cross-species correlation (Pearson r = 0.39, p < 0.0001; see new Figure 9A), demonstrating the evolutionary conservation of phosphoproteomic regulation. Important phosphorylation events, including Plk1 at T201, Gwl at S467, and Erk2 at T188, were upregulated in both species, in line with the activation of the Cdk1 and MAPK signaling cascades (Figure 6B, new Figure 9A-B). We validated several of these phosphorylation sites by western blotting and demonstrated their dependency on Cdk1 activation (new Figure 9C). Together, these findings reinforce the notion that fundamental phospho-regulatory pathways are conserved during oocyte maturation in vertebrates.

      Reviewer #3 (Recommendations for the authors):

      (1) Page 6, the first paragraph of Results section: Please describe the method on how the authors measured and quantified the proteomes in different stages of Xenopus oocyte maturation briefly. Without the experimental design, it is very hard to evaluate the results in the following paragraphs.

      As requested by the reviewer, we added a few sentences describing the method of proteomics and phosphoproteomics measurements in oocytes resuming meiosis (lines 151-158).

      (2) In the phospho-proteome, it is better to classify the amino acids for the phosphorylation such as Ser, Thr, and Tyr. Particularly how many tyrosine phosphorylations are in the list.

      Our phosphosite dataset contains 80% Ser, 19.9% Thr, and 0.01% Tyr. Phospho-Tyr sites are slightly less abundant than described in the literature (in most cells, “roughly 85-90% of protein phosphorylation happens on Ser, ~10% on Thr, and less than 0.05% on Tyr”; Sharma et al., 2014). The same observation was made regarding the distribution of phosphorylated amino acids in mouse oocytes, where phospho-Tyr abundance is relatively diminished in oocytes compared to mouse organs (Sun et al., 2024). These observations are now reported in the manuscript (lines 309-313).
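
      A residue distribution of this kind can be tallied as in the short sketch below. It assumes that each phosphosite is reduced to the one-letter code of its phosphorylated residue; the function name and the toy input are hypothetical and do not reproduce the dataset.

```python
from collections import Counter

def residue_percentages(residues):
    """Return the percentage of Ser (S), Thr (T) and Tyr (Y) among phosphosites.

    `residues` is an iterable of one-letter codes, one per phosphosite.
    """
    counts = Counter(r.upper() for r in residues)
    total = sum(counts[aa] for aa in ("S", "T", "Y"))
    if total == 0:
        raise ValueError("no Ser, Thr or Tyr phosphosites supplied")
    return {aa: 100.0 * counts[aa] / total for aa in ("S", "T", "Y")}

# Toy input (hypothetical): eight Ser sites and two Thr sites.
toy_distribution = residue_percentages(["S"] * 8 + ["T"] * 2)
```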

      (3) In class II (Figure 3), when Cdk1 (line 326) is a major kinase, how many phosphorylation sites are a target of Cdk1 (with the Cdk1-motif)? Moreover, do the authors find any other consensus sequences for the phosphorylation? Those are either known or unknown. This information would be useful for the readers.

      We thank the reviewer for this valuable comment. To address it, we used the kinase prediction server (https://kinase-library.phosphosite.org/kinase-library/score-site) to analyze Class II phosphosites. These new results are mentioned in lines 340-349 and illustrated in a new Figure (Figure 3—figure supplement 1A). We identified 303 sites predicted to be phosphorylated by Cdk1. Of these, 166 were also predicted as Erk1/2 targets, reflecting the similarity between Cdk1 and Erk1/2 consensus motifs.

      Cdk1 substrate phosphorylation is governed by more than just the presence of a consensus sequence. In addition to its preference for the (S/T)P×(K/R) motif, Cdk1/cyclin complexes achieve specificity through docking interactions with short linear motifs (SLiMs) recognized by the cyclin subunit (as LxF motifs)(Loog & Morgan, 2005), and via the Cdk-binding subunits Cks1 or Cks2, which interact with phosphorylated threonine residues in primed substrates (Örd et al, 2019). These mechanisms promote processive multisite phosphorylation and allow Cdk1 to target substrates even at non-canonical sites. Our motif-based analysis captures only part of this complexity and may underestimate the number of true Cdk1 targets.
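
      The difference between the full consensus and the minimal proline-directed motif can be sketched with a simple pattern search. This is only an illustration that assumes the full consensus is written as [ST]-P-x-[KR] and the minimal motif as [ST]-P; it does not reproduce the scoring used by the kinase prediction server, and the function name and the second test sequence are hypothetical.

```python
import re

# Lookahead patterns so that overlapping motifs are all reported.
FULL_CONSENSUS = re.compile(r"(?=[ST]P.[KR])")   # (S/T)PX(K/R)
MINIMAL_MOTIF = re.compile(r"(?=[ST]P)")         # (S/T)P only

def motif_positions(sequence, pattern):
    """Return the 1-based positions of the Ser/Thr residue of each match."""
    return [m.start() + 1 for m in pattern.finditer(sequence.upper())]

# Nup35 L peptide quoted in the response, written without the site annotation.
nup35_l = "GIMEVRSPPLHSGG"
minimal_hits = motif_positions(nup35_l, MINIMAL_MOTIF)
full_hits = motif_positions(nup35_l, FULL_CONSENSUS)

# Hypothetical sequence containing two full-consensus sites.
example_hits = motif_positions("TPKKSPEK", FULL_CONSENSUS)
```

      In this sketch the Nup35 peptide matches the minimal motif at its seventh residue but not the full consensus, which illustrates why a strict motif search alone may miss sites that Cdk1 can still target.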

      To further explore kinase involvement across phosphosite classes, we extended the analysis to all clusters and identified the most enriched kinase predictions for each (lines 360-365, new Figure 3—figure supplement 1B). In Class II, the most enriched kinases included Cdk1, Erk2, and Plk1, supporting the conclusions derived from the identification of the phosphosites of this Class. Other kinases, such as Cdk2, Cdk3, Cdk5, Cdk16, KIS, JNK1, and JNK3, were also identified.

      (4) Figure 3B: Why do the authors show this kind of Table only for Class I, not Classes II-V? It would be informative to show candidate proteins in other classes.

      We chose to present the candidate proteins from Class I in a table format because the number of phosphosites (136) was too small to allow a meaningful Gene Ontology (GO) enrichment analysis. Therefore, we manually curated the data and highlighted proteins whose Class I phosphosites are associated with specific biological processes. For Classes II–V, the higher number of phosphosites allowed us to perform GO enrichment analyses. Since several of the enriched processes were shared across different classes, and some proteins have phosphosites in multiple classes, we opted to organize the results by biological processes rather than by class. We agree with the reviewer that it is indeed valuable to highlight interesting proteins with Class II–V phosphosites. We have done so in Figures 4 through 8, using graphical representations instead of tables, in order to make the data more accessible and avoid long tables. Additionally, the Supplementary Figures provide detailed phosphorylation trends for many of the proteins discussed in the main figures.

      (5) It would be nice if the authors compare this phospho-proteome in Xenopus oocyte maturation with that in mouse oocyte maturation (Sun et al. 2024) in terms of evolutional conservation of the phospho-proteomes.

      We thank the reviewer for this suggestion. As now detailed in the manuscript, we compared our Xenopus phosphoproteome with the dataset from Sun et al. (2024) on mouse oocyte maturation using the phospho_occupancy_matlab package, available as part of our code repository (https://github.com/elizabeth-van-itallie) and previously described (Van Itallie et al, 2025). We identified 408 conserved phosphorylation sites corresponding to 320 Xenopus and 277 mouse proteins (see new paragraph: lines 824-860, new Figure 9, Methods: lines 1011-1032 and 1060-1065, and Appendix 7). Phosphorylation dynamics across meiosis were significantly correlated between the species (Pearson r = 0.39, p < 0.0001; new Figure 9A), highlighting evolutionary conservation of the phosphoproteomes. Key phosphorylation events such as Plk1 at T201, Gwl at S467, and Erk2 at T188 increased in both species, consistent with activation of the Cdk1 and MAPK pathways (Figure 6B, new Figure 9A–B). We experimentally validated several of these phosphorylation sites by western blot (Erk2, Plk1, Fak1 and Akts1) and demonstrated their dependency on Cdk1 activation (new Figure 9C). Together, these new findings support the conservation of key phospho-regulatory mechanisms across vertebrate oocyte maturation.
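
      The cross-species comparison rests on a Pearson correlation between paired phosphorylation changes. The sketch below shows that calculation in plain Python, assuming each conserved site contributes one value per species (for example a log<sub>2</sub> fold change); the function name and the toy values are hypothetical, and the matching of conserved sites performed by the MATLAB package is not reproduced.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / (spread_x * spread_y)

# Toy paired changes for the same conserved sites (hypothetical values).
xenopus_changes = [2.1, 0.3, -1.0, 1.4, 0.0]
mouse_changes = [1.5, -0.2, -0.4, 0.9, 0.6]
r_value = pearson_r(xenopus_changes, mouse_changes)
```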

      Minor points:

      (1) Reference lists: Please add Sun et al (2024) shown in line 115.

      This important reference has been added (lines 115, 134, 313 and 826).

      (2) Figure 1, red arrows for the inhibition: This should be "T" shape for a better understanding of these complicated pathways.

      We agree with the reviewer’s remark, and we have modified Figure 1.

      (3) Line 236-238: The authors referred to the absence of Cdc6 in oocyte maturation in Xenopus. However, Figure 2C shows that Cdc6 belongs to a list of accumulating proteins with Orc1 and Orc2 etc. and the authors did not discuss this discrepancy in the text. Please clarify the claim.

      We apologize for the unclear wording in our text. The section of the manuscript regarding the pre-RC components may have been misleading. The text has been revised to clarify that Cdc6 was not detected in prophase-arrested oocytes by western blot and that it accumulates during meiotic maturation after MI, enabling oocytes to replicate DNA (lines 243-250).

      (4) Line 306: Please add the link to phosphosite.org.

      The link has been added (line 319).

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors use the theory of planned behavior to assess intentions to use sex as a biological variable (SABV), as well as attitude (value), subjective norm (social pressure), and behavioral control (ability to conduct behavior), among scientists at a pharmacological conference. They also used an intervention (workshop) to determine the value of this workshop in changing perceptions and misconceptions. Attempts to understand the knowledge gaps were made.

      Strengths:

      The use of SABV is limited in terms of researchers using sex in the analysis as a variable of interest in the models (and not a variable to control). To understand how we can improve on the number of researchers examining the data with sex in the analyses, it is vital we understand the pressure points that researchers consider in their work. The authors identify likely culprits in their analyses. The authors also test an intervention (workshop) to address the main bias or impediments for researchers' use of sex in their analyses.

      Weaknesses:

      There are a number of assumptions the authors make that could be revisited:

      (1) that all studies should contain across sex analyses or investigations. It is important to acknowledge that part of the impetus for SABV is to gain more scientific knowledge on females. This will require within sex analyses and dedicated research to uncover how unique characteristics for females can influence physiology and health outcomes. This will only be achieved with the use of female-only studies. The overemphasis on investigations of sex influences limits the work done for women's health, for example, as within-sex analyses are equally important.

      The Sex and Gender Equity in Research (SAGER) guidelines (1) provide guidance that “Where the subjects of research comprise organisms capable of differentiation by sex, the research should be designed and conducted in a way that can reveal sex-related differences in the results, even if these were not initially expected.” This is a default position of inclusion wherever sex can be determined, with analysis assessing sex-related variability in the response. This position underpins many of the funding bodies' new policies on inclusion.

      However, we need to place this in the context of the driver of inclusion. The most common reason for including male and female samples is for those studies that are exploring the effect of a treatment and then the goal of inclusion is to assess the generalisability of the treatment effect (exploratory sex inclusion)(2). The second scenario is where sex is included because sex is one of the variables of interest and this situation will arise because there is a hypothesized sex difference of interest (confirmatory sex inclusion).

      We would argue that the SABV concept was introduced to address the systematic bias of only studying one sex when assessing treatment effect to improve the generalisability of the research. Therefore, it isn’t directly to gain more scientific knowledge on females. However, this strategy will highlight when the effect is very different between male and female subjects which will potentially generate sex specific hypotheses.

      Where research has a hypothesis that is specific to a sex (e.g. it is related to oestrogen levels) it would be appropriate to study only the sex of interest, in this case females. The recently published Sex Inclusive Research Framework gives some guidance here and allows an exemption for such a scenario classifying such proposals “Single sex study justified” (3).

      We plan to add an additional paragraph to the introduction to clarify the objectives behind inclusion and how this assists the research process.

      (2) It should be acknowledged that although the variability within each sex is not different on a number of characteristics (as indicated by meta-analyses in rats and mice), this was not done on all variables, and behavioral variables were not included. In addition, across-sex variability may very well be different, which, in turn, would result in statistical sex significance. In addition, on some measures, there are sex differences in variability, as human males have more variability in grey matter volume than females. PMID: 33044802.

      The manuscript was highlighting the common argument used to exclude females, which is that females are inherently more variable as an absolute truth. We agree there might be situations where the variance is higher in one sex or another depending on the biology. We will extend the discussion here to reflect this, and we will also link to the Sex Inclusive Research Framework (3), which highlights that in these situations researchers can utilise this argument provided it is supported with data for the biology of interest.

      (3) The authors need to acknowledge that it can be important that the sample size is increased when examining more than one sex. If the sample size is too low for biological research, it will not be possible to determine whether or not a difference exists. Using statistical modelling, researchers have found that depending on the effect size, the sample size does need to increase. It is important to bear this in mind as exploratory analyses with small sample size will be extremely limiting and may also discourage further study in this area (or indeed, as seen in the literature, an exploratory first study with the use of males and females with limited sample size, only to show there is no "significance" and to justify this as a reason to only use males for the further studies in the work).

      The reviewer raises a common problem: researchers have frequently argued that if they find no sex differences in a pilot, they can proceed to study only one sex. The SAGER guidelines (1), and now funder guidelines (4, 5), challenge that position. Instead, the expectation is for inclusion as the default in all experiments (exploratory inclusion strategy) to allow generalisable results to be obtained. When the results are very different between the male and female samples, then this can be determined. This perspective shift (2) requires a change in mindset and an understanding that the driver behind inclusion is generalisability, not exploration of sex differences. This will be added to the introduction as an additional paragraph exploring the drivers behind inclusion.

      We agree with the reviewer that if the researcher is interested in sex differences in an effect (confirmatory inclusion strategy, aka sex as a primary variable) then the N will need to be higher. However, in this situation, one, of course, must have male and female samples in the same experiment to allow the simultaneous exploration to assess the dependency on sex.
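
      The difference in sample size between the two inclusion strategies can be illustrated with a standard normal-approximation calculation. The sketch assumes a balanced 2 x 2 design (treatment by sex) with n animals per cell, equal variances, and a two-sided test; the function name and the numerical inputs are hypothetical and are not drawn from the study. Under these assumptions the pooled treatment contrast has variance sd²/n, whereas the treatment-by-sex interaction contrast has variance 4·sd²/n.

```python
import math
from statistics import NormalDist

def n_per_cell(effect, sd, variance_factor, alpha=0.05, power=0.8):
    """Approximate n per cell needed to detect a contrast of size `effect`.

    The contrast is assumed to have variance variance_factor * sd**2 / n:
    variance_factor = 1 for the treatment effect pooled across sexes,
    variance_factor = 4 for the treatment-by-sex interaction.
    """
    if effect == 0:
        raise ValueError("effect must be non-zero")
    z_total = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(variance_factor * (z_total * sd / effect) ** 2)

# Same standardised effect size for both questions (hypothetical inputs).
n_pooled_treatment = n_per_cell(effect=1.0, sd=1.0, variance_factor=1)
n_interaction = n_per_cell(effect=1.0, sd=1.0, variance_factor=4)
```

      With identical inputs, the interaction contrast needs about four times as many animals per cell as the pooled treatment contrast, which is one way to see why confirmatory inclusion calls for a larger N than exploratory inclusion.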

      Reviewer #2 (Public review):

      Summary:

      The investigators tested a workshop intervention to improve knowledge and decrease misconceptions about sex inclusive research. There were important findings that demonstrate the difficulty in changing opinions and knowledge about the importance of studying both males and females. While interventions can improve knowledge and decrease perceived barriers, the impact was small.

      Strengths:

      The investigators included control groups and replicated the study in a second population of scientists. The results appear to be well substantiated. These are valuable findings that have practical implications for fields where sex is included as a biological variable to improve rigor and reproducibility.

      Thank you for your assessment and for highlighting these strengths. We appreciate your recognition of the value and practical implications of this work.

      Weaknesses:

      I found the figures difficult to understand and would have appreciated more explanation of what is depicted, as well as greater space between the bars representing different categories.

      We plan to review the figures and figure legends to improve clarity of the data.

      Reviewer #3 (Public review):

      Summary:

      This manuscript aims to determine cultural biases and misconceptions in inclusive sex research and evaluate the efficacy of interventions to improve knowledge and shift perceptions to decrease perceived barriers for including both sexes in basic research.

      Overall, this study demonstrates that despite the intention to include both sexes and a general belief in the importance of doing so, relatively few people routinely include both sexes. Further, the perceptions of barriers to doing so are high, including misconceptions surrounding sample size, disaggregation, and variability of females. There was also a substantial number of individuals without the statistical knowledge to appropriately analyze data in studies inclusive of sex. Interventions increased knowledge and decreased perception of barriers.

      Strengths:

      (1) This manuscript provides evidence for the efficacy of interventions for changing attitudes and perceptions of research.

      (2) This manuscript also provides a training manual for expanding this intervention to broader groups of researchers.

      Thank you for highlighting these strengths. We appreciate your recognition that the intervention was effective in changing attitudes and perceptions. We deliberately chose to share the material to provide the resources to allow wider engagement.

      Weaknesses:

      The major weakness here is that the post-workshop assessment is a single time point, soon after the intervention. As this paper shows, intention for these individuals is already high, so does decreasing perception of barriers and increasing knowledge change behavior, and increase the number of studies that include both sexes? Similarly, does the intervention start to shift cultural factors? Do these contribute to a change in behavior?

      Measuring change in behaviour following an intervention is challenging and hence we had implemented an intention score as a proxy for behaviour. We appreciate the benefit of a long-term analysis, but it was beyond the scope of this study and would need a larger dataset size to allow for attrition. We agree that the strategy implemented has weaknesses. We plan to extend the limitation section in the discussion to include these.

      References

      (1) Heidari S, Babor TF, De Castro P, Tort S, Curno M. Sex and Gender Equity in Research: rationale for the SAGER guidelines and recommended use. Res Integr Peer Rev. 2016;1:2.

      (2) Karp NA. Navigating the paradigm shift of sex inclusive preclinical research and lessons learnt. Commun Biol. 2025;8(1):681.

      (3) Karp NA, Berdoy M, Gray K, Hunt L, Jennings M, Kerton A, et al. The Sex Inclusive Research Framework to address sex bias in preclinical research proposals. Nat Commun. 2025;16(1):3763.

      (4) MRC. Sex in experimental design - Guidance on new requirements https://www.ukri.org/councils/mrc/guidance-for-applicants/policies-and-guidance-for-researchers/sex-in-experimental-design/: UK Research and Innovation; 2022

      (5) Clayton JA, Collins FS. Policy: NIH to balance sex in cell and animal studies. Nature. 2014;509(7500):282-3.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents a compelling study identifying RBMX2 as a novel host factor upregulated during Mycobacterium bovis infection.

      The study demonstrates that RBMX2 plays a role in:

      (1) Facilitating M. bovis adhesion, invasion, and survival in epithelial cells.

      (2) Disrupting tight junctions and promoting EMT.

      (3) Contributing to inflammatory responses and possibly predisposing infected tissue to lung cancer development.

      By using a combination of CRISPR-Cas9 library screening, multi-omics, coculture models, and bioinformatics, the authors establish a detailed mechanistic link between M. bovis infection and cancer-related EMT through the p65/MMP-9 signaling axis. Identification of RBMX2 as a bridge between TB infection and EMT is novel.

      Strengths:

      This topic and data are both novel and significant, expanding the understanding of transcriptomic diversity beyond RBM2 in M. bovis responsive functions.

      Weaknesses:

      (1) The abstract and introduction sometimes suggest RBMX2 has protective anti-TB functions, yet results show it facilitates pathogen adhesion and survival. The authors need to rephrase claims to avoid contradiction.

      We sincerely appreciate the reviewer's valuable feedback regarding the need to clarify RBMX2's role throughout the manuscript. We have carefully revised the text to ensure consistent messaging about RBMX2's function in promoting M. bovis infection. Below we detail the specific modifications made:

      (1) Introduction Revisions:

      Changed "The objective of this study was to elucidate the correlation between host genes and the susceptibility of M.bovis infection" to "The objective of this study was to identify host factors that promote susceptibility to M.bovis infection"

      Revised "RBMX2 polyclonal and monoclonal cell lines exhibited favorable phenotypes" to "RBMX2 knockout cell lines showed reduced bacterial survival"

      Replaced "The immune regulatory mechanism of RBMX2" with "The role of RBMX2 in facilitating M.bovis immune evasion"

      (2) Results Revisions:

      Modified "RBMX2 fails to affect cell morphology and the ability to proliferate and promotes M.bovis infection" to "RBMX2 does not alter cell viability but significantly enhances M.bovis infection"

      Strengthened conclusion in Figure 4: "RBMX2 actively disrupts tight junctions to facilitate bacterial invasion"

      (3) Discussion Revisions:

      Revised screening description: "We screened host factors affecting M.bovis susceptibility and identified RBMX2 as a key promoter of infection"

      Strengthened concluding statement: "In summary, RBMX2 drives TB pathogenesis by compromising epithelial barriers and inducing EMT"

      These targeted revisions ensure that:

      All sections consistently present RBMX2 as promoting infection; the language aligns with our experimental findings; potential protective interpretations have been eliminated. We believe these modifications have successfully addressed the reviewer's concern while maintaining the manuscript's original structure and scientific content. We appreciate the opportunity to improve our manuscript and thank the reviewer for this constructive suggestion.

      (2) While p65/MMP-9 is convincingly implicated, the role of MAPK/p38 and JNK is less clearly resolved.

      We sincerely appreciate the reviewer's insightful comment regarding the roles of MAPK/p38 and JNK in our study. Our experimental data clearly demonstrated that RBMX2 knockout significantly reduced phosphorylation levels of p65, p38, and JNK (Fig. 5A), indicating potential involvement of all three pathways in RBMX2-mediated regulation.

      Through systematic functional validation, we obtained several important findings:

      In pathway inhibition experiments, p65 activation (PMA treatment) showed the most dramatic effects on both tight junction disruption (ZO-1, OCLN reduction) and EMT marker regulation (E-cadherin downregulation, N-cadherin upregulation);

      p38 activation (ML141 treatment) exhibited moderate effects on these processes;

      JNK activation (Anisomycin treatment) displayed minimal impact.

      Most conclusively, siRNA-mediated silencing of p65 alone was sufficient to:

      Restore epithelial barrier function

      Reverse EMT marker expression

      Reduce bacterial adhesion and invasion

      These results establish a clear hierarchy in pathway importance: p65 serves as the primary mediator of RBMX2's effects, while p38 plays a secondary role and JNK appears non-essential under our experimental conditions. We have now clarified this relationship in the revised Discussion section to strengthen this conclusion.

      This refined understanding of pathway hierarchy provides important mechanistic insights while maintaining consistency with all our experimental data. We thank the reviewer for this valuable suggestion that helped improve our manuscript.

      (3) Metabolomics results are interesting but not integrated deeply into the main EMT narrative.

      Thank you for this constructive suggestion. In this article, we detected the metabolome of RBMX2 knockout and wild-type cells after Mycobacterium bovis infection, which mainly served as supporting evidence for our EMT model. However, we did not conduct an in-depth discussion of these findings. We have now added a detailed discussion of this section to further support our EMT model.

      ADD: Meanwhile, metabolic pathways enriched after RBMX2 deletion, such as nucleotide metabolism, nucleotide sugar synthesis, and pentose interconversion, primarily support cell proliferation and migration during EMT by providing energy precursors, regulating glycosylation modifications, and maintaining redox balance; cofactor synthesis and amino sugar metabolism participate in EMT regulation through influencing metabolic remodeling and extracellular matrix interactions; chemokine and cGMP-PKG signaling pathways may further mediate inflammatory responses and cytoskeletal rearrangements, collectively promoting the EMT process.

      (4) A key finding and starting point of this study is the upregulation of RBMX2 upon M. bovis infection. However, the authors have only assessed RBMX2 expression at the mRNA level following infection with M. bovis and BCG. To strengthen this conclusion, it is essential to validate RBMX2 expression at the protein level through techniques such as Western blotting or immunofluorescence. This would significantly enhance the credibility and impact of the study's foundational observation.

      Thank you for your comment. We have supplemented the experiments in this part and found that Mycobacterium bovis infection can significantly enhance the expression level of RBMX2 protein.

      (5) The manuscript would benefit from a more in-depth discussion of the relationship between tuberculosis (TB) and lung cancer. While the study provides experimental evidence suggesting a link via EMT induction, integrating current literature on the epidemiological and mechanistic connections between chronic TB infection and lung tumorigenesis would provide important context and reinforce the translational relevance of the findings.

      We sincerely appreciate the valuable comments from the reviewer. We fully agree with your suggestion to further explore the relationship between tuberculosis (TB) and lung cancer. In the revised manuscript, we will add a new paragraph in the Discussion section to systematically integrate the current literature on the epidemiological and mechanistic links between chronic tuberculosis infection and lung cancer development, including the potential bridging roles of chronic inflammation, tissue damage repair, immune microenvironment remodeling, and the epithelial-mesenchymal transition (EMT) pathway. This addition will help more comprehensively interpret the clinical implications of the observed EMT activation in the context of our study, thereby enhancing the biological plausibility and clinical translational value of our findings.

      ADD:There is growing epidemiological evidence suggesting that chronic TB infection represents a potential risk factor for the development of lung cancer. Studies have shown that individuals with a history of TB exhibit a significantly increased risk of lung cancer, particularly in areas of the lung with pre-existing fibrotic scars, indicating that chronic inflammation, tissue repair, and immune microenvironment remodeling may collectively contribute to malignant transformation 74. Moreover, EMT not only endows epithelial cells with mesenchymal features that enhance migratory and invasive capacity but is also associated with the acquisition of cancer stem cell-like properties and therapeutic resistance 75. Therefore, EMT may serve as a crucial molecular link connecting chronic TB infection with the malignant transformation of lung epithelial cells, warranting further investigation in the intersection of infection and tumorigenesis.

      Reviewer #2 (Public review):

      Summary:

      I am not familiar with cancer biology, so my review mainly focuses on the infection part of the manuscript. Wang et al. identified an RNA-binding protein, RBMX2, that links Mycobacterium bovis infection to the epithelial-mesenchymal transition and lung cancer progression. Upon mycobacterial infection, the expression of RBMX2 was moderately increased in multiple bovine and human cell lines, as well as in bovine lung and liver tissues. Using global approaches, including RNA-seq and proteomics, the authors identified differential gene expression caused by the RBMX2 knockout during M. bovis infection. Knockout of RBMX2 led to significant upregulation of tight-junction-related genes such as CLDN-5, OCLN, and ZO-1, whereas M. bovis infection affects the integrity of epithelial cell tight junctions and inflammatory responses. This study establishes that RBMX2 is an important host factor that modulates the infection process of M. bovis.

      Strengths:

      (1) This study tested multiple types of bovine and human cells, including macrophages, epithelial cells, and clinical tissues at multiple timepoints, and firmly confirmed the induced expression of RBMX2 upon M. bovis infection.

      (2) The authors have generated the monoclonal RBMX2 knockout cell lines and comprehensively characterized the RBMX2-dependent gene expression changes using a combination of global omics approaches. The study has validated the impact of RBMX2 knockout on the tight-junction pathway and on the M. bovis infection, establishing RBMX2 as a crucial host factor.

      Weaknesses:

      (1) The RBMX2 was only moderately induced (less than 2-fold) upon M. bovis infection, arguing its contribution may be small. Its value as a therapeutic target is not justified. How RBMX2 was activated by M. bovis infection was unclear.

      Thank you for your valuable and constructive comments. In this study, we primarily utilized the CRISPR whole-genome screening approach to identify key factors involved in bovine tuberculosis infection. Through four rounds of screening using a whole-genome knockout cell line of bovine lung epithelial cells infected with Mycobacterium bovis, we identified RBMX2 as a critical factor.

      Although the transcriptional level change of RBMX2 was less than two-fold, following the suggestion of Reviewer 1, we examined its expression at the protein level, where the change was more pronounced, and we have added these results to the manuscript.

      Regarding the mechanism by which RBMX2 is activated upon M. bovis infection, we previously screened for interacting proteins using a Mycobacterium tuberculosis secreted and membrane protein library, but unfortunately, we did not identify any direct interacting proteins from M. tuberculosis (https://doi.org/10.1093/nar/gkx1173).

      (2) Although multiple time points have been included in the study, most analyses lack temporal resolution. It is difficult to appreciate the impact/consequence of M. bovis infection on the analyzed pathways and processes.

      We appreciate the valuable comments from the reviewers. Although our study included multiple time points post-infection, in our experimental design we focused on different biological processes and phenotypes at distinct time points:

      During the early phase (e.g., 2 hours post-infection), we focused on barrier phenotypes; during the intermediate phase (e.g., 24 hours post-infection), we concentrated on pathway activation and EMT phenotypes; and during the later phase (e.g., 48–72 hours post-infection), we focused on cell death phenotypes, which were validated in a separate Frontiers in Immunology article (https://doi.org/10.3389/fimmu.2024.1431207).

      We also examined the impact of varying infection durations on RBMX2 knockout EBL cell lines via GO analysis. At 0 hpi, genes were primarily related to cell junctions, extracellular regions, and cell junction organization. At 24 hpi, genes were mainly associated with the basement membrane, cell adhesion, integrin binding, and cell migration. By 48 hpi, genes were annotated to epithelial cell differentiation and to negative regulation of epithelial cell proliferation. This indicates that RBMX2 regulates cell junctions throughout the stages of M. bovis infection.

      For KEGG analysis, genes linked to the MAPK signaling pathway, chemical carcinogen-DNA adducts, and chemical carcinogen-receptor activation were observed at 0 hpi. At 24 hpi, significant enrichment was found in the ECM-receptor interaction, PI3K-Akt signaling pathway, and focal adhesion. Upon enrichment analysis at 48 hpi, significant enrichment was noted in the TGF-beta signaling pathway, transcriptional misregulation in cancer, microRNAs in cancer, small cell lung cancer, and p53 signaling pathway.

      Reviewer #3 (Public review):

      Summary:

      This study investigates the role of the host protein RBMX2 in regulating the response to Mycobacterium bovis infection and its connection to epithelial-mesenchymal transition (EMT), a key pathway in cancer progression. Using bovine and human cell models, the authors have wisely shown that RBMX2 expression is upregulated following M. bovis infection and promotes bacterial adhesion, invasion, and survival by disrupting epithelial tight junctions via the p65/MMP-9 signaling pathway. They also demonstrate that RBMX2 facilitates EMT and is overexpressed in human lung cancers, suggesting a potential link between chronic infection and tumor progression. The study highlights RBMX2 as a novel host factor that could serve as a therapeutic target for both TB pathogenesis and infection-related cancer risk.

      Strengths:

      The major strengths lie in its multi-omics integration (transcriptomics, proteomics, metabolomics) to map RBMX2's impact on host pathways, combined with rigorous functional assays (knockout/knockdown, adhesion/invasion, barrier tests) that establish causality through the p65/MMP-9 axis. Validation across bovine and human cell models and in clinical tissue samples enhances translational relevance. Finally, identifying RBMX2 as a novel regulator linking mycobacterial infection to EMT and cancer progression opens exciting therapeutic avenues.

      Weaknesses:

      Although it's a solid study, there are a few weaknesses noted below.

      (1) In the transcriptomics analysis, the authors performed (GO/KEGG) to explore biological functions. Did they perform the search locally or globally? If the search was performed with a global reference, then I would recommend doing a local search. That would give more relevant results. What is the logic behind highlighting some of the enriched pathways (in red), and how are they relevant to the current study?

      We appreciate the reviewer's thoughtful questions regarding our transcriptomic analysis. In this study, we employed a localized enrichment approach focusing specifically on gene expression profiles from our bovine lung epithelial cell system. This cell-type-specific analysis provides more biologically relevant results than global database searches alone.

      Regarding the highlighted pathways, these represent:

      (1) Temporally significant pathways showing strongest enrichment at each stage:

      • 0h: Cell junction organization (immediate barrier response)

      • 24h: ECM-receptor interaction (early EMT initiation)

      • 48h: TGF-β signaling (chronic remodeling)

      (2) Mechanistically linked to our core findings about RBMX2's role in:

      • Epithelial barrier disruption

      • Mesenchymal transition

      • Chronic infection outcomes

      We selected these particular pathways because they:

      (1) Showed the most statistically significant changes (FDR <0.001)

      (2) Formed a coherent biological narrative across infection stages

      (3) Were independently validated in our functional assays

      This targeted approach allows us to focus on the most infection-relevant pathways while maintaining statistical rigor.
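
      To illustrate what an enrichment test against a local background can look like, here is a minimal sketch. It assumes a one-sided hypergeometric test followed by Benjamini-Hochberg FDR correction; this is a common choice and not necessarily the procedure used in the manuscript. The function names, gene identifiers, and pathway names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from a background of N, K of which are in the pathway."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity is enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def enrich(de_genes, background, pathways):
    """Test each pathway for over-representation among DE genes, restricted to the local background."""
    bg = set(background)
    de = set(de_genes) & bg
    names, pvals = [], []
    for name, members in pathways.items():
        in_bg = set(members) & bg  # only genes detected in this cell system count
        overlap = len(in_bg & de)
        names.append(name)
        pvals.append(hypergeom_sf(overlap, len(bg), len(in_bg), len(de)))
    return dict(zip(names, benjamini_hochberg(pvals)))


# Toy data: 20 hypothetical expressed genes, 5 of them differentially expressed.
background = [f"g{i}" for i in range(20)]
de_genes = [f"g{i}" for i in range(5)]
pathways = {
    "pathway_A": [f"g{i}" for i in range(5)],       # fully overlaps the DE set
    "pathway_B": [f"g{i}" for i in range(10, 15)],  # no overlap with the DE set
}
for name, fdr in enrich(de_genes, background, pathways).items():
    print(name, fdr)
```

      Restricting the background to genes detected in the cell system under study changes `N` and `K` in the test, which is why a local and a global search can rank the same pathways differently.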

      (2) While the authors show that RBMX2 expression correlates with EMT-related gene expression and barrier dysfunction, the evidence for direct association remains limited in this study. How does RBMX2 activate p65? Does it bind directly to p65 or modulate any upstream kinases? Could ChIP-seq or CLIP-seq provide further evidence for direct RNA or DNA targets of RBMX2 that drive EMT or NF-κB signaling?

      We sincerely appreciate the reviewer's in-depth questions regarding the mechanisms by which RBMX2 activates p65 and its association with EMT. Although the molecular mechanism remains to be fully elucidated, our study provides experimental evidence supporting a direct regulatory relationship between RBMX2 and the p65 subunit of the NF-κB pathway. Specifically, we used ChIP experiments to test whether the transcription factor p65 binds directly to the promoter region of RBMX2. The results demonstrated that p65 physically binds to the RBMX2 promoter region.

      Furthermore, dual-luciferase reporter assays showed that p65 significantly enhances the transcriptional activity of the RBMX2 promoter, indicating a direct regulatory effect of p65 on RBMX2 expression.

      These findings support our hypothesis that RBMX2 activates the NF-κB signaling pathway through direct interaction with the p65 protein, thereby participating in the regulation of EMT progression and barrier function.

      In our subsequent work, we will also employ experiments such as CLIP to further investigate the specific mechanisms through which RBMX2 exerts its regulatory functions.

      (3) The manuscript suggests that RBMX2 enhances adhesion/invasion of several bacterial species (e.g., E. coli, Salmonella), not just M. bovis. This raises questions about the specificity of RBMX2's role in Mycobacterium-specific pathogenesis. Is RBMX2 a general epithelial barrier regulator or does it exhibit preferential effects in mycobacterial infection contexts? How does this generality affect its potential as a TB-specific therapeutic target?

      Thank you for your valuable comments. When we initially designed this experiment, we were interested in whether the RBMX2 knockout cell line could confer effective resistance not only against Mycobacterium bovis but also against Gram-negative and Gram-positive bacteria. Surprisingly, we indeed observed resistance to the invasion of these pathogens, albeit weaker compared to that against Mycobacterium bovis.

      Nevertheless, we believe these findings merit publication in eLife. Moreover, RBMX2 knockout does not affect the phenotype of epithelial barrier disruption under normal conditions; its significant regulatory effect on barrier function is only evident upon infection with Mycobacterium bovis.

      Importantly, during our genome-wide knockout library screening, RBMX2 was not identified in the screening models for Salmonella or Escherichia coli, but was consistently detected across multiple rounds of screening in the Mycobacterium bovis model.

      (4) The quality of the figures is very poor. High-resolution images should be provided.

      Thank you for your feedback; we have provided higher-resolution images.

      (5) The methods are not very descriptive, particularly the omics section.

      Thank you for your comments; we have revised the description of the sequencing section.

      (6) The manuscript is too dense, with extensive multi-omics data (transcriptomics, proteomics, metabolomics) but relatively little mechanistic integration. The authors should have focused on the key mechanistic pathways in the figures. Improving the narratives in the Results and Discussion section could help readers follow the logic of the experimental design and conclusions.

      Thank you for your valuable comments. We have streamlined the figures and revised the description of the results section accordingly.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this interesting and original paper, the authors examine the effect that heat stress can have on the ability of bacterial cells to evade infection by lytic bacteriophages. Briefly, the authors show that heat stress increases the tolerance of Klebsiella pneumoniae to infection by the lytic phage Kp11. They also argue that this increased tolerance facilitates the evolution of genetically encoded resistance to the phage. In addition, they show that heat can reduce the efficacy of phage therapy. Moreover, they define a likely mechanistic reason for both tolerance and genetically encoded resistance. Both lead to a reorganization of the bacterial cell envelope, which reduces the likelihood that phage can successfully inject their DNA.

      Strengths:

      I found large parts of this paper well-written and clearly presented. I also found many of the experiments simple yet compelling. For example, the experiments described in Figure 3 clearly show that prior heat exposure can affect the efficacy of phage therapy. In addition, the experiments shown in Figures 4 and 6 clearly demonstrate the likely mechanistic cause of this effect. The conceptual Figure 7 is clear and illustrates the main ideas well. I think this paper would work even without its central claim, namely that tolerance facilitates the evolution of resistance. The reason is that the effect of environmental stressors on stress tolerance has to my knowledge so far only been shown for drug tolerance, not for tolerance to an antagonistic species.

      Weaknesses:

      I did not detect any weaknesses that would require a major reorganization of the paper, or that may require crucial new experiments. However, the paper needs some work in clarifying specific and central conclusions that the authors draw. More specifically, it needs to improve the connection between what is shown in some figures, how these figures are described in the caption, and how they are discussed in the main text. This is especially glaring with respect to the central claim of the paper from the title, namely that tolerance facilitates the evolution of resistance. I am sympathetic to that claim, especially because this has been shown elsewhere, not for phage resistance but for antibiotic resistance. However, in the description of the results, this is perhaps the weakest aspect of the paper, so I'm a bit mystified as to why the authors focus on this claim. As I mentioned above, the paper could stand on its own even without this claim.

      Thank you for your feedback. We understand your concern regarding the central claim that tolerance facilitates the evolution of resistance. While the paper can stand on its own without this claim, we think it adds an important layer to the interpretation of our findings. In light of your comments, we plan to revise the title to “Heat Stress Induces Phage Tolerance in Bacteria”.

      More specific examples where clarification is needed:

      (1) A key figure of the paper seems to be Figure 2D, yet it was one of the most confusing figures. This results from a mismatch between the accompanying text starting on line 92 and the figure itself. The first thing that the reader notices in the figure itself is the huge discrepancy between the number of viable colonies in the absence of phage infection at the two-hour time point. Yet this observation is not even mentioned in the main text. The exclusive focus of the main text seems to be on the right-hand side of the figure, labeled "+Phage". It is from this right-hand panel that the authors seem to conclude that heat stress facilitates the evolution of resistance. I find this confusing, because there is no difference between the heat-treated and non-treated cells in survivorship, and it is not clear from this data that survivorship is caused by resistance, not by tolerance/persistence. (The difference between tolerance and resistance has only been shown in the independent experiments of Figure 1B.)

      Thank you for your helpful comment. Figure 2d presents colony counts from a plating assay following the phage killing experiment in Figure 2c. Bacteria collected after 0 and 2 hours of phage exposure were plated on both phage-free (−phage) and phage-containing (+phage) plates. The “−phage” condition reflects total survivors, while the “+phage” condition indicates the resistant subset.

      As seen in Figure 2d (left), heat-treated bacteria showed markedly higher survival on phage-free plates than untreated cells, which were largely eliminated by phage. However, resistant colony counts on phage-containing plates were similar between the two groups (Figure 2d, right), suggesting that heat stress increased survival but did not promote resistance.

      To clarify, we have revised the labels in Figure 2d as follows: “Total” replaces “-phage” to indicate the total survivors from the phage killing assay, and “Resisters” replaces “+phage” to indicate the resistant survivors, which are detected on phage-containing plates. This adjustment should eliminate any confusion and better reflect the experimental design.

      Figure 2F supports the resistance claim, but it is not one of the strongest experiments of the paper, because the authors used only "turbidity" as an indicator of resistance. In addition, the authors performed the experiments described therein at small population sizes to avoid the presence of resistance mutations. But how do we know that the turbidity they describe does not result from persisters?

      I see three possibilities to address these issues. First, perhaps this is all a matter of explaining and motivating this particular experiment better. Second, the central claim of the paper may require additional experiments. For example, is it possible to block heat induced tolerance through specific mutations, and show that phage resistance does not evolve as rapidly if tolerance is blocked? A third possibility is to tone down the claim of the paper and make it about heat tolerance rather than the evolution of heat resistance.

      Thank you for your thoughtful comment. We appreciate the opportunity to clarify the interpretation of Figure 2f and the rationale behind the experimental design. We agree that turbidity alone cannot fully distinguish resistance from persistence. However, our earlier experiments (Figures 2d and 2e) demonstrated that heat-treated survivors remained largely susceptible to phage, indicating that heat stress does not directly induce resistance. This led us to hypothesize that heat enhances phage tolerance, which in turn increases the likelihood of resistance emergence during subsequent infection.

      To test this, we used a low initial bacterial population (~10³ CFU per well) to minimize the chance of pre-existing resistance. Bacteria were exposed to phages at MOIs of 1, 10, and 100 and incubated for 24 hours in 100 µL volumes. This setup ensured:

      (1) The low initial population minimizes the presence of pre-existing resistant mutants, ensuring that any phage-resistant bacteria observed arise during the infection process.

      (2) The high MOI (≥ 1) ensures that each bacterial cell has a high probability of infection by at least one phage.

      (3) The small volume (100 µL per well) maximizes the interaction between bacteria and phages, ensuring rapid infection of susceptible bacteria, which leads to clear wells. If resistant mutants arise, they will grow and cause turbidity.

      Thus, the turbidity observed in heat-treated samples reflects de novo emergence and outgrowth of resistant mutants from a tolerant population. This assay supports the idea that heat-induced tolerance increases the probability of resistance evolution, rather than directly causing resistance.
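
      The reasoning behind points (1) and (2) can be made concrete with a short calculation. The sketch below assumes that phage adsorption follows a Poisson distribution with mean equal to the MOI, and that pre-existing resistant mutants occur at a fixed frequency. The mutant frequency used here (1e-7) is a purely illustrative assumption and was not measured in this study; the function names are hypothetical.

```python
import math


def fraction_infected(moi: float) -> float:
    """Fraction of cells expected to adsorb at least one phage (Poisson model)."""
    return 1.0 - math.exp(-moi)


def prob_preexisting_mutant(population_size: float, mutant_frequency: float) -> float:
    """Probability that at least one resistant mutant is already present in the inoculum."""
    return 1.0 - math.exp(-population_size * mutant_frequency)


# MOIs used in the assay described above.
for moi in (1, 10, 100):
    print(f"MOI {moi:>3}: fraction infected = {fraction_infected(moi):.5f}")

# ~10^3 CFU per well, as in the assay; 1e-7 is an assumed mutant frequency.
print(f"P(pre-existing mutant) = {prob_preexisting_mutant(1e3, 1e-7):.1e}")
```

      Under this simple model, about 63% of cells are infected in the first round at MOI 1, so the near-complete infection described in point (2) holds more strictly at MOI 10 and 100, or after phage amplification at MOI 1.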

      We have revised the text to better explain this experimental logic and adjust the framing of our conclusions accordingly.

      A minor but general point here is that in Figure 2D and in other figures, the labels "-phage" and "+phage" do not facilitate understanding, because they suggest that cells in the "-phage" treatment have not been exposed to phage at all, but that is not the case. They have survived previous phage treatment and are then replated on media lacking phage.

      Thank you for your valuable comment. To clarify, we have revised the labels in Figure 2d as follows: “Total” replaces “-phage” to indicate the total survivors from the phage killing assay, and “Resisters” replaces “+phage” to indicate the resistant survivors, which are detected on phage-containing plates.

      (2) Another figure with a mismatch between text and visual materials is Figure 5, specifically Figures 5B-F. The figure is about two different mutants, and it is not even mentioned in the text how these mutants were identified, for example in different or the same replicate populations. What is more, the two mutants are not discussed at all in the main text. That is, the text, starting on line 221 discusses these experiments as if there was only one mutant. This is especially striking as the two mutants behave very differently, as, for example, in Figure 5C. Implicitly, the text talks about the mutant ending in "...C2", and not the one ending in "...C1". To add to the confusion, the text states that the (C2) mutant shows a change in the pspA gene, but in Figure 5f, it is the other (undiscussed) mutant that has a mutation in this gene. Only pspA is discussed further, so what about the other mutants? More generally, it is hard to believe that these were the only mutants that occurred in the genome during experimental evolution. It would be useful to give the reader a 2-3 sentence summary of the genetic diversity that experimental evolution generated.

      Thank you for your thoughtful comment. In our heat treatment evolutionary experiment, we isolated six distinct bacterial clones, of which two are highlighted in the manuscript as representative examples. One clone, BC2G11C1, acquired both heat tolerance and phage resistance, while another clone, BC3G11C2, became heat-tolerant but did not develop resistance to phage infection. This variation highlights the inherent diversity in evolutionary responses when exposed to selective pressures. It demonstrates that not all evolutionary pathways lead to the same outcome, even under similar stress conditions. This variability is a key observation in our study, illustrating that different genetic adaptations may arise depending on the specific mutations or genetic context, and not every strain will evolve phage resistance in parallel with heat tolerance. We have updated the manuscript to better reflect this diversity in the evolutionary trajectories observed.

      Reviewer #2 (Public review):

      Summary:

      An initial screening of pretreatment with different stress treatments of K. pneumoniae allowed the identification of heat stress as a protection factor against the infection of the lytic phage Kp11. Then experiments prove that this is mediated not by an increase of phage-resistant bacteria but due to an increase in phage transient tolerant population, which the authors identified as bacteriophage persistence in analogy to antibiotic persistence. Then they proved that phage persistence mediated by heat shock enhanced the evolution of bacterial resistance against the phage. The same trait was observed using other lytic phages, their combinations, and two clinical strains, as well as E. coli and two T phages, hence the phenomenon may be widespread in enterobacteria.

      Next, the elucidation of heat-induced phage persistence was done, determining that phage adsorption was not affected but phage DNA internalization was impaired by the heat pretreatment, likely due to alterations in the bacterial envelope, including the downregulation of envelope proteins and of LPS; furthermore, heat treated bacteria were less sensitive to polymyxins due to the decrease in LPS.

      Finally, cyclic exposure to heat stress allowed the isolation of a mutant that was both resistant to heat treatment, polymyxins, and lytic phage, that mutant had alterations in PspA protein that allowed a gain of function and that promoted the reduction of capsule production and loss of its structure; nevertheless this mutant was severely impaired in immune evasion as it was easily cleared from mice blood, evidencing the tradeoffs between phage/heat and antibiotic resistance and the ability to counteract the immune response.

      Strengths:

      The experimental design and the sequence in which they are presented are ideal for the understanding of their study and the conclusions are supported by the findings, also the discussion points out the relevance of their work particularly in the effectiveness of phage therapy and allows the design of strategies to improve their effectiveness.

      Weaknesses:

      In its present form, it lacks the incorporation of some relevant previous work that explored the role of heat stress in phage susceptibility, antibiotic susceptibility, tradeoffs between phage resistance and resistance against other kinds of stress, virulence, etc., and the fact that exposure to lytic phages induces antibiotic persistence.

      Thank you for your insightful comments. We appreciate your suggestion regarding the inclusion of relevant previous work. We have now incorporated additional citations to discuss these points, including studies on the relationship between heat stress and antibiotic resistance, as well as the tradeoffs between phage resistance and other stress factors.

      Reviewer #3 (Public review):

      PspA, a key regulator in the phage shock protein system, functions as part of the envelope stress response system in bacteria, preventing membrane depolarization and ensuring the envelope stability. This protein has been associated in the Quorum Sensing network and biofilm formation. (Moscoso M., Garcia E., Lopez R. 2006. Biofilm formation by Streptococcus pneumoniae: role of choline, extracellular DNA, and capsular polysaccharide in microbial accretion. J. Bacteriol. 188:7785-7795; Vidal JE, Ludewick HP, Kunkel RM, Zähner D, Klugman KP. The LuxS-dependent quorum-sensing system regulates early biofilm formation by Streptococcus pneumoniae strain D39. Infect Immun. 2011 Oct;79(10):4050-60.)

      It is interesting and very well-developed.

      (1) Could the authors develop experiments about the relationship between Quorum Sensing and this protein?

      (2) It would be interesting to analyze the link to phage infection and heat stress in relation to Quorum. The authors could study QS regulators or AI2 molecules.

      Thank you for your insightful comments and for bringing up the role of PspA in quorum sensing and biofilm formation. However, we would like to clarify a potential misunderstanding: the PspA discussed in our manuscript refers to phage-shock protein A, a key regulator in the bacterial envelope stress response system. This is distinct from the pneumococcal surface protein A, which has been associated with quorum sensing and biofilm formation in Streptococcus pneumoniae (as referenced in your comment).

      To avoid any confusion for readers, we will ensure that our manuscript explicitly states “phage-shock protein A (PspA)” at its first mention. We appreciate your feedback and hope this clarification addresses your concern.

      (3) Include the proteins or genes in a table or figure from lytic phage Kp11 (GenBank: ON148528.1).

      Thank you for your helpful suggestion. We have now included a figure summarizing the proteins of the lytic phage Kp11 (GenBank: ON148528.1) as Supplementary Figure S1.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Issues unrelated to those discussed in the public review

      (1) Figure 4a and its caption describe an evolution experiment, but they do not mention how many cycles of high-temperature treatment and growth this experiment lasted. I assume it lasted for more than one cycle, because the methods section mentions "cycles", but the number is not provided.

      Thank you for pointing this out. The evolutionary experiment shown in Figure 5a involved 11 cycles of high-temperature treatment and growth. We have now explicitly stated this in the figure legend to ensure clarity: BC: Batch culture, G: Evolution cycle number, C: Colony. BC2G11C1 refers to the first colony from batch culture 2 after 11 rounds of heat treatment.

      (2) It is not clear what Figure 5F is supposed to show. What are the gray boxes? The caption claims that the figure shows non-synonymous mutations, but the only information it contains is about genes that seem to be affected by mutation. Judging from the mismatch between the main text and the figure, the mutants with these mutations may actually be mislabeled.

      Thank you for your careful review. Figure 5f highlights the non-synonymous mutations identified in the evolved strains. The gray boxes represent the ancestral strain’s whole genome without mutations, serving as a control. The corresponding labels indicate the specific mutations found in each evolved strain. We have clarified this in the figure caption to improve clarity. Additionally, we have carefully reviewed the labeling to ensure accuracy and consistency between the figure, main text, and sequencing data.

      (3) I think that the acronym NC, which is used in just about every figure, is explained nowhere in the paper. Spell out all acronyms at first use.

      Thank you for pointing this out. We have revised the manuscript to ensure that NC is clearly defined at its first mention in the text and figure legends. Additionally, we have reviewed the manuscript to ensure that all acronyms are properly introduced when first used.

      (4) The same holds for the acronym N.D. This is an especially important oversight because N.D. could mean "not determined" or "not detectable", which would lead to very different interpretations of the same figure.

      Thank you for your careful review. We have clarified the meaning of N.D., which stands for non-detectable, at its first use to avoid ambiguity and ensure accurate interpretation in the figure legend. Additionally, we have reviewed the manuscript to ensure that all acronyms are clearly defined.

      (5) The panel labels (a,b, etc.) in all figure captions are very difficult to distinguish from the rest of the text, and should be better highlighted, for example by using a bold font. However, this is a matter of journal style and will probably be fixed during typesetting.

      Thank you for your suggestion. We have adjusted the figure captions to better distinguish panel labels by using bold font, which improves readability. Final formatting will follow the journal’s style during typesetting.

      (6) Line 224: enhanced insusceptibility -> reduced susceptibility.

      Thank you for your suggestion. We have revised “enhanced insusceptibility” to “reduced susceptibility” for clarity and precision.

      (7) Line 259: mice -> mouse.

      Thank you for catching this. We have corrected “mice” to “mouse”.

      Reviewer #2 (Recommendations for the authors):

      I have no concerns about the experimental design and conclusions of your work; however, I strongly recommend incorporating several relevant pieces of the literature related to your work, in the discussion of your manuscript, specifically:

      (1) Previous studies about the role of heat stress in phage infections, see:

      Greenrod STE, Cazares D, Johnson S, Hector TE, Stevens EJ, MacLean RC, King KC. Warming alters life-history traits and competition in a phage community. Appl Environ Microbiol. 2024 May 21;90(5):e0028624. doi: 10.1128/aem.00286-24. Epub 2024 Apr 16. PMID: 38624196; PMCID: PMC11107170.

      Thank you for your thoughtful comment. We have incorporated the study by Greenrod et al. (2024) into the discussion to enrich the context of our findings. As this article pointed out, a temperature of 42°C can indeed limit phage infection in bacteria, acting as a barrier from the phage’s perspective. Our study builds on this by demonstrating that bacteria pre-treated with high temperatures exhibit tolerance to phage infection. These findings, together with the work you referenced, underscore the importance of heat stress or elevated temperature in host-phage interactions, with 42°C being particularly relevant in the context of fever. We will make sure to clarify this connection in our revised manuscript.

      (2) The effect of heat stress and the tolerance/resistance against other antibiotics besides polymyxins, see:

      Lv B, Huang X, Lijia C, Ma Y, Bian M, Li Z, Duan J, Zhou F, Yang B, Qie X, Song Y, Wood TK, Fu X. Heat shock potentiates aminoglycosides against gram-negative bacteria by enhancing antibiotic uptake, protein aggregation, and ROS. Proc Natl Acad Sci U S A. 2023 Mar 21;120(12):e2217254120. doi: 10.1073/pnas.2217254120. Epub 2023 Mar 14. PMID: 36917671; PMCID: PMC10041086.

      Thank you for bringing this study to our attention. We have incorporated the findings from Lv et al. (2023) into the discussion of our manuscript, highlighting how sublethal temperatures may facilitate the killing of bacteria by antibiotics like kanamycin. This is consistent with our data showing enhanced susceptibility of heat-shocked bacteria to kanamycin. The study also provides insights into the potential role of PMF, which is relevant to our work on PspA, and strengthens the broader context of heat stress influencing both antibiotic resistance and tolerance.

      (3) Perhaps the most relevant overlooked fact was that recently it was demonstrated for E. coli, Klebsiella and Pseudomonas that pretreatment with lytic phages induced antibiotic persistence! Please discuss this finding and its implications for your work, see:

      Fernández-García L, Kirigo J, Huelgas-Méndez D, Benedik MJ, Tomás M, García-Contreras R, Wood TK. Phages produce persisters. Microb Biotechnol. 2024 Aug;17(8):e14543. doi: 10.1111/1751-7915.14543. PMID: 39096350; PMCID: PMC11297538.

      Sanchez-Torres V, Kirigo J, Wood TK. Implications of lytic phage infections inducing persistence. Curr Opin Microbiol. 2024 Jun;79:102482. doi: 10.1016/j.mib.2024.102482. Epub 2024 May 6. PMID: 38714140.

      Thank you for suggesting this important reference. We agree that the phenomenon of phage-induced bacterial persistence is highly relevant to our study. While our manuscript focuses on the role of heat stress in bacterial tolerance and resistance, we acknowledge that bacterial persistence against phages is an established concept. We have incorporated this finding into our discussion, emphasizing how persistence and tolerance can overlap in their effects on bacterial survival, especially under stress conditions like heat treatment. This will provide a more comprehensive understanding of how phage interactions with bacteria can lead to both persistence and resistance.

      (4) Finally, you observed a tradeoff in the pspA* mutant: increased phage/heat/polymyxin resistance and decreased immune evasion (perhaps by being unable to counteract phagocytosis). Such tradeoffs, in which gaining phage resistance comes with loss of resistance to the immune system, impaired virulence, and loss of resistance against some antibiotics, have been extensively documented, see:

      Majkowska-Skrobek G, Markwitz P, Sosnowska E, Lood C, Lavigne R, Drulis-Kawa Z. The evolutionary trade-offs in phage-resistant Klebsiella pneumoniae entail cross-phage sensitization and loss of multidrug resistance. Environ Microbiol. 2021 Dec;23(12):7723-7740. doi: 10.1111/1462-2920.15476. Epub 2021 Mar 27. PMID: 33754440.

      Gordillo Altamirano F, Forsyth JH, Patwa R, Kostoulias X, Trim M, Subedi D, Archer SK, Morris FC, Oliveira C, Kielty L, Korneev D, O'Bryan MK, Lithgow TJ, Peleg AY, Barr JJ. Bacteriophage-resistant Acinetobacter baumannii are resensitized to antimicrobials. Nat Microbiol. 2021 Feb;6(2):157-161. doi: 10.1038/s41564-020-00830-7. Epub 2021 Jan 11. PMID: 33432151.

      García-Cruz JC, Rebollar-Juarez X, Limones-Martinez A, Santos-Lopez CS, Toya S, Maeda T, Ceapă CD, Blasco L, Tomás M, Díaz-Velásquez CE, Vaca-Paniagua F, Díaz-Guerrero M, Cazares D, Cazares A, Hernández-Durán M, López-Jácome LE, Franco-Cendejas R, Husain FM, Khan A, Arshad M, Morales-Espinosa R, Fernández-Presas AM, Cadet F, Wood TK, García-Contreras R. Resistance against two lytic phage variants attenuates virulence and antibiotic resistance in Pseudomonas aeruginosa. Front Cell Infect Microbiol. 2024 Jan 17;13:1280265. doi: 10.3389/fcimb.2023.1280265. Erratum in: Front Cell Infect Microbiol. 2024 Mar 06;14:1391783. doi: 10.3389/fcimb.2024.1391783. PMID: 38298921; PMCID: PMC10828002.

      Thank you for highlighting these important studies. We have incorporated the work by Majkowska-Skrobek et al. (2021), Gordillo Altamirano et al. (2021), and García-Cruz et al. (2024) into the discussion to provide further context to the evolutionary trade-offs observed in our study. The findings in these studies, which describe the cross-sensitization to antimicrobials and the loss of multidrug resistance in phage-resistant bacteria, align with our observations of trade-offs in the pspA mutant. Specifically, our results show that while the pspA mutant exhibits increased resistance to phage, heat, and polymyxins, it also experiences a decrease in immune evasion and potential virulence. These trade-offs are significant in understanding the broader consequences of developing resistance to phages and other stressors.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Overall, the data presented in this manuscript is of good quality. Understanding how cells control RPA loading on ssDNA is crucial to understanding DNA damage responses and genome maintenance mechanisms. The authors used genetic approaches to show that disrupting PCNA binding and SUMOylation of Srs2 can rescue the CPT sensitivity of rfa1 mutants with reduced affinity for ssDNA. In addition, the authors find that SUMOylation of Srs2 depends on binding to PCNA and the presence of Mec1.

      Comments on revisions:

      I am satisfied with the revisions made by the authors, which helped clarify some points that were confusing in the initial submission.

      Thank you.

      Reviewer #2 (Public Review):

      This revised manuscript mostly addresses previous concerns by doubling down on the model without providing additional direct evidence of interactions between Srs2 and PCNA, and that "precise sites of Srs2 actions in the genome remain to be determined." One additional Srs2 allele has been examined, showing some effect in combination with rfa1-zm2. Many of the conclusions are based on reasonable assumptions about the consequences of various mutations, but direct evidence of changes in Srs2 association with PCNA or other interactors is still missing. There is an assumption that a deletion of a Rad51-interacting domain or a PCNA-interacting domain have no pleiotropic effects, which may not be the case. How SLX4 might interact with Srs2 is unclear to me, again assuming that the SLX4 defect is "surgical" - removing only one of its many interactions.

      Previous studies have already provided direct evidence for the interaction between Srs2 and PCNA through the Srs2 PIM region (Armstrong et al, 2012; Papouli et al, 2005); we have added these citations in the text. Similarly, Srs2 associations with SUMO and Rad51 have also been demonstrated (Colavito et al, 2009; Kolesar et al, 2016; Kolesar et al., 2012), and these studies were cited in the text.

      We did not state that a deletion of a Rad51-interacting domain or a PCNA-interacting domain have no pleiotropic effects. We only assessed whether these previously characterized mutant alleles could mimic srs2∆ in rescuing rfa1-zm2 defects.

      We assessed the genetic interaction between slx4-RIM and srs2-∆PIM mutants, and not the physical interaction between the two proteins. As we described in the text, our rationale for this genetic test is based on reports that both slx4 and srs2 mutants impair recovery from the Mec1-induced checkpoint; thus, they may affect parallel pathways of checkpoint dampening.

      One point of concern is the use of t-tests without some sort of correction for multiple comparisons - in several figures. I'm quite sceptical about some of the p < 0.05 calls surviving a Bonferroni correction. Also in 4B, which comparison is **? Also, admittedly by eye, the changes in "active" Rad53 seem much greater than 5x. (also in Fig. 3, normalizing to a non-WT sample seems odd).

      Claims made in this work were based only on pairwise comparisons, not on multiple comparisons. We have now made this point clearer in the graphs and in the Methods. As the values were compared between a wild-type strain and a specific mutant strain, or between two mutants, we believe that the t-test is suitable for statistical analysis.
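      For readers unfamiliar with the correction the reviewer refers to, the following is a minimal sketch of a Bonferroni adjustment. The function name and the p-values are hypothetical and are not taken from the manuscript; the sketch only illustrates how a nominal p < 0.05 call can fail once several comparisons are counted together.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni adjustment: multiply each raw p-value by the number of
    tests (capped at 1.0) and compare the result against alpha."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, reject


# Hypothetical raw p-values from three pairwise t-tests (illustrative only).
raw = [0.004, 0.03, 0.20]
adjusted, reject = bonferroni(raw)
for p, p_adj, r in zip(raw, adjusted, reject):
    print(f"raw p = {p:.3f}  adjusted p = {p_adj:.3f}  significant: {r}")
```

      In this illustration the second comparison is significant at the nominal 0.05 level but not after adjustment, which is the kind of outcome the reviewer is concerned about.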

      In Figure 4B, ** indicates that the WT value is significantly different from that of the slx4-RIM srs2-∆PIM double mutant and from that of the srs2-∆PIM single mutant. We have modified the graph to indicate the pairwise comparisons. The 5-fold change of active Rad53 levels was derived by comparing the values between the srs2-∆PIM slx4<sup>RIM</sup>-TAP double mutant and wild-type Slx4-TAP. In Figure 3, normalization to the lowest value affords better visualization. This is rather a stylistic issue; we would like to maintain it as the other reviewers had no issues.

      What is the WT doubling time for this strain? From the FACS it seems as if in 2 h the cells have completed more than 1 complete cell cycle. Also in 5D. Seems fast...

      The wild-type W303 strain has a doubling time of less than 90 min, as shown by many labs, and our data are consistent with this. The FACS profiles for wild-type cells shown in Figures 3C, 4C, and 5C are consistent with each other, showing that after G1 cells entered the cell cycle, they were in G2 phase at the 1-hour time points, and then a percentage of the cells exited the first cell cycle by two hours.

      I have one over-arching confusion. Srs2 was shown initially to remove Rad51 from ssDNA and the suppression of some of srs2's defects by deleting rad51 made a nice, compact story, though exactly how srs2's "suppression of rad6" fit in isn't so clear (since Rad6 ties into Rad18 and into PCNA ubiquitylation and into PCNA SUMOylation). Now Srs2 is invoked to remove RPA. It seems to me that any model needs to explain how Srs2 can be doing both. I assume that if RPA and Rad51 are both removed from the same ssDNA, the ssDNA will be "trashed" as suggested by Symington's RPA depletion experiments. So building a model that accounts for selective Srs2 action at only some ssDNA regions might be enhanced by also explaining how Rad51 fits into this scheme.

      While the anti-recombinase function of Srs2 was better studied, its “anti-RPA” role in checkpoint dampening was recently described by us (Dhingra et al, 2021) following the initial report by the Haber group some time ago (Vaze et al, 2002). A better understanding of this new role is required before we can generate a comprehensive picture of how Srs2 integrates the two functions (and possibly other functions). Our current work addresses this issue by providing a more detailed understanding of this new role of Srs2.

      Single-molecule data showed that Srs2 strips both RPA and Rad51 from ssDNA, but this effect is highly dynamic (i.e. RPA and Rad51 can rebind ssDNA after being displaced) (De Tullio et al, 2017). As such, generation of “deserted” ssDNA regions lacking RPA and Rad51 in cells is likely a rare event. Rather, Srs2 can foster RPA and Rad51 dynamics on ssDNA. Additional studies will be needed to generate a model that integrates the anti-recombinase and the anti-RPA roles of Srs2.

      As a previous reviewer has pointed out, CPT creates multiple forms of damage. Foiani showed that 4NQO would activate the Mec1/Rad53 checkpoint in G1-arrested cells, presumably because there would be single-strand gaps but no DSBs. Whether this would be a way to look specifically at one type of damage is worth considering; but UV might be a simpler way to look. As also noted, the effects on the checkpoint and on viability are quite modest. Because it isn't clear (at least to me) why rfa1 mutants are so sensitive to CPT, it's hard for me to understand how srs2-zm2 has a modest suppressive effect: is it by changing the checkpoint response or facilitating repair or both? Or how srs2-3KR or srs2-dPIM differ from rfa1-zm2 in this respect. The authors seem to lump all these small suppressions under the rubric of "proper levels of RPA-ssDNA" but there are no assays that directly get at this. This is the biggest limitation.

      CPT treatment is an ideal condition to examine how cells dampen the DNA damage checkpoint, because while most genotoxic conditions (e.g. 4NQO, MMS) induce both the DNA replication checkpoint and the DNA damage checkpoint, CPT was shown to induce only the latter (Menin et al, 2018; Minca & Kowalski, 2011; Redon et al, 2003; Tercero et al, 2003). Future studies examining 4NQO and UV conditions can further expand our understanding of checkpoint dampening in different conditions.

      We have previously provided evidence to support the conclusion that srs2 suppression of rfa1-zm is partly mediated by changing checkpoint levels (Dhingra et al., 2021). We cannot exclude the possibility that the suppression may also be related to changes of DNA repair; we have now added this note in the text.

      Regarding direct testing of RPA levels on DNA, we have previously shown that srs2∆ increased the levels of chromatin-associated Rfa1 and that this is suppressed by rfa1-zm2 (Dhingra et al., 2021). We have now included chromatin fractionation data to show that srs2-∆PIM also led to an increase of Rfa1 on chromatin, and this was suppressed by rfa1-zm2 (new Fig. S2).

      Srs2 has also been implicated as a helicase in dissolving "toxic joint molecules" (Elango et al. 2017). Whether this activity is changed by any of the mutants (or by mutations in Rfa1) is unclear. In their paper, Elango writes: "Rare survivors in the absence of Srs2 rely on structure-specific endonucleases, Mus81 and Yen1, that resolve toxic joint-molecules" Given the involvement of SLX4, perhaps the authors should examine the roles of structure-specific nucleases in CPT survival?

      Srs2 has several roles, and its role in RPA antagonism can be genetically separated from its role in Rad51 regulation, as we have shown in our previous work (Dhingra et al., 2021); this notion is further supported by evidence presented in the current work. Srs2’s role in dissolving "toxic joint molecules” was mainly observed during BIR (Elango et al, 2017). Whether it is related to checkpoint dampening will be interesting to address in the future but is beyond the scope of the current work, which seeks to answer the question of how Srs2 regulates RPA during checkpoint dampening. Similarly, determining the roles of Mus81 and Yen1 and other structural nucleases in CPT survival is a worthwhile task, but it is a research topic well separated from the focus of this work.

      Experiments that might clarify some of these ambiguities are proposed to be done in the future. For now, we have a number of very interesting interactions that may be understood in terms of a model that supposes discriminating among gaps and ssDNA extensions by the presence of PCNA, perhaps modified by SUMO. As noted above, it would be useful to think about the relation to Rad6.

      Several studies have shown that Srs2’s functional interaction with Rad6 is based on Srs2-mediated recombination regulation (reviewed by Niu & Klein, 2017). Given that recombinational regulation by Srs2 is genetically separable from the Srs2 and RPA antagonism (Dhingra et al., 2021), we do not see a strong rationale to examine Rad6 in this work, which addresses how Srs2 regulates RPA. That said, this study has provided a basis for future studies of possible cross-talk among different Srs2-mediated pathways.

      Reviewer #3 (Public Review):

      The superfamily I 3'-5' DNA helicase Srs2 is well known for its role as an anti-recombinase, stripping Rad51 from ssDNA, as well as an anti-crossover factor, dissociating extended D-loops and favoring non-crossover outcome during recombination. In addition, Srs2 plays a key role in ribonucleotide excision repair. Besides DNA repair defects, srs2 mutants also show a reduced recovery after DNA damage that is related to its role in downregulating the DNA damage signaling or checkpoint response. Recent work from the Zhao laboratory (PMID: 33602817) identified a role of Srs2 in downregulating the DNA damage signaling response by removing RPA from ssDNA. This manuscript reports further mechanistic insights into the signaling downregulation function of Srs2.

      Using the genetic interaction with mutations in RPA1, mainly rfa1-zm2, the authors test a panel of mutations in Srs2 that affect CDK sites (srs2-7AV), potential Mec1 sites (srs2-2SA), known sumoylation sites (srs2-3KR), Rad51 binding (delta 875-902), PCNA interaction (delta 1159-1163), and SUMO interaction (srs2SIMmut). All mutants were generated by genomic replacement and the expression level of the mutant proteins was found to be unchanged. This alleviates some concern about the use of deletion mutants compared to point mutations. Double mutant analysis identified that PCNA interaction and SUMO sites were required for the Srs2 checkpoint dampening function, at least in the context of the rfa1-zm2 mutant. There was no effect of these mutants in a RFA1 wild type background. This latter result is likely explained by the activity of the parallel pathway of checkpoint dampening mediated by Slx4, and genetic data with an Slx4 point mutation affecting Rtt107 interaction and checkpoint downregulation support this notion. Further analysis of Srs2 sumoylation showed that Srs2 sumoylation depended on PCNA interaction, suggesting sequential events of Srs2 recruitment by PCNA and subsequent sumoylation. Kinetic analysis showed that sumoylation peaks after maximal Mec1 induction by DNA damage (using the Top1 poison camptothecin (CPT)) and depended on Mec1. These data are consistent with a model that Mec1 hyperactivation is ultimately leading to signaling downregulation by Srs2 through Srs2 sumoylation. Mec1-S1964 phosphorylation, a marker for Mec1 hyperactivation and a site found to be needed for checkpoint downregulation after DSB induction, did not appear to be involved in checkpoint downregulation after CPT damage. The data are in support of the model that Mec1 hyperactivation, when targeted to RPA-covered ssDNA by its Ddc2 (human ATRIP) targeting factor, favors Srs2 sumoylation after Srs2 recruitment to PCNA to disrupt the RPA-Ddc2-Mec1 signaling complex. Presumably, this allows gap filling and disappearance of long-lived ssDNA as the initiator of checkpoint signaling, although the study does not extend to this step.

      Strengths:

      (1) The manuscript focuses on the novel function of Srs2 to downregulate the DNA damage signaling response and provide new mechanistic insights.

      (2) The conclusions that PCNA interaction and ensuing Srs2-sumoylation are involved in checkpoint downregulation are well supported by the data.

      Weaknesses:

      (1) Additional mutants of interest could have been tested, such as the recently reported Pin mutant, srs2-Y775A (PMID: 38065943), and the Rad51 interaction point mutant, srs2-F891A (PMID: 31142613).

      (2) The use of deletion mutants for PCNA and RAD51 interaction is inferior to using specific point mutants, as done for the SUMO interaction and the sites for post-translational modifications.

      (3) Figure 4D and Figure 5A report data with standard deviations, which is unusual for n=2. Maybe the individual data points could be plotted with a color for each independent experiment to allow the reader to evaluate the reproducibility of the results.
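      The concern in point (3) can be illustrated with a minimal sketch. The replicate values below are hypothetical and are not data from the manuscript. With only two replicates, the sample standard deviation is simply a rescaled difference between the two points, so it carries no information beyond what plotting the two points would show.

```python
import math
import statistics


def summarize_pair(a, b):
    """Return the mean and sample standard deviation (n - 1 denominator)
    of two replicate measurements."""
    values = [a, b]
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical replicate measurements (illustrative only).
a, b = 1.0, 3.0
mean, sd = summarize_pair(a, b)

# For n = 2, the sample SD reduces to |a - b| / sqrt(2).
shortcut = abs(a - b) / math.sqrt(2)
print(f"mean = {mean:.3f}, SD = {sd:.3f}, |a - b| / sqrt(2) = {shortcut:.3f}")
```

      This is why showing the individual data points, as the reviewer suggests, conveys the same information more transparently.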

      Comments on revisions:

      In this revision, the authors adequately addressed my concerns. The only issue I see remaining is the site of Srs2 action. The authors argue in favor of gaps and against R-loops and ssDNA resulting from excessive supercoiling. The authors do not discuss ssDNA resulting from processing of one-sided DSBs, which are expected to result from replication run-off after CPT damage but are not expected to provide the 3'-junction for preferred PCNA loading. Can the authors exclude PCNA at the 5'-junction at a resected DSB?

      We have now added a sentence stating that we cannot exclude the possibility that PCNA may be positioned at a 5’-junction, as this can be observed in vitro, although PCNA loading was seen exclusively at a 3’-junction in the presence of RPA (Ellison & Stillman, 2003; Majka et al, 2006).

      Recommendations For the authors:

      Reviewer #2 (Recommendations For the authors):

      A Bonferroni correction should be made for the multiple comparisons in several figures.

      Specific comments:

      l. 41. This is a too long and confusing sentence.

      Sentence shortened: “These data suggest that Srs2 recruitment to PCNA proximal ssDNA-RPA filaments followed by its sumoylation can promote checkpoint recovery, whereas Srs2 action is minimized at regions with no proximal PCNA to permit RPA-mediated ssDNA protection”.

      l. 60. Identify Ddc2 and Mec1 as ATRIP and ATR.

      Done.

      l. 125 "fails to downregulate RPA levels on chromatin and Mec1-mediated DDC..." fails to downregulate RPA and fails to reduce Mec1-mediated DDC?

      Sentence modified: “fails to downregulate both the RPA levels on chromatin and the Mec1-mediated DDC”

      l. 204 "consistent with the notion that Srs2 has roles beyond RPA regulation"... What other roles? It's stripping of Rad51? Removing toxic joint molecules? Something else?

      Sentence modified: “consistent with the notion that Srs2 has roles beyond RPA regulation, such as in Rad51 regulation and removing DNA joint molecules”.

      l. 249 "Significantly, srs2-ΔPIM and -3KR increased the percentage of rfa1-zm2 cells transitioning into the G1 phase" No. Just back to normal. As stated in l. 258: "We found that srs2-ΔPIM and srs2-3KR mutants on their own behaved normally in the two DDC assays described above." All of these effects are quite small.

      Sentence modified: “Compared with rfa1-zm2 cells, srs2-∆PIM rfa1-zm2 and srs2-3KR rfa1-zm2 cells showed increased percentages of cells transitioning into the G1 phase”.

      l. 468 "Our previous work has provided several lines of evidence to support that Rad51 removal by Srs2 is separable from the Srs2-RPA antagonism (Dhingra et al., 2021)." What evidence? See my comment above about not having both proteins removed at the same time.

      We have addressed this point in our initial rebuttal and some key points are summarized below. In our previous report (Dhingra et al., 2021), we provided several lines of evidence to support the conclusion that Rad51 is not relevant to the Srs2-RPA antagonism. For example, while rad51∆ rescues the hyper-recombination phenotype of srs2∆ cells, rad51∆ did not affect the hyper-checkpoint phenotype of srs2∆. In contrast, rfa1-zm1/zm2 have the opposite effects, that is, rfa1-zm1/zm2 suppressed the hyper-checkpoint, but not the hyper-recombination, phenotype of srs2∆ cells. The differential effects of rad51∆ and rfa1-zm1/zm2 were also seen for the ATPase-dead allele of Srs2 (srs2-K41A). For example, rfa1-zm2 rescued the hyper-checkpoint and CPT sensitivity of srs2-K41A cells, while rad51∆ had neither effect. These and other data described by Dhingra et al (2021) suggest that Srs2’s effects on checkpoint vs. recombination can be separated genetically. Consistent with our conclusion summarized above, deleting the Rad51 binding domain in Srs2 (srs2-∆Rad51BD) has no effect on the rfa1-zm2 phenotype in CPT (Fig. 2D). These data provide further evidence that Srs2 regulation of Rad51 is separable from the Srs2-RPA antagonism.

      l. 525 "possibility, we tested the separation pin of Srs2 (Y775), which was shown to enables its in vitro helicase activity during the revision of our work..." ?? there was helicase activity during the revision of your work? Please fix the sentence.

      Sentence modified: “we tested the separation pin of Srs2 (Y775). This residue was shown to be key for Srs2’s helicase activity in vitro in a report that was published during the revision of our work (Meir et al, 2023).”

      Fig. 3. "srs2-ΔPIM and -3KR allow better G1 entry of rfa1-zm2 cells." is it better entry or less arrest at G2/M? One implies better turning off of a checkpoint, the other suggests less activation of the checkpoint.

      This is a correct statement. For all strains examined in Figure 3, cells were seen in G2/M phase after 1-hour CPT treatment, suggesting proper arrest.

      References:

      Armstrong AA, Mohideen F, Lima CD (2012) Recognition of SUMO-modified PCNA requires tandem receptor motifs in Srs2. Nature 483: 59-63

      Colavito S, Macris-Kiss M, Seong C, Gleeson O, Greene EC, Klein HL, Krejci L, Sung P (2009) Functional significance of the Rad51-Srs2 complex in Rad51 presynaptic filament disruption. Nucleic Acids Res 37: 6754-6764.

      De Tullio L, Kaniecki K, Kwon Y, Crickard JB, Sung P, Greene EC (2017) Yeast Srs2 helicase promotes redistribution of single-stranded DNA-bound RPA and Rad52 in homologous recombination regulation. Cell Rep 21: 570-577

      Dhingra N, Kuppa S, Wei L, Pokhrel N, Baburyan S, Meng X, Antony E, Zhao X (2021) The Srs2 helicase dampens DNA damage checkpoint by recycling RPA from chromatin. Proc Natl Acad Sci U S A 118: e2020185118

      Elango R, Sheng Z, Jackson J, DeCata J, Ibrahim Y, Pham NT, Liang DH, Sakofsky CJ, Vindigni A, Lobachev KS et al (2017) Break-induced replication promotes formation of lethal joint molecules dissolved by Srs2. Nat Commun 8: 1790

      Ellison V, Stillman B (2003) Biochemical characterization of DNA damage checkpoint complexes: clamp loader and clamp complexes with specificity for 5' recessed DNA. PLoS Biol 1: E33

      Kolesar P, Altmannova V, Silva S, Lisby M, Krejci L (2016) Pro-recombination Role of Srs2 Protein Requires SUMO (Small Ubiquitin-like Modifier) but Is Independent of PCNA (Proliferating Cell Nuclear Antigen) Interaction. J Biol Chem 291: 7594-7607.

      Kolesar P, Sarangi P, Altmannova V, Zhao X, Krejci L (2012) Dual roles of the SUMO-interacting motif in the regulation of Srs2 sumoylation. Nucleic Acids Res 40: 7831-7843.

      Majka J, Binz SK, Wold MS, Burgers PM (2006) Replication protein A directs loading of the DNA damage checkpoint clamp to 5'-DNA junctions. J Biol Chem 281: 27855-27861

      Meir A, Raina VB, Rivera CE, Marie L, Symington LS, Greene EC (2023) The separation pin distinguishes the pro- and anti-recombinogenic functions of Saccharomyces cerevisiae Srs2. Nat Commun 14: 8144

      Menin L, Ursich S, Trovesi C, Zellweger R, Lopes M, Longhese MP, Clerici M (2018) Tel1/ATM prevents degradation of replication forks that reverse after Topoisomerase poisoning. EMBO Rep 19: e45535

      Minca EC, Kowalski D (2011) Replication fork stalling by bulky DNA damage: localization at active origins and checkpoint modulation. Nucleic Acids Res 39: 2610-2623

      Niu H, Klein HL (2017) Multifunctional roles of Saccharomyces cerevisiae Srs2 protein in replication, recombination and repair. FEMS Yeast Res 17: fow111

      Papouli E, Chen S, Davies AA, Huttner D, Krejci L, Sung P, Ulrich HD (2005) Crosstalk between SUMO and ubiquitin on PCNA is mediated by recruitment of the helicase Srs2p. Mol Cell 19: 123-133

      Redon C, Pilch DR, Rogakou EP, Orr AH, Lowndes NF, Bonner WM (2003) Yeast histone 2A serine 129 is essential for the efficient repair of checkpoint-blind DNA damage. EMBO Rep 4: 678-684

      Tercero JA, Longhese MP, Diffley JFX (2003) A central role for DNA replication forks in checkpoint activation and response. Mol Cell 11: 1323-1336

      Vaze MB, Pellicioli A, Lee SE, Ira G, Liberi G, Arbel-Eden A, Foiani M, Haber JE (2002) Recovery from checkpoint-mediated arrest after repair of a double-strand break requires Srs2 helicase. Mol Cell 10: 373-385

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Jiao D et al reported the induction of synthetic lethality by combined inhibition of anti-apoptotic BCL-2 family proteins and WSB2, a substrate receptor in the CRL5 ubiquitin ligase complex. Mechanistically, WSB2 interacts with NOXA to promote its ubiquitylation and degradation. Cancer cells deficient in WSB2, as well as heart and liver tissues from Wsb2-/- mice, exhibit high susceptibility to apoptosis induced by inhibitors of BCL-2 family proteins. The anti-apoptotic activity of WSB2 is partially dependent on NOXA.

      Overall, the finding, that WSB2 disruption triggers synthetic lethality to BCL-2 family protein inhibitors by destabilizing NOXA, is rather novel. The manuscript is largely hypothesis-driven, with experiments that are adequately designed and executed. However, there are quite a few issues for the authors to address, including those listed below.

      Specific comments:

      (1) At the beginning of the Results section, a clear statement is needed as to why the authors are interested in WSB2 and what brought them to analyze "the genetic co-dependency between WSB2 and other proteins".

      We thank the reviewer for raising this important point. We agree that a clear rationale should be provided at the beginning of the Results section. As reported in previous studies [Ref: 1, 2, 3], strong synthetic interactions have been observed between WSB2 and several mitochondrial apoptosis-related factors, including MCL-1, BCL-xL, and MARCH5. We have referenced these findings in the Discussion section. Motivated by these studies, we became interested in the role of WSB2 and aimed to investigate the specific mechanisms underlying its synthetic lethality with anti-apoptotic BCL-2 family members. We will revise the beginning of the Results section to clearly state this rationale.

      (1) McDonald, E.R., 3rd et al. Project DRIVE: A Compendium of Cancer Dependencies and Synthetic Lethal Relationships Uncovered by Large-Scale, Deep RNAi Screening. Cell 170, 577-592 e510 (2017).

      (2) DeWeirdt, P.C. et al. Genetic screens in isogenic mammalian cell lines without single cell cloning. Nat Commun 11, 752 (2020).

      (3) DeWeirdt, P.C. et al. Optimization of AsCas12a for combinatorial genetic screens in human cells. Nat Biotechnol 39, 94-104 (2021).

      (2) In general, the biochemical evidence supporting the role of WSB2 as a SOCS box-containing substrate-binding receptor of CRL5 E3 in promoting NOXA ubiquitylation and degradation is relatively weak. First, since NOXA binds to WSB2 on its SOCS box, which consists of a BC box for Elongin B/C binding and a CUL5 box for CUL5 binding, it is crucial to determine whether the binding of NOXA on the SOCS box affects the formation of the CRL5<sup>WSB2</sup> complex. The authors should demonstrate the endogenous binding between NOXA and the CRL5<sup>WSB2</sup> complex. Additionally, the authors may also consider manipulating CUL5, SAG, or ElonginB/C to assess if it would affect NOXA protein turnover in two independent cell lines.

      We thank the reviewer for raising this important point. To determine whether endogenous NOXA binds to the intact CRL5<sup>WSB2</sup> complex, we performed co-immunoprecipitation assays using an antibody against NOXA. Indeed, NOXA co-immunoprecipitated with all subunits of the CRL5<sup>WSB2</sup> complex (Figure 2—figure supplement 1D), suggesting that NOXA binding to WSB2 does not disrupt interactions between WSB2 and the other CRL5 subunits. Moreover, depletion of CRL5 complex components (RBX2/SAG, CUL5, ELOB, or ELOC) through siRNAs in C4-2B or Huh-7 cells also resulted in a marked increase in NOXA protein levels.

      Second, in all the experiments designed to detect NOXA ubiquitylation in cells, the authors utilized immunoprecipitation (IP) with FLAG-NOXA/NOXA, followed by immunoblotting (IB) with HA-Ub. However, it is possible that the observed poly-Ub bands could be partly attributed to the ubiquitylation of other NOXA binding proteins. Therefore, the authors need to consider performing IP with HA-Ub and subsequently IB with NOXA. Alternatively, they could use Ni-beads to pull down all His-Ub-tagged proteins under denaturing conditions, followed by the detection of FLAG-tagged NOXA using anti-FLAG Ab. The authors are encouraged to perform one of these suggested experiments to exclude the possibility of this concern. Furthermore, an in vitro ubiquitylation assay is crucial to conclusively demonstrate that the polyubiquitylation of NOXA is indeed mediated by the CRL5<sup>WSB2</sup> complex.

      We appreciate the reviewer for raising these important considerations regarding our ubiquitylation assays. We fully acknowledge the reviewer's concern that classical ubiquitination assays could potentially detect ubiquitination of proteins interacting with NOXA. However, we would like to clarify that our experimental conditions effectively mitigate this issue. Specifically, cells were lysed using buffer containing 1% SDS followed by boiling at 105°C for 5 minutes. These rigorous denaturing conditions ensure disruption of non-covalent protein interactions, thereby effectively eliminating the possibility of detecting ubiquitination signals from NOXA-associated proteins.

      Regarding the suggestion to perform an in vitro ubiquitination assay, we agree this experiment would indeed provide additional evidence. However, due to significant technical complexities associated with reconstituting CRL5-based E3 ubiquitin ligase activity in vitro—which would require the expression and purification of at least six recombinant proteins—such experiments are rarely performed in this context. Furthermore, NOXA is uniquely localized as a membrane protein on the mitochondrial outer membrane, posing additional significant challenges for protein expression and purification. Given the robustness of our current in vivo ubiquitylation assay under stringent denaturing conditions, we believe our existing data sufficiently and conclusively demonstrate NOXA ubiquitination mediated by the CRL5<sup>WSB2</sup> complex.

      (3) In their attempt to map the binding regions between NOXA and WSB2, the authors utilized exogenous proteins of both WSB2 and NOXA. To strengthen their findings, it would be more convincing to perform IP with exogenous wt/mutant WSB2 or NOXA and subsequently perform IB to detect endogenous NOXA or WSB2, respectively. Additionally, an in vitro binding assay using purified proteins would provide further evidence of a direct binding between NOXA and WSB2.

      We thank the reviewer for raising these important issues. In response to the reviewer’s suggestion to map the binding regions between NOXA and WSB2 more convincingly, we have indeed performed semi-endogenous Co-IP assays, which yielded results consistent with our exogenous protein experiments (Figure 3—figure supplement 1A, B). Concerning the recommendation to further validate direct interaction using purified recombinant proteins, we encountered substantial technical difficulties in obtaining pure and soluble recombinant WSB2 protein. Additionally, given that NOXA is an outer mitochondrial membrane protein and the interaction occurs on mitochondria, we believe that an in vitro binding assay may have limited physiological relevance. We hope the reviewer can appreciate these practical challenges and our current evidence supporting the strong interaction between NOXA and WSB2.

      Reviewer #2 (Public Review):

      Summary:

      Exploring the DEP-MAP database and two drug-screen databases, the authors identify WSB2 as an interactor of several BCL2 proteins. In follow-up experiments, they show that CRL5/WSB2 controls NOXA protein levels via K48 ubiquitination following direct protein-protein interaction, and cell death sensitivity in the context of BH3 mimetic treatment, where WSB2 depletion synergizes with drug treatment.

      Strengths:

      The authors use a set of orthogonal methods across different model cell lines and a new WSB2 KO mouse model to confirm their findings. They also manage to correlate WSB2 expression with poor prognosis in prostate and liver cancer, supporting the idea that targeting WSB2 may sensitize cancers for treatment with BH3 mimetics.

      Weaknesses:

      The conclusions drawn based on the findings in cancer patients are very speculative, as regulation of NOXA cannot be the sole function of CRL5/WSB2 and it is hence unclear what causes correlation with patient survival. Moreover, the authors do not provide a clear mechanistic explanation of how exactly higher levels of NOXA promote apoptosis in the absence of WSB2. This would be important knowledge, as usually high NOXA levels correlate with high MCL1, as they are turned over together, but in situations like this, or loss of other E3 ligases, such as MARCH, the buffering capacity of MCL1 is outrun, allowing excess NOXA to kill (likely by neutralizing other BCL2 proteins it usually does not bind to, such as BCLX). Moreover, a necroptosis-inducing role of NOXA has been postulated. Neither of these options is interrogated here.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure 2J. The authors showed that "the mRNA levels of NOXA were even reduced in WSB2-KO cells compared to parental cells". What is the possible mechanism? This point should at least be discussed.

      We thank the reviewer for raising these important issues. The mechanisms underlying the significantly lower NOXA mRNA levels following WSB2 KO are not fully understood at present. However, we propose that this could represent a form of negative feedback regulation at the level of gene expression. Specifically, when NOXA protein levels rise sharply, this may activate mechanisms that suppress its own mRNA synthesis or stability, serving as a buffering system to prevent further protein accumulation. Such negative feedback loops may be critical for maintaining cellular homeostasis and avoiding excessive protein production. Moreover, this phenomenon is frequently observed in other studies investigating substrates targeted by E3 ubiquitin ligases for degradation. We have elaborated on this point in the Discussion section.

      (2) Figure 2M. A previous study has clearly demonstrated that NOXA is subjected to ubiquitylation and degradation by CRL5 E3 ligase (PMID: 27591266). This paper should be cited. Also, in that publication, NOXA ubiquitylation is via the K11 linkage, not the K48 linkage. The authors should include K11R mutant in their assay.

      We thank the reviewer for raising this important issue and for suggesting the relevant reference (PMID: 27591266), which we have now cited accordingly. Additionally, we would like to clarify that our new in vivo ubiquitination assays included the K11R and K11-only ubiquitin mutants, and our data demonstrate that WSB2-mediated NOXA ubiquitination indeed involves K11-linked ubiquitination (Figure 2—figure supplement 1E).

      (3) Figure 3H, J. The authors stated, "By mutating these lysine residues to arginine, we found that WSB2-mediated NOXA ubiquitination was completely abolished". Which one of the three lysine residues is playing the dominant role?

      We thank the reviewer for raising this important issue. To address this, we generated FLAG-NOXA mutants individually substituting lysine residues K35, K41, and K48 with arginine. In vivo ubiquitination assays demonstrated that lysine 48 (K48) is the predominant residue responsible for WSB2-mediated NOXA ubiquitination (Figure 3—figure supplement 1C).

      (4) Figure 3N. The authors need to show that the fusion peptide containing C-terminal NOXA peptide competitively inhibits the interaction between endogenous WSB2 and NOXA and extends the protein half-life of NOXA, leading to NOXA accumulation.

      We sincerely thank the reviewer for raising these important issues. As suggested, we investigated whether the fusion peptide containing the C-terminal NOXA sequence competitively disrupts the interaction between endogenous WSB2 and NOXA, subsequently influencing NOXA stability. Our results demonstrated that treatment with this fusion peptide indeed significantly reduced the endogenous interaction between WSB2 and NOXA (Figure 3—figure supplement 1D). Furthermore, we observed that the peptide dose-dependently increased endogenous NOXA protein levels and prolonged its protein half-life, thereby resulting in the accumulation of NOXA (Figure 3N; Figure 3—figure supplement 1E, F). These findings collectively indicate that the fusion peptide competitively inhibits the WSB2-NOXA interaction, stabilizes NOXA protein, and enhances its accumulation.

      (5) Figure 4. a) It would be better to investigate whether WSB2 knockdown can sensitize cancer cells to the treatment with ABT-737 or AZD5991, evidenced by a decrease in both IC50 values and clonogenic survival rates and whether such sensitization is dependent on NOXA. b) The authors need to show the levels of cleaved caspase-3/7/9 and the percentages of apoptotic cells in shNC cells upon silencing of WSB2 in Figure 4A-F. c) It will be more convincing to repeat the experiment to show synthetic lethality by WSB2 disruption and MCL-1 inhibitor AZD5991 treatment using another cell line, such as WSB2-deficient Huh-7 cells in Figure 4 I&J.

      We sincerely thank the reviewer for these valuable and constructive suggestions. Regarding point (a): We believe that our current Western blot and flow cytometry data (Figure 4G–L) have already provided strong evidence that WSB2 depletion enhances apoptosis in response to ABT-737 and AZD5991. Therefore, we consider that additional IC50 and clonogenic survival assays, while informative, may not be essential for supporting our conclusion. Furthermore, as shown in Figure 5A–F, we found that silencing NOXA largely, though not completely, reversed the enhanced apoptosis triggered by these inhibitors in WSB2-deficient cells, suggesting that the sensitization effect is at least partially dependent on NOXA.

      Regarding point (b): We have shown that WSB2 knockout alone had no impact on the levels of cleaved caspase-3/7/9 or the percentages of apoptotic cells in Huh-7 and C4-2B cells (Figure 4G-L and Figure 4—figure supplement 1A-D), indicating that WSB2 loss does not induce apoptosis on its own under basal conditions.

      Regarding point (c): We appreciate the reviewer’s suggestion and have now repeated the experiment in WSB2 knockout Huh-7 cells. The new results further support the synthetic lethality between WSB2 loss and AZD5991 treatment (Figure 4—figure supplement 1C, D).

      (6) Figure 5A/C/E. The effect of siNOXA is minor, if any, for cleavage of caspases. The same thing for Figure 6F/H.

      We appreciate the reviewer’s insightful observation regarding the relatively modest effect of shNOXA on caspase cleavage in Figures 5A/C/E and Figures 6F/H. Indeed, we acknowledge that the reduction in caspase cleavage following NOXA knockdown is moderate. However, consistent with our discussions in the manuscript, NOXA knockdown significantly—but not completely—rescued the increased apoptosis observed in WSB2-deficient cells treated with BCL-2 family inhibitors. This suggests that while NOXA plays a notable role, additional mechanisms or unidentified targets may also be involved in WSB2-mediated regulation of apoptosis.

      (7) Figure 5 I&J. The authors may consider performing IHC staining, immunofluorescence, or WB analysis to show the levels of NOXA and cleaved caspases or PARP in xenograft tumors. This would provide in vivo evidence of significant apoptosis induction resulting from the co-administration of ABT-737 and R8-C-terminal NOXA peptide.

      We appreciate the reviewer's thoughtful suggestion regarding additional immunohistochemical or immunofluorescence analyses in xenograft tumors. However, due to current limitations in available antibodies suitable for reliable detection of NOXA by IHC and IF, we are unable to perform these experiments. We greatly appreciate the reviewer's understanding of this technical constraint. Nevertheless, our existing data collectively support the conclusion that the combination of ABT-737 and R8-C-terminal NOXA peptide significantly enhances apoptosis in vivo.

      (8) Figure 7. Does an inverse correlation exist between the protein levels of WSB2 and NOXA in PRAD or LIHC tissue microarrays? On page 12, in the first paragraph, Figure 7M-P was cited incorrectly.

      We sincerely thank the reviewer for raising this important issue. As mentioned above, due to current limitations regarding the availability of suitable antibodies that can reliably detect NOXA by IHC, we regret that it is not feasible to experimentally address this question at this time.

      Additionally, we have carefully corrected the citation error involving Figure 7M-P on page 12, as pointed out by the reviewer.

      (9) Figure S1D. BCL-W levels were reduced upon WSB2 overexpression, which should be acknowledged.

      We sincerely thank the reviewer for raising this important issue. We acknowledge that BCL-W protein levels were slightly reduced upon WSB2 overexpression in Figure S1D. However, this effect is distinct from the pronounced reduction observed in NOXA protein levels. We have revised the manuscript to clarify this point. Additionally, we recognize that transient overexpression systems may occasionally lead to non-specific or artifactual changes. Our exogenous expression and co-immunoprecipitation experiments did not support an interaction between BCL-W and WSB2. Therefore, the observed reduction of BCL-W under these conditions may not reflect a physiologically relevant regulation.

      (10) Figure S4. Given that WSB2 KO mice are viable, the authors may consider determining whether these mice are more sensitive to radiation-induced tissue damage but more resistant to radiation-induced tumorigenesis.

      We sincerely thank the reviewer for this insightful and biologically meaningful suggestion. We agree that investigating the potential role of WSB2 in radiation-induced tissue damage and tumorigenesis would be of great interest. However, conducting such experiments requires access to specialized irradiation facilities, which are currently unavailable to us. Nevertheless, we recognize the value of this line of investigation and plan to explore it in our future studies.

      (11) All data were displayed as mean ± SD. However, for data from three independent experiments, it is more appropriate to present the results as mean ± SEM, not mean ± SD.

      We sincerely thank the reviewer for highlighting this important issue. In line with the reviewer's suggestion, we have revised the manuscript accordingly and now present data from three independent experiments as mean ± SEM.
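
      The distinction raised in this point can be illustrated with a minimal sketch, assuming three hypothetical replicate values (they are not data from the manuscript). SD describes the spread of the replicates, whereas SEM is the SD divided by the square root of the number of replicates, so for n = 3 the SEM is always smaller than the SD.

```python
import math
import statistics


def sd_and_sem(values):
    """Return (sample SD, SEM) for a list of replicate measurements."""
    n = len(values)
    sd = statistics.stdev(values)  # sample SD, uses the n - 1 denominator
    sem = sd / math.sqrt(n)        # standard error of the mean
    return sd, sem


# Hypothetical readouts from three independent experiments (illustrative only).
replicates = [10.0, 12.0, 14.0]
sd, sem = sd_and_sem(replicates)
print(f"mean = {statistics.mean(replicates):.2f}, SD = {sd:.2f}, SEM = {sem:.2f}")
```

      Because the SEM shrinks with n while the SD does not, error bars drawn with the two statistics are not interchangeable, which is why figure legends should state which one is shown.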

      (12) The figure legends require careful review: i) The low dose of ABT-199 (Figure 6H) and the dose of ABT-199 used in Figure 6I are missing. ii) The legends for Figure S1D-E are incorrect. iii) The name of the antibody in the legend of Figure S3C is incorrect.

      We sincerely thank the reviewer for raising these important issues. We have carefully corrected all the errors mentioned. In addition, we have thoroughly reviewed the manuscript to prevent similar errors.

      Reviewer #2 (Recommendations For The Authors):

      The authors focus on NOXA, after initially identifying WSB2 to interact with several BCL2 proteins. The rationale behind this is that WSB2 depletion or overexpression affects NOXA levels, but none of the other BCL2 proteins tested, as stated in the text. Yet, BCLW is also depleted upon overexpression of WSB2 (Supplementary Figure 1). How does this phenomenon relate to the sensitization noted, is BCL-W higher in WSB2 KO cells? It does not seem so though. This warrants discussion.

      We appreciate the reviewer for raising this important issue. Our results showed that overexpression of WSB2 markedly reduced NOXA levels, while the levels of other BCL-2 family proteins, such as BCL-W, remained unaffected or minimally affected (Figure 2—figure supplement 1A). Furthermore, depletion of WSB2 through shRNA-mediated KD or CRISPR/Cas9-mediated KO in C4-2B cells or Huh-7 cells led to a marked increase in the steady-state levels of endogenous NOXA, without affecting the other BCL-2 family proteins examined, including BCL-W (Figure 2A-C, Figure 2—figure supplement 2A, B).

      If WSB2 depletion does not affect MCL1 levels, how does excess NOXA actually kill? Does it bind to any (other) prosurvival proteins under conditions of WSB2 depletion? Is the MCL1 half-life changed?

      We appreciate the reviewer for raising this important point. NOXA is a BH3-only protein known to promote apoptosis primarily by binding to and neutralizing anti-apoptotic BCL-2 family members, especially MCL-1, via its BH3 domain. It can inhibit MCL-1 either through competitive binding or by facilitating its ubiquitination and subsequent proteasomal degradation. In our system, the total protein levels of MCL-1 remained unchanged in WSB2 knockout cells, suggesting that NOXA may not be promoting apoptosis through enhanced MCL-1 degradation. Instead, we speculate that the accumulation of NOXA in WSB2-deficient cells enhances apoptosis by sequestering MCL-1 through direct binding, thereby freeing pro-apoptotic effectors such as BAK and BAX. In line with our observations, Nakao et al. reported that deletion of the mitochondrial E3 ligase MARCH5 led to a pronounced increase in NOXA expression, while leaving MCL-1 protein levels unchanged in leukemia cell lines (Leukemia. 2023;37:1028-1038; PMID: 36973350).

      Additionally, NOXA has been reported to interact with other anti-apoptotic proteins, including BCL-XL. It is therefore possible that under conditions of WSB2 depletion, excess NOXA may also bind to BCL-XL and relieve its inhibition of BAX/BAK, further contributing to apoptosis. Future experiments assessing NOXA binding partners in WSB2-deficient cells would help clarify this mechanism.

      I think some initial insights into the mechanism underlying the sensitization would add a lot to this study. Is there a role of BFL1/A1 in any of these cell lines, as it can also rather selectively bind to NOXA and is sometimes deregulated in cancer?

      We appreciate the reviewer for raising this important issue. While BFL1/A1 is indeed another anti-apoptotic BCL-2 family member that can selectively bind to NOXA and has been implicated in cancer, our study primarily focuses on the WSB2-NOXA axis. However, given its potential involvement in apoptosis regulation, it would be an interesting direction for future studies to explore whether BFL1/A1 contributes to NOXA-mediated sensitization in specific cellular contexts.

      Otherwise, this is a very nice and convincing study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript focuses on the olfactory system of Pieris brassicae larvae and the importance of olfactory information in their interactions with the host plant Brassica oleracea and the major parasitic wasp Cotesia glomerata. The authors used CRISPR/Cas9 to knockout odorant receptor coreceptors (Orco), and conducted a comparative study on the behavior and olfactory system of the mutant and wild-type larvae. The study found that Orco-expressing olfactory sensory neurons in antennae and maxillary palps of Orco knockout (KO) larvae disappeared, and the number of glomeruli in the brain decreased, which impairs the olfactory detection and primary processing in the brain. Orco KO caterpillars show weight loss and loss of preference for optimal food plants; KO larvae also lost weight when attacked by parasitoids with the ovipositor removed, and mortality increased when attacked by untreated parasitoids. On this basis, the authors further studied the responses of caterpillars to volatiles from plants attacked by the larvae of the same species and volatiles from plants on which the caterpillars were themselves attacked by parasitic wasps. Lack of OR-mediated olfactory inputs prevents caterpillars from finding suitable food sources and from choosing spaces free of enemies.

      Strengths:

      The findings help to understand the important role of olfaction in caterpillar feeding and predator avoidance, highlighting the importance of odorant receptor genes in shaping ecological interactions.

      Weaknesses:

      There are the following major concerns:

      (1) Possible non-targeted effects of Orco knockout using CRISPR/Cas9 should be analyzed and evaluated in Materials and Methods and Results.

      Thank you for your suggestion. In the Materials and Methods, we mention how we selected the target region and evaluated potential off-target sites by Exonerate and CHOPCHOP. Neither of these methods found potential off-target sites with a more-than-17-nt alignment identity. Therefore, we assumed no off-target effect in our Orco knockout. Furthermore, we did not find any developmental differences between wildtype and knockout caterpillars when these were reared on leaf discs in Petri dishes (Fig S4). We will further highlight this information on the off-target evaluation in the Results section.

      (2) Figure 1E: Only one olfactory receptor neuron was marked in WT. There are at least three olfactory sensilla at the top of the maxillary palp. Therefore, to explain the loss of Orco-expressing neurons in the mutant (Figure 1F), a more rigorous explanation of the photo is required.

      Thank you for pointing this out. The figure shows only a qualitative comparison between WT and KO and we did not aim to determine the total number of Orco positive neurons in the maxillary palps or antennae of WT and KO caterpillars, but please see our previous work for the neuron numbers in the caterpillar antennae (Wang et al., 2024). We did indeed find more than one neuron in the maxillary palps, but as these were in very different image planes it was not possible to visualize them together. However, we will add a few sentences in the Results and Discussion section to explain the results of the maxillary palp Orco staining.

      (3) In Figure 1G, H, the four glomeruli are circled by dotted lines: their corresponding relationship between the two figures needs to be further clarified.

      Thank you for pointing this out. The four glomeruli in Figure 1G and 1H are not strictly corresponding. We circled these glomeruli to highlight them, as they are the best visualized and clearly shown in this view. In this study, we only counted the number of glomeruli in both WT and KO, however, we did not clarify which glomeruli are missing in the KO caterpillar brain. We will further clarify this in the figure legend.

      (4) Line 130: Since the main topic in this study is the olfactory system of larvae, the experimental results of this part are all about antennal electrophysiological responses, mating frequency, and egg production of female and male adults of wild type and Orco KO mutant, it may be considered to include this part in the supplementary files. It is better to include some data about the olfactory responses of larvae.

      Thank you for your suggestion. We do agree with your suggestion, and we will consider moving this part to the supplementary information. Regarding larval olfactory response, we unfortunately failed to record any spikes using single sensillum recordings due to the difficult nature of the preparation; however we do believe that this would be an interesting avenue for further research.

      (5) Line 166: The sentences in the text are about the choice test between "healthy plant vs. infested plant", while in Fig 3C, it is "infested plant vs. no plant". The content in the text does not match the figure.

      Thank you for pointing this out. The sentence is “We compared the behaviors of both WT and Orco KO caterpillars in response to clean air, a healthy plant and a caterpillar-infested plant”. We tested these three stimuli in two comparisons: healthy plant vs no plant, infested plant vs no plant. The two comparisons are shown in Figure 3C separately. We will aim to describe this more clearly in the revised version of this manuscript.

      (6) Lines 174-178: Figure 3A showed that the body weight of Orco KO larvae in the absence of parasitic wasps also decreased compared with that of WT. Therefore, in the experiments of Figure 3A and E, the difference in the body weight of Orco KO larvae in the presence or absence of parasitic wasps without ovipositors should also be compared. The current data cannot determine the reduced weight of KO mutant is due to the Orco knockout or the presence of parasitic wasps.

      Thank you for pointing this out. We did not make a comparison between the data of Figures 3A and 3E since the two experiments were not conducted at the same time due to the limited space in our BioSafety III greenhouse. We do agree that the weight decrease in Figure 3E is partly due to the reduced caterpillar growth shown in Figure 3A. However, we are confident that the additional decrease in caterpillar weight shown in Figure 3E is mainly driven by the presence of disarmed parasitoids. To be specific, the average weight in Figure 3A is 0.4544 g for WT and 0.4230 g for KO, so KO weight is 93.1% of WT weight. In Figure 3E, the average weight is 0.4273 g for WT and 0.3637 g for KO, so KO weight is 85.1% of WT weight. We will discuss this interaction between caterpillar growth and the effect of the parasitoid attacks more extensively in the revised version of the manuscript.
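
      The percentages quoted in this response can be re-derived from the four group means it states. The short sketch below is only an arithmetic check; the function and variable names are illustrative, and no data beyond the quoted means are assumed.

```python
def percent_of_wt(ko_mean, wt_mean):
    """Express the KO group mean as a percentage of the WT group mean."""
    return 100.0 * ko_mean / wt_mean


# Mean caterpillar weights (g) as quoted in the response.
fig3a = percent_of_wt(ko_mean=0.4230, wt_mean=0.4544)  # no parasitoids
fig3e = percent_of_wt(ko_mean=0.3637, wt_mean=0.4273)  # disarmed parasitoids

print(f"Figure 3A: KO is {fig3a:.1f}% of WT")
print(f"Figure 3E: KO is {fig3e:.1f}% of WT")
print(f"Additional relative deficit: {fig3a - fig3e:.1f} percentage points")
```

      The gap between the two ratios, roughly 8 percentage points, is the portion of the KO weight deficit that the response attributes to the disarmed parasitoids. Since the two experiments were run at different times, this comparison is indicative rather than a formal test.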

      (7) Lines 179-181: Figure 3F shows that the survival rate of larvae of Orco KO mutant decreased in the presence of parasitic wasps, and the difference in survival rate of larvae of WT and Orco KO mutant in the absence of parasitic wasps should also be compared. The current data cannot determine whether the reduced survival of the KO mutant is due to the Orco knockout or the presence of parasitic wasps.

      We are happy that you highlight this point. When conducting these experiments, we selected groups of caterpillars and carefully placed them on a leaf with minimal disturbance, which minimized injury and mortality. We did test the survival of caterpillars in the absence of parasitoid wasps in the experiment presented in Figure 3A, although this was missing from the manuscript. There is no significant difference in the survival rate of caterpillars between the two genotypes in the absence of wasps (average mortality WT = 8.8%, average mortality KO = 2.9%; P = 0.088, Wilcoxon test), so the decreased survival rate is most likely due to the attack of the wasps. We will add this information to the revised version of the manuscript.

      (8) In Figure 4B, why do the compounds tested have no volatiles derived from plants? Cruciferous plants have the well-known mustard bomb. In the behavioral experiments, the larvae responses to ITC compounds were not included, which is suggested to be explained in the discussion section.

      Thank you for the suggestion. We assume you mean Figure 4D/4E instead of Figure 4B. In Figure 4B, many of the identified chemical compounds are essentially plant volatiles, especially those from caterpillar frass and caterpillar spit. In Figure 4D/4E, most of the tested chemicals are derived from plants. But indeed, we did not include ITCs, based on information from the EAG results in Figures 2A & 2B. Butterfly antennae did not respond strongly to ITCs, so we did not include ITCs in the larval behavioural tests. Instead, the tested chemicals in Figure 4D/4E either elicit high EAG responses of butterflies or have been identified as “important” by VIP scores in the chemical analyses. In the EAG results for Plutella xylostella (Liu et al., 2020), moths responded well to a few ITCs; the ITCs tested in our study were adopted from that study, except for those that were not available to us. However, butterflies did not show a strong response to the tested ITCs; therefore, we did not include ITCs because we expected that Pieris brassicae caterpillars would be unlikely to respond well to them. We will add this explanation to the revised version of our manuscript.

      (9) The custom-made setup and the relevant behavioral experiments in Figure 4C need to be described in detail (Line 545).

      We will add more detailed descriptions for the setup and method in the Materials and Methods.

      (10) Materials and Methods Line 448: 10 μL paraffin oil should be used for negative control.

      Thank you for pointing this out. We used both clean filter paper and clean filter paper with 10 μL paraffin oil as negative controls, but we did not find a significant difference between the two controls. Therefore, in the EAG results of Figure 2A/2B, we presented paraffin oil as one of the tested chemicals. We will re-run our statistical tests with paraffin oil as negative control, although we do not expect any major differences to the previous tests.

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigated the effect of olfactory cues on caterpillar performance and parasitoid avoidance in Pieris brassicae. The authors knocked out Orco to produce caterpillars with significantly reduced olfactory perception. These caterpillars showed reduced performance and increased susceptibility to a parasitoid wasp.

      Strengths:

      This is an impressive piece of work and a well-written manuscript. The authors have used multiple techniques to investigate not only the effect of the loss of olfactory cues on host-parasitoid interactions, but also the mechanisms underlying this.

      Weaknesses:

      (1) I do have one major query regarding this manuscript - I agree that the results of the caterpillar choice tests in a y-maze give weight to the idea that olfactory cues may help them avoid areas with higher numbers of parasitoids. However, the experiments with parasitoids were carried out on a single plant. Given that caterpillars in these experiments were very limited in their potential movement and source of food - how likely is it that avoidance played a role in the results seen from these experiments, as opposed to simply the slower growth of the KO caterpillars extending their period of susceptibility? While the two mechanisms may well both take place in nature - only one suggests a direct role of olfaction in enemy avoidance at this life stage, while the other is an indirect effect, hence the distinction is important.

      We do agree with your comment that both mechanisms may be at work in nature and we do address this in the Discussion section. In our study, we did find that wildtype caterpillars were more efficient in locating their food source and did grow faster on full plants than knockout caterpillars. This faster growth will enable wildtype caterpillars to more quickly outgrow the life-stages most vulnerable to the parasitoids (L1 and L2). The olfactory system therefore supports the escape from parasitoids indirectly by enhancing feeding efficiency directly.

      Figure 3D shows that WT caterpillars prefer infested plants without parasitoids to infested plants with parasitoids. In addition, we observed that caterpillars move frequently between different leaves. Therefore, we speculate that WT caterpillars make use of volatiles from the plant or from (parasitoid-exposed) conspecifics via their spit or faeces to avoid parts of the plant potentially attracting natural enemies. Knockout caterpillars are unable to use these volatile danger cues and therefore do not avoid plant parts that are most attractive to their natural enemies, making KO caterpillars more susceptible and leading to more natural enemy harassment. Through this, olfaction also directly impacts the ability of a caterpillar to find an enemy-free feeding site.

      We think that olfaction supports the enemy avoidance of caterpillars via both these mechanisms, although at different time scales. Unfortunately, our analysis was not detailed enough to discern the relative importance of the two mechanisms we found. However, we feel that this would be an interesting avenue for further research. Moreover, we will sharpen our discussion on the potential importance of the two different mechanisms in the revised version of this manuscript.

      (2) My other issue was determining sample sizes used from the text was sometimes a bit confusing. (This was much clearer from the figures).

      We will revise the sample size descriptions in the text to make them clearer.

      (3) I also couldn't find the test statistics for any of the statistical methods in the main text, or in the supplementary materials.

      Thank you for pointing this out. We will provide more detailed test statistics in the main text and in the supplementary materials of the revised version of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Abstract

      Line 24: "optimal food plant" should be changed to "optimal food plants"

      Thank you for the suggestion, we will revise it.

      (2) Introduction

      Lines 44-46: The sentence should be rephrased.

      Thank you for the suggestion, we will revise it.

      Line 50: "are" should be changed to "is".

      Thank you for the suggestion, we will revise it.

      Lines 57 and 58: Please provide the Latin names of "brown planthoppers" and "striped stem borer".

      Thank you for the suggestion, we will revise it.

      Line 85: "investigate the influence of odor-guided behavior by this primary herbivore on the next trophic levels"; similarly, Line 160: "investigate if caterpillars could locate the optimal host-plant when supplied with differently treated plants". These sentences are not very accurate in describing the relevant experiments.

      Thank you for the suggestion, we will revise them.

      Reviewer #2 (Recommendations for the authors):

      (1) L53 Remove the "the" from "Under the strong selection pressure"

      Thank you for the suggestion, we will revise it.

      (2) L80 I suggest adding a reference for the spitting behaviour, e.g. Muller et al 2003.

      Thank you for the suggestion, we will add it.

      (3) L89 establishing a homozygous KO insect colony.

      Thank you for the suggestion, we will revise it.

      (4) L107 perhaps this goes against the journal style but I always like to see acronyms explained the first time they are used.

      Thank you for the suggestion, we will try to make it more understandable.

      (5) L146-148 sentence difficult to read - consider rephrasing.

      Thank you for the suggestion, we will revise it.

      (6) L230 do you mean still produce? Rather than still reproduce?

      Thank you for the suggestion, we will revise it.

      (7) L233 missing an and before "a greater vulnerability to the parasitoid wasp".

      Thank you for pointing this out, we will revise it.

      (8) L238 malfunctional is a strange word choice.

      Thank you for pointing this out, we will revise it.

      (9) L181 - can the authors confirm that this lower survival was due to parasitism by the wasps?

      This question is similar to Q(7) of Reviewer 1, so we quote our answer for Q(7) here:

      When conducting these experiments, we selected groups of caterpillars and carefully placed them on a leaf with minimal disturbance of the caterpillars, which minimized injury and mortality. We did test the survival of caterpillars in the absence of parasitoid wasps from the experiment presented in Figure 3A, although this was missing from the manuscript. There is no significant difference in the survival rate of caterpillars between the two genotypes in the absence of wasps (average mortality WT = 8.8 %, average mortality KO = 2.9 %; P = 0.088, Wilcoxon test), so the decreased survival rate is most likely due to the attack of the wasps. We will add this information to the revised version of the manuscript.

      (10) L474 - has it been tested if wasps still behave similarly after their ovipositor has been removed?

      Thank you for pointing out this issue. We did not strictly compare whether disarmed and untreated wasps behave similarly. However, we did check that each disarmed wasp could actively move or fly after recovering from anesthesia before releasing it into a cage; otherwise, we replaced it with another active wasp.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study aims to identify the proteins that compose the electrical synapse, which are much less understood than those of the chemical synapse. Identifying these proteins is important to understand how synaptogenesis and conductance are regulated in these synapses. The authors identified more than 50 new proteins and used immunoprecipitation and immunostaining to validate their interaction or localization. One new protein, a scaffolding protein, shows particularly strong evidence of being an integral component of the electrical synapse. However, many key experimental details are missing (e.g. mass spectrometry), making it difficult to assess the strength of the evidence.

      Strengths:

      One newly identified protein, SIPA1L3, has been validated both by immunoprecipitation and immunohistochemistry. The localization at the electrical synapse is very striking.

      A large number of candidate interacting proteins were validated with immunostaining in vivo or in vitro.

      Weaknesses:

      There is no systematic comparison between the zebrafish and mouse proteome. The claim that there is "a high degree of evolutionary conservation" was not substantiated.

      We have added a table as supplementary figure 3 that shows a comparison of all candidates. While there are differences in both proteomes, components such as ZO proteins and the endocytosis machinery are clearly conserved.

      No description of how mass spectrometry was done and what type of validation was done.

      We have contacted the mass spec facility we worked with and added a paragraph explaining the mass spec. procedure in the material and methods section.

      The threshold for enrichment seems arbitrary.

      Yes, the thresholds are somewhat arbitrary. This is due to the fact that experiments that captured larger total amounts of protein (mouse retina samples) had higher signal-to-noise ratio than those that captured smaller total amounts of protein (zebrafish retina). This allowed us to use a more stringent threshold in the mouse dataset to focus on high probability captured proteins.
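
      To make the effect of species-specific cutoffs concrete, the following is a minimal sketch of how candidates could be filtered by log2 fold enrichment over the wild-type control. It is not the authors' pipeline: the function names, the pseudocount, and the input layout are assumptions for illustration, and the cutoffs are the values quoted in the reviews (>3 log2 fold for mouse, >1 log2 fold for zebrafish).

```python
import math

# Cutoffs on log2 fold enrichment as quoted in the reviews; the dictionary
# itself is a hypothetical convenience, not part of the authors' analysis.
LOG2_CUTOFF = {"mouse": 3.0, "zebrafish": 1.0}


def log2_enrichment(turboid_abundance, control_abundance, pseudocount=1.0):
    """log2 ratio of TurboID to wild-type abundance.

    The pseudocount is an assumption; it keeps the ratio finite for
    proteins that are not detected in the control sample.
    """
    return math.log2(
        (turboid_abundance + pseudocount) / (control_abundance + pseudocount)
    )


def select_candidates(abundances, species):
    """Return the sorted names of proteins whose enrichment exceeds the cutoff.

    abundances maps a protein name to (TurboID abundance, control abundance).
    """
    cutoff = LOG2_CUTOFF[species]
    return sorted(
        name
        for name, (turboid, control) in abundances.items()
        if log2_enrichment(turboid, control) > cutoff
    )
```

      With the same abundances, the stricter mouse cutoff keeps a subset of the proteins that pass the zebrafish cutoff, which is why the two candidate lists are not directly comparable in size.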

      Inconsistent nomenclature and punctuation usage.

      We have scanned through the manuscript and updated terms that were used inconsistently in the interim revision of the manuscript.

      The description of figures is very sparse and error-prone (e.g. Figure 6).

      In Figure 1B, there is very broad non-specific labeling by avidin in zebrafish (In contrast to the more specific avidin binding in mice, Figure 2B). How are the authors certain that the enrichment is specific at the electrical synapse?

      The enrichment of the proteins we identified is specific for electrical synapses because we compared the abundance of all candidates between Cx35b-V5-TurboID and wildtype retinas. Proteins that are components of electrical synapses will only show up in the Cx35b-V5-TurboID condition. The western blot (Strep-HRP) in figure 1C shows the differences in the streptavidin labeling and hence the enrichment of proteins that are part of electrical synapses. Moreover, while the background appears to be quite abundant in sections, biotinylation is a rare posttranslational modification and mainly occurs in carboxylases, which account for the two intense bands that show up above 50 and 75 kDa. The background mainly originates from these two proteins. Therefore, it is easy to distinguish specific hits from non-specific background.

      In Figure 1E, there is very little colocalization between Cx35 and Cx34.7. More quantification is needed to show that it is indeed "frequently associated."

      We agree that “frequently associated” is too strong as a statement. We corrected this and instead wrote “that Cx34.7 was only expressed in the outer plexiform layer (OPL) where it was associated with Cx35b at some gap junctions” in line 151. There are many gap junctions at which Cx35b is not colocalized with Cx34.7.

      Expression of GFP in HCs would potentially be an issue, since GFP is fused to Cx36 (regardless of whether HC expresses Cx36 endogenously) and V5-TurboID-dGBP can bind to GFP and biotinylate any adjacent protein.

      Thank you for this suggestion! There should be no Cx36-GFP expression in horizontal cells, which means that the nanobody cannot bind to anything in these cells. Moreover, to recognize specific signals from non-specific background, we included wild type retinas throughout the entire experiments. This condition controls for non-specific biotinylation.

      Figure 7: the description does not match up with the figure regarding ZO-1 and ZO-2.

      It appears that a portion of the figure legend was left out of the submitted version of the manuscript. We have put the legend for panels A through C back into the manuscript in the interim revision.

      Reviewer #2 (Public review):

      Summary:

      This study aimed to uncover the protein composition and evolutionary conservation of electrical synapses in retinal neurons. The authors employed two complementary BioID approaches: expressing a Cx35b-TurboID fusion protein in zebrafish photoreceptors and using GFP-directed TurboID in Cx36-EGFP-labeled mouse AII amacrine cells. They identified conserved ZO proteins and endocytosis components in both species, along with over 50 novel proteins related to adhesion, cytoskeleton remodeling, membrane trafficking, and chemical synapses. Through a series of validation studies - including immunohistochemistry, in vitro interaction assays, and immunoprecipitation - they demonstrate that the novel scaffold protein SIPA1L3 interacts with both Cx36 and ZO proteins at electrical synapses. Furthermore, they identify and localize proteins ZO-1, ZO-2, CGN, SIPA1L3, Syt4, SJ2BP, and BAI1 at AII/cone bipolar cell gap junctions.

      Strengths:

      The study demonstrates several significant strengths in both experimental design and validation approaches. First, the dual-species approach provides valuable insights into the evolutionary conservation of electrical synapse components across vertebrates. Second, the authors compare two different TurboID strategies in mice and demonstrate that the HKamac promoter and GFP-directed approach can successfully target the electrical synapse proteome of mouse AII amacrine cells. Third, they employed multiple complementary validation approaches - including retinal section immunohistochemistry, in vitro interaction assays, and immunoprecipitation - providing evidence supporting the presence and interaction of these proteins at electrical synapses.

      Weaknesses:

      The conclusions of this paper are supported by data; however, some aspects of the quantitative proteomics analysis require clarification and more detailed documentation. The differential threshold criteria (>3 log2 fold for mouse vs >1 log2 fold for zebrafish) will benefit from biological justification, particularly given the cross-species comparison. Additionally, providing details on the number of biological or technical replicates used in this study, along with analyses of how these replicates compare to each other, would strengthen the confidence in the identification of candidate proteins. Furthermore, including negative controls for the histological validation of proteins interacting with Cx36 could increase the reliability of the staining results.

      While the study successfully characterized the presence of candidate proteins at the electrical synapses between AII amacrine cells and cone bipolar cells, it did not compare protein compositions between the different types of electrical synapses within the circuit. Given that AII amacrine cells form both homologous (AII-AII) and heterologous (AII-cone bipolar cell) electrical synapses - connections that serve distinct functional roles in retinal signaling processing - a comparative analysis of their molecular compositions could have provided important insights into synapse specificity.

      Reviewer #3 (Public review):

      Summary:

      This study by Tetenborg S et al. identifies proteins that are physically closely associated with gap junctions in retinal neurons of mice and zebrafish using BioID, a technique that labels and isolates proteins proximal to a protein of interest. These proteins include scaffold proteins, adhesion molecules, chemical synapse proteins, components of the endocytic machinery, and cytoskeleton-associated proteins. Using a combination of genetic tools and meticulously executed immunostaining, the authors further verified the colocalizations of some of the identified proteins with connexin-positive gap junctions. The findings in this study highlight the complexity of gap junctions. Electrical synapses are abundant in the nervous system, yet their regulatory mechanisms are far less understood than those of chemical synapses. This work will provide valuable information for future studies aiming to elucidate the regulatory mechanisms essential for the function of neural circuits.

      Strengths:

      A key strength of this work is the identification of novel gap junction-associated proteins in AII amacrine cells and photoreceptors using BioID in combination with various genetic tools. The well-studied functions of gap junctions in these neurons will facilitate future research into the functions of the identified proteins in regulating electrical synapses.

      Thank you for these comments.

      Weaknesses:

      I do not see major weaknesses in this paper. A minor point is that, although the immunostaining in this study is beautifully executed, the quantification to verify the colocalization of the identified proteins with gap junctions is missing. In particular, endocytosis component proteins are abundant in the IPL, making it unclear whether their colocalization with gap junction is above chance level (e.g. EPS15l1, HIP1R, SNAP91, ITSN in Figure 3B).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) It would be helpful to include a comprehensive summary of the results from the quantitative proteomics analyses, such as the number of proteins detected in each species and the number of proteins associated with each GO term. Additionally, a clear figure or table highlighting the specific proteins conserved between zebrafish and mice would strengthen the evidence for evolutionary conservation of proteins at electrical synapses.

      We have added the raw data we received from our mass spec facility, including a comparison of all the candidates for the different species (Supplementary figure 3).

      (2) A more detailed description of the number of experimental and/or technical replicates would improve the technical rigor of the study. For example, what was the rationale for using different log2 fold-change cutoffs in mice versus zebrafish? Are the replicates consistent in terms of protein enrichment?

      We have added raw data from individual experiments as a supplement (Excel spreadsheet). We have two replicates from zebrafish and two from mice. The first experiment in mice was conducted with fewer retinas and a different promoter (human synapsin promoter) and didn’t yield nearly as many candidates. We are currently running a third experiment with 35 mouse retinas, which will most likely detect more candidates than we have identified so far. We can update the proteome in this paper once the analysis is complete. It is not feasible to conduct these experiments with multiple replicates at the same time, since the number of animals that have to be used is simply too high, especially since very specific genotypes are required that are difficult to obtain.

      (3) It would be interesting to determine whether there are differences in the presence of candidate proteins between AII-AII gap junctions and AII-cone bipolar cell gap junctions. Given that the subcellular localization of AII-AII gap junctions differs from that of AII-cone bipolar cell gap junctions (with most AII-AII gap junctions located below AII-cone ones), histological validations of the proteins shown in Figure 6 can be repeated for AII-AII gap junctions. This would help reveal similarities or differences in the protein compositions of these two types of gap junctions.

      Thank you for this suggestion. We had similar plans. However, we realized that homologous gap junctions are difficult to recognize with GFP. The dense GFP labeling in the proximal IPL, where AII-AII gap junctions are formed, does not allow us to clearly trace the location of individual dendrites from different cells. Detecting AII-AII gap junctions would require intracellular dye injections of neighboring AII cells. Unfortunately, we don’t have a setup that would allow this. Bipolar cell terminals, on the contrary, are a lot easier to detect with markers such as SCGN, which is why we decided to focus on AII/ONCB gap junctions.

      (4) In Figures 1 and 2, it would be helpful to clarify in the figure legends whether the proteins in the interaction networks represent all detected proteins or only those selected based on log2 fold-change or other criteria.

      Thank you for this suggestion! We have added a description in lines 643 and 662.

      (5) In Figure 1A (bottom panel), please include a negative control for the Neutravidin staining result from the non-labeling group.

      We only tested the biotinylation for wild type retinas in cell lysates and western blots as shown in figure 1C, which shows an entirely different biotinylation pattern.

      (6) In Figure 2B, please include the results of Neutravidin staining for both the labeling and non-labeling groups.

      Same comment: We see the differences in the biotinylation pattern on western blots, which is distinct for Cx36-EGFP and wild type retinas, although both genotypes were injected with the same AAV construct and the same dose of biotin. We hope that this provides sufficient evidence for the specificity of our approach.

      (7) In Figure 5B, the sizes of multiple proteins detected by Western blotting are inconsistent and confusing. For example, the size of Cx36 in the "FLAG-SJ2BP" panel differs from that in the other three panels. Additionally, in the "Myc-SIPA1L3+" panel, the size of SIPA1l3 appears different between the input and IP conditions.

      Thank you for pointing this out! The differences in the molecular weight can be explained by dimerization. We have indicated the position of the dimer and the monomer bands with arrows. Especially when larger amounts of Cx36 are coprecipitated, Cx36 preferentially occurs as a dimer. This can also be seen in our previous publication:

      S. Tetenborg et al., Regulation of Cx36 trafficking through the early secretory pathway by COPII cargo receptors and Grasp55. Cellular and Molecular Life Sciences 81, 1-17 (2024). Figure 1D

      The band that occurs above 150 kDa in the SIPA1L3 input is most likely a non-specific product. The specific band for SIPA1L3 can be seen in the IP sample, which has the appropriate molecular weight. We often see much better immunoreactivity for the protein of interest in IP samples, because the protein is concentrated in these experiments, which facilitates its detection.

      (8) How specific are the antibodies used for validating the proteins in this study? Given that many proteins, such as EPS15l1, HIP1R, SNAP91, GPrin1, SJ2BP, Syt4, show broad distribution in the IPL (Figure 3B, 4A, 6D), it is important to validate the specificity of these antibodies. Additionally, including negative controls in the histological validation would strengthen the reliability of the results.

      We carefully selected the antibodies based on western blot data that confirmed that each antibody detected an antigen of the appropriate size. Moreover, the distribution of the proteins mentioned is consistent with the function of each protein described in the literature. EPS15L1 and GPrin1, for instance, are both membrane-associated, which is evident in HEK cells (Figure 5C).

      A true negative control would require KO tissue and we don’t think that this is feasible at this point.

      (9) In Figure 7F, the model could be improved by highlighting which components may be conserved between zebrafish and mice, as well as which components are conserved between the AII-AII junction and AII-cone bipolar cell junction?

      Thank you for this suggestion. However, we don’t think that this is necessary as our study primarily focuses on the AII amacrine cell.

      Currently we are unable to distinguish differences in the composition of AII-AII and AII-ONCB junctions as described above.

      (10) Are there any functional measurements that could support the conclusion that "loss of Cx36 resulted in a quantitative defect in the formation of electrical synapse density complex"?

      The loss of electrical synapse density proteins is shown by these immunostaining comparisons. Functional measurements necessarily depend on the function of the electrical synapse itself, which is gone in the case of the Cx36 KO. It is not clear that a different functional measurement can be devised.

      Reviewer #3 (Recommendations for the authors):

      (1) It would be very helpful if there were page and line numbers on the manuscript.

      Line and page numbers have been added.

      (2) Typos in the 3rd paragraph, the sentence 'which is triggered by the influx of Calcium though non-synaptic NMDA...'

      Should it read '... Calcium THROUGH non-synaptic NMDA'?

      We have corrected this typo.

      (3) Figure 1B: please add a description of the top panels, 'Cx36 S293'.

      A description of the top panels has been added to the figure legend in line 639.

      (4) Figure 1C: what do the arrows indicate?

      We apologize for the confusion. The arrows in the western blot indicate the position of the Cx35-V5-TurboID construct, which can be detected with streptavidin-HRP and the V5 antibody. We have added a description for these arrows to the figure legend. See line 641.

      (5) Related to the point in the 'Weakness', there are some descriptions of how well some of the gap junction-associated proteins colocalize with Cx36 in immunostaining. For example, 'In comparison to the scaffold proteins, however, the colocalization of Cx36 with each of these endocytic components, was clearly less frequent and more heterogenous, which appears to reflect different stages in the life cycle of Cx36' and 'All of these proteins showed considerable colocalization with Cx36 in AII amacrine cell dendrites'. It would be nice to see quantification data to support these claims.

      Thank you for this suggestion. We have added a colocalization analysis to figure 3 (C & D). We quantified the colocalization for the endocytosis proteins Eps15l1 and Hip1r. This quantification included a flipped control to rule out random overlap. For both proteins we confirmed true colocalization (Figure 3D).
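
      The idea behind a flipped control can be sketched as follows. The response does not state which colocalization metric or software was used, so this example assumes a Pearson correlation between two intensity channels and a horizontal flip of the second channel; the function names are hypothetical. If the overlap is genuine, the correlation for the original image pair should clearly exceed the correlation obtained after flipping, which estimates the overlap expected by chance for the same labeling density.

```python
import math


def pearson(channel_a, channel_b):
    """Pearson correlation between two equally sized 2D intensity grids."""
    xs = [value for row in channel_a for value in row]
    ys = [value for row in channel_b for value in row]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant channel")
    return covariance / (spread_x * spread_y)


def flip_horizontal(channel):
    """Mirror each image row, keeping the intensity distribution unchanged."""
    return [row[::-1] for row in channel]


def colocalization_with_flipped_control(channel_a, channel_b):
    """Return (correlation of original pair, correlation with channel_b flipped)."""
    original = pearson(channel_a, channel_b)
    flipped = pearson(channel_a, flip_horizontal(channel_b))
    return original, flipped
```

      Because flipping preserves the number and brightness of puncta in the flipped channel, any drop in correlation relative to the original pair reflects the loss of spatial correspondence rather than a change in labeling density.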

      (6) In Figure 5B, it would be helpful if there were arrows or some kind in western blottings to indicate which bands are supposed to be the targeted proteins.

      We have added arrows in IP samples to indicate bands representing the corresponding protein.

      (7) In the sentence including 'for the PBM of Cx36, as it is the case for ZO-1', what is PBM?

      The PBM means PDZ binding motif. We have added an explanation for this abbreviation in line 244.

      (8) Please add a description of the Cx35b promoter construct in the Method section.

      The Cx35b promoter is a 6.5 kb fragment. We will make the clone available via Addgene to ensure that all details of the clone can be accessed via SnapGene or alternative software.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Formins are complex proteins with multiple effects on actin filament assembly, including nucleation, capping with processive elongation, and bundling. Determining which of these activities is important for a given biological process and normal cellular function is a major challenge.

      Here, the authors study the formin FHOD3L, which is essential for normal sarcomere assembly in muscle cells. They identify point mutants of FHOD3L in which formin nucleation and elongation/bundling activities are functionally separated. Expression of these mutants in neonatal rat ventricular myocytes shows that the control of actin filament elongation by formin is the major activity required for the normal assembly of functional sarcomeres.

      Strengths:

      The strength of this work is to combine sensitive biochemical assays with excellent work in neonatal rat ventricular myocytes. This combination of approaches is highly effective for analyzing the function of proteins with multiple activities in vitro.

      Weaknesses:

      FHOD3L does not seem to be the easiest formin to study because of its relatively weak nucleation activity and the short duration of capping events. This difficulty imposes rigorous biochemical analysis and careful interpretation of the data, which should be improved in this work.

      We thank the reviewer for their praise and appreciation of our work. Indeed, FHOD3L is a challenging formin to work with.

      Important points are raised here and below regarding the brief elongation events we reported. As suggested, we performed more rigorous analysis of the data and present it in the revised manuscript. We now report that from 45 dim regions analyzed, in three independent experiments with wild type FHOD3L, we detected 40 bursts. (The remaining five could be formin falling off too quickly to detect or the dim spots could be regions of inhomogeneity in intensity, not due to formin.) For comparison to the presented data with FHOD3L-CT, we analyzed the filaments in TIRF assays with no formin present. As the reviewers point out, inhomogeneities in filament intensity are normal. Thus, we examined any dim spots for pauses and/or bursts. As is now reported in Figure 2G,H, the velocity of growth of these dim spots is indistinguishable from the velocity of the rest of the filament. We acknowledge that our numbers may not be perfectly accurate due to the noise in our system, but we believe that the difference between a 3-4 fold increase and no change in rate is substantial and convincing.

      We also determined the number of dim spots per length of filament. We found a higher frequency when FHOD3L-CT or FHOD3S-CT was present vs no formin, as now shown in Figure 2 – supplements 1G and 2E.

      We were asked about the pauses we observe before bursts of elongation and how we know they are functionally relevant. The short answer is that we do not know. We reported them because they were so common: Of the 40 bursts, pauses preceded the burst in 38 cases. We cannot rule out that this pause reflects an interaction with the surface, but we would expect the frequency to be lower if it did. We have revised the text to make our conclusions about pauses more circumspect.

      We are convinced that the brief dim events we observed in the presence of FHOD3L-CT, in fact, reflect formin-mediated elongation and worked hard to improve their presentation, in addition to the added analysis. We include new kymographs, including examples from FHOD3L, FHOD3S, K1193L, and actin alone. We hope that the reviewers are also convinced.

      This does not preclude our interest in the microfluidics and two-color assays, which will be pursued in the future. We have reached out to a colleague who is set up to repeat these measurements with microfluidics-assisted TIRF. The noise should be greatly reduced and the system is also optimal for directly visualizing labeled FHOD3, as suggested. We expect these experimental approaches will provide additional insights.

      Reviewer #2 (Public review):

      This article elucidates the biochemical and cellular mechanisms by which the FHOD-family of formins, particularly FHOD3, contributes to sarcomere formation and contractility in cardiomyocytes. Formins are mainly known to nucleate and elongate actin filaments, with certain family members also exhibiting capping, severing, and bundling activities. Although FHOD3 has been well-established as essential for sarcomere assembly in cardiomyocytes, its precise biochemical functions and contributions to actin dynamics remain poorly understood.

      In this study, the authors combine in vitro biochemical assays with cellular experiments to dissect FHOD3's roles in actin assembly and sarcomere formation. They demonstrate that FHOD3 nucleates actin filaments and acts as a transient elongator, pausing elongation after an initial burst of filament growth. Using separation-of-function mutants, they show that FHOD3's elongation activity - rather than its nucleation, capping, or bundling capabilities - is key for its sarcomeric function.

      The experiments have been conducted rigorously and well-analyzed, and the paper is clearly written. The data presented support the authors' conclusions. I appreciate the detailed description and rationale behind the FHOD3 constructs used in this study.

      We are happy to hear that others find the paper to be clearly written and well described.

      However, I was somewhat surprised and a bit disappointed that while the authors conducted single-color TIRF experiments to observe the effects of FHOD3 on single filaments, they did not use fluorescently labeled FHOD3 to directly visualize its behavior. Incorporating such experiments would significantly strengthen their conclusions regarding FHOD3's bursts of elongation interspersed with capping activity. While I understand this might require a few additional weeks of experiments, these data would add considerable value by directly testing the proposed mechanism.

      We appreciate the suggestion and hope to incorporate a two-color approach soon. As noted, FHOD3L is not always easy to work with and we do not have a functional labeled copy of the protein at this time.

      There is a typo in the word "required" in line number 30. The authors also use fit data to extract parameters in several panels (e.g., Figures 2b, 2d, 3a, and 3b). While these fit functions may be intuitive to actin experts, explicitly describing the fit functions in the figure legends or methods would greatly benefit the broader readership.

      Thank you for these comments. We updated the indicated figures and described the analysis in greater detail.

      Reviewer #3 (Public review):

      Valencia et al. aim to elucidate the biochemical and cellular mechanisms through which the human formin FHOD3 drives sarcomere assembly in cardiomyocytes. To do so, they combined rigorous in vitro biochemical assays with comprehensive in vivo characterizations, evaluating two wild-type FHOD3 isoforms and two function-separating mutants. Surprisingly, they found that both wild-type FHOD3 isoforms can nucleate new actin filaments, as well as elongate existing actin filaments in conjunction with profilin following barbed-end capping. This is in addition to FHOD3's proposed role as an actin bundler. Next, the authors asked whether FHOD3L promotes sarcomere assembly in cardiomyocytes through its activity in actin nucleation or rather elongation. With two function-separating mutants, the authors evaluated the numbers and morphology of sarcomeres, as well as their ability to beat and generate cardiac rhythm. The authors found that while the wild-type FHOD3L and the K1193L mutant can rescue sarcomere morphology and physiology, the GS-FH1 mutant fails to do so. Given that in GS-FH1 mainly elongation activity is compromised, the authors concluded that the elongation activity of FHOD3 is essential for its role in sarcomere assembly in cardiomyocytes, while its nucleator activity is dispensable. Overall, this important study provided a broadened view on the biochemical activities of FHOD3, and a pioneering view on a possible cellular mechanism of how FHOD3L drives sarcomere assembly. If further validated, this can lead to new mechanistic models of sarcomere assembly and potentially new therapeutic targets of cardiomyopathy.

      The conclusions of this paper are mostly well supported by the comprehensive biochemical analyses performed by the authors. However, the sarcomere assembly defect phenotype in the GS-FH1 rescue condition requires further investigation, as the extremely low level of GS-FH1 signal in transfected cells in Figure 6A may reflect a failure of actin-binding by this construct in vivo, rather than its inability to drive elongation. Though the authors do show in Figure 6 that GS-FH1 can bind to normal-looking sarcomeres when they are present, this may be due to a lack of siRNA activity in these cells, such that endogenous FHOD3L is still present. In this possible scenario, GS-FH1 may dimerize with endogenous FHOD3L. The authors should demonstrate that GS-FH1 alone can indeed interact with existing actin filaments in vivo. While this has been clearly demonstrated in vitro, given the more complex biochemical environment in vivo where additional unknown binding partners may present, cautions should be made when extrapolating findings from the former to the latter.

      The reviewer is concerned about the low protein levels in the GS-FH1 rescue experiments as reflected in the HA fluorescence intensity distributions shown in Fig. 5 Supplement 2A. While the scenario proposed could explain our observations with the GS-FH1 rescues, it is quite complex. Nor does the scenario preclude the conclusion that the FH1 domain is critical. We agree that the observed sarcomeres are likely to be residual in cells with incomplete RNAi. We now include the image of a cell that is still full of sarcomeres and note that the GS-FH1 is expressed at a relatively high level and striated throughout the cell. We interpret this as evidence that GS-FH1 is stable when suitable binding sites are available. We cannot exclude that there is more GS-FH1 because there was more endogenous FHOD3L with which to heterodimerize. If the GS-FH1 heterodimer were simply poisoning the wild type protein, we would not expect it to be bound correctly to sarcomeres. If, instead, heterodimers have some activity, it seems far from sufficient to rescue sarcomere formation, suggesting that two functional FH1 domains are critical.

      Furthermore, we do not see evidence of correlation between protein levels and rescue at the level present in these cells (addressed below). Unfortunately, the proposed IP to test whether FHOD3L binds actin in vivo would only potentially report on filament side binding (both direct and indirect). It would not address whether the GS-FH1 mutant functions as a nucleator, elongator, bundler and/or capping protein in vivo.

      The critical question that we can address is whether the phenotype is due to low protein levels, assuming the protein present is functional, or due to loss of elongation activity by FHOD3L. To address this question, we returned to our data.

      First, we plotted the distributions of the intensities of the cells we analyzed further, in addition to the automated readout of all of the cells in the dish (Fig. 4 supplement 1). These cells were selected randomly and, as should be the case, the distributions of their intensities agree well with the original distributions for the three different rescue constructs: FHOD3L, K1193L, and GS-FH1 (Fig. 6 supplement 1). We then asked whether there was any correlation in HA intensities with the sarcomere metrics. As seen in our pilot data, no correlation is evident in any of the three cases across the range of intensities we collected (400 – 2700 a.u.) (old Fig. 6 supplement C,D,E). We now replace the data from pilot experiments with analysis of HA intensities and sarcomere metrics from the data sets included in the paper (new Fig 6. Supplement 1). Again, little to no correlation was observed (the single highest r-squared value is 0.2 and the remaining eight values are less than or equal to 0.08).
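
      The kind of correlation check described above can be sketched as follows. This is an illustration only, not the authors' analysis code: the function name `r_squared` and the example values are hypothetical, and the sketch simply computes the squared Pearson correlation between per-cell HA intensity and one sarcomere metric.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences.

    For a simple linear regression of y on x this equals the
    coefficient of determination (r-squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        # A constant variable carries no linear relationship.
        return 0.0
    return (sxy * sxy) / (sxx * syy)


# Hypothetical per-cell values, for illustration only (not data from the paper).
ha_intensity = [400, 800, 1200, 1600, 2000, 2700]   # a.u.
sarcomere_number = [52, 47, 55, 49, 53, 50]

print(f"r-squared = {r_squared(ha_intensity, sarcomere_number):.3f}")
```

      A value near 0 indicates that HA intensity explains almost none of the variance in the metric; a value near 1 would indicate a strong linear dependence on expression level.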

      To more specifically address the question of whether low HA fluorescence intensity is likely to reflect sufficient protein levels to build sarcomeres, we re-examined two data sets from the FHOD3L WT rescue data. We found that, by chance, the first replicate of data from the wild type rescue has a comparable intensity distribution to that of the GS-FH1 rescues (580 +/- 261 / cell vs. 548 +/- 105 / cell). In addition, we collected all of the data from cells with intensity levels <720, designed to mimic the distribution of the GS-FH1 cells (Fig. 6 supplement 3). We then compared the sarcomere metrics (sarcomere number, sarcomere length, sarcomere width) between the full data set and the two low-intensity subsets:

      • Sarcomere number is the only non-normal metric. We therefore used the Mann-Whitney U test, which shows no difference among the 3 WT distributions.

      • We compared Z-line lengths by one-way ANOVA and Tukey's post hoc tests, again finding no significant difference for all distributions.

      • Sarcomere length shows a weakly significant difference (p=0.038) between the whole WT data set and bio rep 1, but no difference between the whole WT data set and the HA<720 group.
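
      The subset comparison in the list above can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`subset_below`, `average_ranks`, `mann_whitney_u`) and all data values are hypothetical, and the p-value uses the normal approximation with a tie correction and no continuity correction, which may differ from what a statistics package reports for small samples.

```python
from math import erfc, sqrt


def subset_below(intensities, metrics, threshold):
    """Keep the metric values of cells whose HA intensity is below threshold."""
    return [m for i, m in zip(intensities, metrics) if i < threshold]


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U, two-sided p) using the normal approximation."""
    n1, n2 = len(a), len(b)
    combined = list(a) + list(b)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u, 1.0
    z = (u - n1 * n2 / 2) / sqrt(variance)
    return u, erfc(abs(z) / sqrt(2))


# Hypothetical per-cell values, for illustration only (not data from the paper).
ha = [450, 520, 610, 700, 900, 1500, 2100, 2600]
sarcomere_number = [48, 55, 51, 47, 53, 50, 49, 54]

low = subset_below(ha, sarcomere_number, 720)
u_stat, p_value = mann_whitney_u(sarcomere_number, low)
print(f"n(all) = {len(sarcomere_number)}, n(HA<720) = {len(low)}")
print(f"U = {u_stat}, p = {p_value:.3f}")
```

      Note that the low-intensity group here is drawn from the full data set, mirroring the comparison described in the text, so the two samples are not independent.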

      Thus, cells expressing wild type FHOD3L at levels comparable to those detected in GS-FH1 mutant rescues are fully rescued. Based on these findings, we conclude that the expression levels in the GS-FH1 rescues are high enough to rescue the FHOD3 knockdown, supporting our conclusion that the defect is due to loss of elongation activity. We have added this analysis and discussion to the revised manuscript.

      Recommendations for the authors:

      Reviewing Editor Comments:

      You will see that the 3 reviewers are very positive about your work and appreciate the elegant combination of biochemical assays and functional tests in cardiomyocytes. We've had a long discussion with them and we all agree that two experiments deserve further effort to make the conclusions of your paper more convincing.

      Thank you.

      The first experiment is the TIRF elongation assay, where the two biochemist Reviewers remain doubtful that these short events are really due to the presence of a formin at the end of the filament. One of them suggests that two-color imaging with a labeled formin should clearly prove this point.

      We agree that the elongation assays can be improved. Given the similarity of processivity of Fhod3L, Fhod3S and Drosophila FhodA (measured by a distinct method), we are inclined to believe them. However, the reviewer raises an excellent point about the accuracy of the measurements given the resolution (and noise) of the data. We are interested in the two-color imaging assay but do not believe it will necessarily simplify the analysis. We suspect that Fhod spends more time at/near the barbed end than is apparent based on elongation rates. The fact that we see repeated events on individual filaments at such low concentrations of FHOD3L (0.1 nM) supports this idea. Otherwise, the likelihood of FHOD3L finding barbed ends so often is really quite low.

      We will return to these experiments, using alternate methods, curious to see what else we learn. In the meantime, we conducted more thorough analysis, including controls, and improved visualization of example traces. Data for elongation analysis and kymographs were acquired with JFilament. We stretched the x-axis (time) in kymographs for FHOD3L-CT (Fig. 2F), FHOD3S-CT (Fig. 2, supplement 2C), FHOD3L-CT K1193L (Fig. 3, supplement 1A), and actin alone (Fig 2G), and highlighted regions of analysis. The slopes for these regions, separated based on intensity, were fit to the data in KaleidaGraph. The fits are offset from the data such that they do not obscure the filaments, and corresponding rates are given. Two observations give us confidence that the events are real: we never see fast dim regions when FHOD3 is not present (Fig. 2H), and the frequency of dim events is markedly increased (Fig. 2-supplements 1G and 2E). We acknowledge in the text that the precise values of the short events may be inaccurate due to the resolution of our experiments. We hope the reviewers are convinced by the improved analysis.
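
      For readers unfamiliar with extracting rates from kymographs, the slope-fitting step can be sketched as a least-squares line through (time, filament length) points from one region. This is only an illustration: the fits described above were done in KaleidaGraph, and the function names and trace values below are hypothetical.

```python
def least_squares_slope(times_s, lengths_um):
    """Slope of the best-fit line through (time, length) points, in um/s."""
    n = len(times_s)
    if n != len(lengths_um) or n < 2:
        raise ValueError("need at least two matched (time, length) points")
    mean_t = sum(times_s) / n
    mean_l = sum(lengths_um) / n
    stt = sum((t - mean_t) ** 2 for t in times_s)
    if stt == 0:
        raise ValueError("all time points are identical")
    stl = sum((t - mean_t) * (l - mean_l)
              for t, l in zip(times_s, lengths_um))
    return stl / stt


def fold_difference(rate_a, rate_b):
    """Ratio of two elongation rates."""
    return rate_a / rate_b


# Hypothetical traces, for illustration only (not data from the paper).
bright_region = ([0, 10, 20, 30], [0.0, 0.3, 0.6, 0.9])   # slower, bright
dim_region = ([30, 35, 40], [0.9, 1.35, 1.8])             # faster, dim

bright_rate = least_squares_slope(*bright_region)
dim_rate = least_squares_slope(*dim_region)
print(f"bright: {bright_rate:.3f} um/s, dim: {dim_rate:.3f} um/s")
print(f"fold difference: {fold_difference(dim_rate, bright_rate):.1f}")
```

      With only a few noisy points per region, the fitted slope is sensitive to each point, which is why short events are hard to quantify precisely.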

      The second experiment is the sarcomere assembly defect phenotype in the GS-FH1 rescue condition. This requires further investigation, as the extremely low level of GS-FH1 signal in transfected cells in Figure 6A may reflect a failure of actin-binding/nucleation in vivo, rather than its inability to elongate F-actin. Although you show that GS-FH1 can bind to sarcomeres when they are present, this may be due to a lack of siRNA activity in these cells, such that endogenous FHOD3L is still present. In this possible scenario, GS-FH1 could dimerize with endogenous FHOD3L.

      We agree that the sarcomeres we see are likely to be residual and could reflect some remaining endogenous FHOD3. The reviewers are concerned about the low protein levels in the GS-FH1 rescues. First, we do not agree that the levels are “extremely” low. Through careful analysis, we established that 3xHA-FHOD3L intensities between 300 and 3000 a.u./µm² were sufficient for full rescue. The mean for the GS-FH1 experiments is 533 +/- 93, which is well within this range. Furthermore, we did not observe correlation between sarcomere number, length, or width and HA intensity over the full range collected for wild type FHOD3L or within the GS-FH1 data. We previously showed pilot data but now show correlation analysis for every analyzed cell (Fig. 4 – figure supplement 1 D-F). We conducted this analysis on all of the mutant rescue experiments (Fig. 6-supplement 1). Finally, we identified two subpopulations of the wild type rescue data. One is all of the cells with HA intensity < 720, which gives a distribution of mean 545 +/- 85. The second set is the first biological replicate of wild type rescue, which has a distribution of mean 560 +/- 160. Again, correlation analysis shows little relationship between HA levels and sarcomere metrics. Nevertheless, we show intensity level matched images in Fig 6, as opposed to images reflecting average intensities.

      The critical question remains whether the phenotype is due to low protein levels or due to loss of elongation by FHOD3L. Notably, we now show a cell that is full of sarcomeres and has relatively high FHOD3L levels as well, consistent with available binding sites stabilizing mutant protein but not ruling out heterodimerization (Fig. 6 – figure supplement 2C). Others have expressed mutant FHOD3L in a wild type background in mice. They observed poisoning, consistent with heterodimerization. Thus, it is possible that, as suggested, the FHOD3L GS-FH1 detected in sarcomeres is in fact heterodimerized with residual endogenous FHOD3L. In this case, we would still conclude that the protein is not functional enough to rescue, supporting a role for the FH1 domain.

      In the future, we plan to perform experiments with compromised, but not inactive, FH1 domains, as we discuss in the paper.

      We hope that you will find these comments useful.

      Yes, the comments were thoughtful and helped us write a better paper. Thank you.

      Reviewer #1 (Recommendations for the authors):

      Some experiments should be described and analyzed more carefully. This lack of clarity calls into question the interpretation of some experiments. Overall, this study is not yet as convincing as it should be.

      Main recommendations:

      (1) Formin elongation phases in the TIRF experiment are not convincing. They are rare and it is difficult to see any significant difference between the control movie without FHOD3L-CT and the movie with FHOD3L-CT. Filaments assembled in the absence of FHOD3L-CT also show some fluorescence inhomogeneity (which is normal), and measurements of formin elongation rates and capping times are not convincing (for example, the kymograph of the control profilin-actin situation in Figure 2F also shows a fast elongation phase on the right).

      Please see response above. We conducted more thorough analysis and created improved visualizations. We hope the data are more convincing now.

      It is also difficult to understand how an accurate measurement can be made from these noisy kymographs, and the method section should explain that precisely.

      This is a valid point. We added details of analysis to the methods section, and we discuss in the paper the fact that the measurements are at the limit of our resolution. For our interpretation, we rely on the large (~3-fold) difference in elongation rather than on specific elongation rates.

      One of the problems is that these events are too transient to quantify well with noisy data. I noticed that the formin concentration used in these movies is quite low (0.1 nM FHOD3L-CT). Is there a reason for this? Is it possible to increase the formin concentration to increase the number of formin capping/elongation events and provide more convincing movies?

      We acknowledge that the data are noisy. We felt that it was necessary to perform experiments with filaments only tethered at one end, leaving the growing end free. We did so, in part, because when we did experiments with biotinylated actin to anchor the filaments down, we observed pauses in the absence of formin. Ultimately, we compromised, using anchored seeds and a relatively low concentration of NEM-myosin to decrease motion of the actin filaments.

      The experiments were performed with such low FHOD3L-CT because it was a potent nucleator in TIRF assays, making data analysis nearly impossible with more formin present. FHOD3S-CT and FHOD3L-CT K1193L behaved somewhat differently between these experiments and we were able to perform them with 1 nM formin.

      Not seeing formin at the tip of the filaments is an additional difficulty because we do not know if these pauses occur because formin is stuck to the coverslips (which could very well happen with these sticky proteins) or freely bound at the end of a filament as the text suggests. Is there any argument in favor of one scenario over the other?

      This will be an important experiment. As described above, we suspect that Fhod spends more time at/near the barbed end than is apparent based on elongation data. The fact that we see repeated events on individual filaments at such low concentrations of FHOD3L (0.1 nM) supports this idea. Otherwise, the likelihood of FHOD3L finding barbed ends so often is really quite low. In order to address the question about the cause of pauses, we reviewed our data, finding that 38 of 40 bursts were preceded by pauses. We do, however, discuss that we cannot rule out non-specific interactions with the surface.

      (2) Pyrene elongation assays in the presence of profilin are actually more convincing to test the elongation ability of formins. However, such an assay is not presented for all mutants. It should be.

      While we agree to some extent with this comment, we did not include the pyrene data for all of the mutants because the shapes of the curves were even more complicated than those seen with wild type FHOD3L-CT, rendering them uninterpretable.

      (3) Some experiments (e.g. in Figure 2E) are performed with yeast profilin, while others (e.g. in Figure 2F) are performed with human profilin. Obviously, both profilins could modulate formin activity differently and the side-by-side interpretation of both experiments is difficult. Could the authors stick to human profilin for all experiments?

      We used to always perform pyrene assays with yeast profilin because it was known to be insensitive to pyrene. These data were collected before we realized that the affinity of human profilin for actin is so high that we could probably do everything with this profilin. We have compared the two profilins for other formins, e.g. Delphilin, Capu, and did not observe detectable differences.

      Minor recommendations:

      (1) The pyrene assays with the light blue colored curve choice are not ideal. I have difficulties seeing some of the curves.

      Thank you. We added symbols to a subset of the traces to make them more visible.

      (2) In the same curves, I can't understand what the +3.75 and 0.078 numbers mean. Could these results be plotted in a clearer way?

      These values are the lowest concentrations in the range tests. They were shown in matching light blue with a black outline for visibility. We added symbols and changed the color of the numbering for improved visibility and understanding.

      (3) In Figure 2D, is the Kd of I1163A really determined only from 2 experimental data points?

      Of course not. We now show the figure with extended axes in Fig. 2 - figure supplement 1C.

      (4) In Figure 2C, the shape of the curves suggests that this is not a pure capping assay, but a mix of capping and nucleation. It's not dramatic but could lead to an under-estimation of the capping efficiency.

      We agree with the reviewer that the complicated shapes confound interpretation. Our analysis is based on the earliest slopes, in part, for this reason. We added discussion of this complication to the text.

      Reviewer #3 (Recommendations for the authors):

      Suggestions for additional experiments:

      (1) To evaluate whether GS-FH1 alone can indeed interact with existing actin filaments in vivo, the authors may consider performing immunoprecipitation assays with GS-FH1 extracted from rescued NRVMs.

      An IP of GS-FH1 from cells could show actin filament side binding but, unfortunately, will not provide any information about filament end binding, which is of much greater interest.

      It will be helpful to show phalloidin staining in GS-FH1 rescues in a similar manner as in Figure 6-supplement 1, panel B, and compare that with mock rescue in Figure 4 panel D. It will be essential to prove this prior to concluding that actin elongation activity is essential for sarcomere assembly.

      This is an excellent suggestion. We now include images of phalloidin stained cells from both K1193L and GS-FH1 rescues (Fig. 6A’ – supplement 2A,B). We were intrigued to see small actin punctae that were sometimes aligned. We speculate that these could be pre-premyofibrils and suggest that this is further evidence that the GS-FH1 protein is not completely unstable.

      (2) Prior to sarcomere assembly, a-actinin is known to form short bundles with actin filaments (I-Z-I complex) without clearly defined periodicity. This semi-ordered state then transforms into the more ordered sarcomeres with periodic spacing. It will be valuable to show the phalloidin staining in addition to the a-actinin IF consistently across all conditions. This may lead to further insights into the defects of sarcomere assembly. Along the same vein, higher magnification images showcasing several sarcomeres will help the readers evaluate these defects.

      We agree that there are additional valuable measurements to be made. In order to favor synchronized contraction, we plated the cells at too high a density to reliably identify IZI complexes. We have included some zoomed in images of the phalloidin staining.

      Recommendations for improving the writing:

      The authors mentioned the interaction between cardiac MyBP-C and FHOD3L as essential for the localization of FHOD3L to the C-line of the sarcomere. Can they discuss whether this interaction is important for the role of FHOD3L in sarcomere assembly? If so, how?

      This is a very interesting question that we cannot answer at this time.

      Minor corrections to the text and figures:

      In the legend of Figure 2-Figure Supplement 1, the labels of (F) and (E) are swapped.

      Thank you for catching this.

    1. Author response:

      eLife Assessment

      This useful study presents Altair-LSFM, a solid and well-documented implementation of a light-sheet fluorescence microscope (LSFM) designed for accessibility and cost reduction. While the approach offers strengths such as the use of custom-machined baseplates and detailed assembly instructions, its overall impact is limited by the lack of live-cell imaging capabilities and the absence of a clear, quantitative comparison to existing LSFM platforms. As such, although technically competent, the broader utility and uptake of this system by the community may be limited.

      We thank the reviewers and editors for their thoughtful evaluation of our work and for recognizing the technical strengths of the Altair-LSFM platform, including the custom-machined baseplates and detailed documentation provided to support accessibility and reproducibility. We respectfully disagree, however, with the assessment that the system lacks live-cell imaging capabilities. We are fully confident in the system’s suitability for live-cell applications and will demonstrate this by including representative live-cell imaging data in the revised manuscript, along with detailed instructions for implementing environment control. Moreover, we will expand our discussion to include a broader, more quantitative comparison to existing LSFM platforms—highlighting trade-offs in cost, performance, and accessibility—to better contextualize Altair’s utility and adaptability across diverse research settings.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The article presents the details of the high-resolution light-sheet microscopy system developed by the group. In addition to presenting the technical details of the system, its resolution has been characterized and its functionality demonstrated by visualizing subcellular structures in a biological sample.

      Strengths:

      (1) The article includes extensive supplementary material that complements the information in the main article.

      (2) However, in some sections, the information provided is somewhat superficial.

      Our goal was to make the supplemental content as comprehensive and useful as possible. In addition to the materials provided with the manuscript, our intention is for the online documentation (available at thedeanlab.github.io/altair) to serve as a living resource that evolves in response to user feedback. For this reason, we are especially interested in identifying and expanding any sections that are perceived as superficial, and we would greatly appreciate the reviewer’s guidance on which areas would benefit from further elaboration.

      Weaknesses:

      (1) Although a comparison is made with other light-sheet microscopy systems, the presented system does not represent a significant advance over existing systems. It uses high numerical aperture objectives and Gaussian beams, achieving resolution close to theoretical after deconvolution. The main advantage of the presented system is its ease of construction, thanks to the design of a perforated base plate.

      We appreciate the reviewer’s assessment and the opportunity to clarify our intent. Our primary goal was not to introduce new optical functionality beyond that of existing high-performance light-sheet systems, but rather to reduce the barrier to entry for non-specialist labs.

      (2) Using similar objectives (Nikon 25x and Thorlabs 20x), the results obtained are similar to those of the LLSM system (using a Gaussian beam without laser modulation). However, the article does not mention the difficulties of mounting the sample in the implemented configuration.

      We agree that there are practical challenges associated with handling 5 mm diameter coverslips. However, the Nikon 25x can readily be replaced by a Zeiss W Plan-Apochromat 20x/1.0 objective, which eliminates the need for the 5 mm coverslip[1]. In the revised manuscript, we will more explicitly detail the practical challenges in handling a 5 mm coverslip and mention the alternative detection objective.

      (3) The authors present a low-cost, open-source system. Although they provide open source code for the software (navigate), the use of proprietary electronics (ASI, NI, etc.) makes the system relatively expensive. Its low cost is not justified.

      We understand the reviewer’s concern regarding the use of proprietary control hardware such as the ASI Tiger Controller and NI data acquisition cards. While lower-cost alternatives for analog and digital control (e.g., microcontroller-based systems) do exist, our choice was intentional. By relying on a unified and professionally supported platform, we minimize the complexity of sourcing, configuring, and integrating components from disparate vendors—each of which would otherwise demand specialized technical expertise. Moreover, in future releases, we aim to further streamline the system by eliminating the need for the NI card, consolidating all optoelectronic control through the ASI Tiger Controller. This approach allows users to purchase a fully assembled and pre-configured system that can be operational with minimal effort.

      It is worth noting that the ASI components are not the primary cost driver. The full set—including XYZ and focusing stages, a filter wheel, a tube lens, the Tiger Controller, and basic optomechanical adapters—costs approximately $27,000, or ~18% of the total system cost. Additional cost reductions are possible. For example, replacing the motorized sample positioning and focusing stages with manual alternatives could reduce the cost by ~$12,000. However, this would eliminate key functionality such as autofocusing, 3D tiling, and multi-position acquisition. Open-source mechanical platforms such as OpenFlexure could in principle be adapted, but they would require custom assembly and would need to be integrated into our control software. Similarly, the filter wheel could be omitted in favor of a multi-band emission filter, reducing the cost by ~$5,000. However, this comes at the expense of increased spectral crosstalk, often necessitating spectral unmixing. An industrial CMOS camera—such as the Ximea MU196CR-ON, recently demonstrated in a Direct View Oblique Plane Microscopy configuration[2]—could substitute for the sCMOS cameras typically used in high-end imaging. However, these industrial sensors often exhibit higher noise floors and lower dynamic range, limiting sensitivity for low-signal imaging applications.

      While a $150,000 system represents a significant investment, we consider it relatively cost-effective in the context of advanced light-sheet microscopy. For comparison, commercially available systems with similar optical performance—such as LLSM systems from 3i or Zeiss—are several times more expensive.
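
      The cost arithmetic in this response can be checked with a short sketch. The dollar figures are the approximate values quoted above; the variable and function names are hypothetical, and no savings figure is assumed for the camera substitution because none is given.

```python
# Approximate figures quoted in the response (USD).
TOTAL_SYSTEM_COST = 150_000
ASI_COMPONENTS_COST = 27_000

# Optional substitutions and the approximate savings quoted for each.
OPTIONAL_SAVINGS = {
    "manual stages instead of motorized stages": 12_000,
    "multi-band emission filter instead of filter wheel": 5_000,
}


def share_of_total(part, total):
    """Fraction of the total system cost taken up by one component set."""
    return part / total


def reduced_cost(total, savings):
    """Total cost after applying every listed substitution."""
    return total - sum(savings.values())


asi_share = share_of_total(ASI_COMPONENTS_COST, TOTAL_SYSTEM_COST)
print(f"ASI components: {asi_share:.0%} of total")
print(f"Cost with both substitutions: "
      f"${reduced_cost(TOTAL_SYSTEM_COST, OPTIONAL_SAVINGS):,}")
```

      Each substitution trades cost for functionality, as described above, so the reduced figure is a lower bound for a less capable configuration rather than a like-for-like price.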

      (4) The fibroblast images provided are of exceptional quality. However, these are fixed samples. The system lacks the necessary elements for monitoring cells in vivo, such as temperature or pH control.

      We thank the reviewer for their positive comment regarding the quality of our fibroblast images. As noted, the current manuscript focuses on the optical design and performance characterization of the system, using fixed specimens to validate resolution and imaging stability. We acknowledge the importance of environmental control for live-cell imaging. Temperature regulation is routinely implemented in our lab using flexible adhesive heating elements paired with a power supply and PID controller. For pH stabilization in systems that lack a 5% CO₂ atmosphere, we typically supplement the imaging medium with 10–25 mM HEPES buffer. In the revised manuscript, we will introduce a modified sample chamber capable of maintaining user-specified temperatures, along with detailed assembly instructions. We will also include representative live-cell imaging data to demonstrate the feasibility of in vitro imaging using this system.

      Reviewer #2 (Public review):

      Summary:

      The authors present Altair-LSFM (Light Sheet Fluorescence Microscope), a high-resolution, open-source microscope, that is relatively easy to align and construct and achieves sub-cellular resolution. The authors developed this microscope to fill a perceived need that current open-source systems are primarily designed for large specimens and lack sub-cellular resolution or are difficult to construct and align, and are not stable. While commercial alternatives exist that offer sub-cellular resolution, they are expensive. The authors' manuscript centers around comparisons to the highly successful lattice light-sheet microscope, including the choice of detection and excitation objectives. The authors thus claim that there remains a critical need for high-resolution, economical, and easy-to-implement LSFM systems.

      Strengths:

      The authors succeed in their goals of implementing a relatively low-cost (~ USD 150K) open-source microscope that is easy to align. The ease of alignment rests on using custom-designed baseplates with dowel pins for precise positioning of optics based on computer analysis of opto-mechanical tolerances, as well as the optical path design. They simplify the excitation optics over lattice light-sheet microscopes by using a Gaussian beam for illumination while maintaining lateral and axial resolutions of 235 and 350 nm across a 260-µm field of view after deconvolution. In doing so, they rely on foundational principles of optical microscopy: what matters for lateral resolution is the numerical aperture of the detection objective and proper sampling of the image field onto the detector, and the axial resolution depends on the thickness of the light-sheet when it is thinner than the depth of field of the detection objective. This concept has unfortunately not been completely clear to users of high-resolution light-sheet microscopes and is thus a valuable demonstration. The microscope is controlled by an open-source software, Navigate, developed by the authors, and it is thus foreseeable that different versions of this system could be implemented depending on experimental needs while maintaining easy alignment and low cost. They demonstrate system performance successfully by characterizing their light sheet and point-spread function and by visualizing sub-cellular structures in mammalian cells, including microtubules, actin filaments, nuclei, and the Golgi apparatus.

      We thank the reviewer for their thoughtful summary of our work. We are pleased that the foundational optical principles, design rationale, and emphasis on accessibility came through clearly. We agree that the approach used to construct the microscope is highly modular, and we anticipate that these design principles will serve as the basis for additional system variants tailored to specific biological samples and experimental contexts. To support this, we provide all Zemax simulations and CAD files openly on our GitHub repository, enabling advanced users to build upon our design and create new functional variants of the Altair system.
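
      The optical principles summarized by the reviewer can be made concrete with a short calculation. The sketch below estimates the diffraction-limited lateral resolution with the Rayleigh criterion and checks whether a camera samples it at the Nyquist rate. The wavelength, detection NA, camera pixel size, and magnification are illustrative assumptions, not the Altair-LSFM specifications, and the function names are hypothetical.

```python
def rayleigh_lateral_resolution_nm(wavelength_nm: float, detection_na: float) -> float:
    """Rayleigh criterion for the smallest resolvable lateral separation."""
    return 0.61 * wavelength_nm / detection_na


def sample_space_pixel_nm(camera_pixel_um: float, magnification: float) -> float:
    """Camera pixel size projected back into sample space."""
    return camera_pixel_um * 1000.0 / magnification


def max_nyquist_pixel_nm(resolution_nm: float) -> float:
    """Largest pixel that still places two samples across one resolvable distance."""
    return resolution_nm / 2.0


# Hypothetical values chosen only for illustration.
WAVELENGTH_NM = 520.0     # assumed emission wavelength
DETECTION_NA = 1.1        # assumed detection numerical aperture
CAMERA_PIXEL_UM = 6.5     # assumed physical camera pixel size
MAGNIFICATION = 50.0      # assumed total detection magnification

resolution_nm = rayleigh_lateral_resolution_nm(WAVELENGTH_NM, DETECTION_NA)
pixel_nm = sample_space_pixel_nm(CAMERA_PIXEL_UM, MAGNIFICATION)
nyquist_ok = pixel_nm <= max_nyquist_pixel_nm(resolution_nm)

print(f"Rayleigh lateral resolution: {resolution_nm:.1f} nm")
print(f"Sample-space pixel size:     {pixel_nm:.1f} nm")
print(f"Nyquist sampling satisfied:  {nyquist_ok}")
```

      Under these assumed values the pixel size is below half the Rayleigh limit, so the detection NA, rather than the camera sampling, sets the lateral resolution before deconvolution.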

      Weaknesses:

      There is a fixation on comparison to the first-generation lattice light-sheet microscope, which has evolved significantly since then:

      (1) The authors claim that commercial lattice light-sheet microscopes (LLSM) are "complex, expensive, and alignment intensive". I believe this sentence applies to the open-source version of LLSM, which was made available for wide dissemination. Since then, a commercial solution has been provided by 3i, which is now being used in multiple cores and labs but does require routine alignments. However, Zeiss has also released a commercial turn-key system, which, while expensive, is stable, and the complexity does not interfere with the experience of the user. In general, though, statements on ease of use and stability might be considered anecdotal and may not belong in a scientific article when unreferenced or unsupported by data.

      The referee is correct that our comparisons reference the original LLSM design, which was simultaneously disseminated as an open-source platform and commercialized by 3i. While we acknowledge that newer variants of LLSM have been developed—including systems incorporating adaptive optics[3] and the MOSAIC platform (which remains unpublished)—the original implementation remains the most widely described and cited in the literature. It is therefore the most appropriate point of comparison for contextualizing Altair’s performance, complexity, and accessibility. Importantly, this version of LLSM is far from obsolete; it continues to be one of the most commonly used imaging systems at Janelia Research Campus’s Advanced Imaging Center.

      We acknowledge that the more recent commercial implementation by Zeiss has addressed several of the practical limitations associated with the original design. In particular, we agree that the Zeiss Lattice Lightsheet 7 system, which integrates a meniscus lens to facilitate oblique imaging through a coverslip, offers a user-friendly experience, albeit with a modest tradeoff in resolution (reported deskewed resolution: 330 nm × 330 nm × 500–1000 nm).

      While we recognize that statements on usability and stability can be subjective, one objective proxy for system complexity is the number of optical elements that require precise alignment during assembly. The original LLSM setup includes approximately 29 optical components that must each be carefully positioned laterally, angularly, and coaxially along the optical path. In contrast, the first-generation Altair system contains only 9 such elements. By this metric, Altair is considerably simpler to assemble and align, supporting our overarching goal of making high-resolution light-sheet imaging more accessible to non-specialist laboratories. In the revised manuscript, we will clarify the scope of our comparison and provide more precise language about what we mean by complexity (e.g., number of optical elements needed to align).

      (2) One of the major limitations of the first-generation LLSM was the use of a 5 mm coverslip, which was a hindrance for many users. However, the Zeiss system elegantly solves this problem, and so does Oblique Plane Microscopy (OPM), while the Altair-LSFM retains this feature, which may dissuade widespread adoption. This limitation and how it may be overcome in future iterations is not discussed.

      We agree that the use of 5 mm diameter coverslips, while enabling high-NA imaging in the current Altair-LSFM configuration, may serve as an inconvenience for many users. We will discuss this more explicitly in the revised manuscript. Specifically, we note that changing the detection objective is sufficient to eliminate the need for a 5 mm coverslip. For example, as demonstrated in Moore et al., Lab Chip 2021, pairing the Zeiss W Plan-Apochromat 20x/1.0 objective with the Thorlabs TL20X-MPL allows imaging beyond the physical surfaces of both objectives, removing the constraint imposed by small-format coverslips[1]. In the revised manuscript, we will propose this modification as a straightforward path for increasing compatibility with more conventional sample mounting formats.

      (3) Further, on the point of sample flexibility, all generations of the LLSM, and by the nature of its design, the OPM, can accommodate live-cell imaging with temperature, gas, and humidity control. It is unclear how this would be implemented with the current sample chamber. This limitation would severely limit use cases for cell biologists, for which this microscope is designed. There is no discussion on this limitation or how it may be overcome in future iterations.

      We appreciate the reviewer’s emphasis on the importance of environmental control for live-cell imaging applications. It is worth noting that the original LLSM design, including the system commercialized by 3i, provided temperature control only, without integrated gas or humidity regulation. Despite this, it has been successfully used by a wide range of scientists to generate important biological insights.

      We agree that both OPM and the Zeiss implementation of LLSM offer clear advantages in terms of environmental control, as we previously discussed in detail in Sapoznik et al., eLife, 2020[4]. However, assembly of high numerical aperture OPM systems is highly technical, and no open-source variant of OPM delivers sub-cellular scale resolution yet.

      (4) The authors' comparison to LLSM is constrained to the "square" lattice, which, as they point out, is the most used optical lattice (though this also might be considered anecdotal). The original LLSM design, however, goes far beyond the square lattice, including hexagonal lattices, the ability to do structured illumination, and greater flexibility in general in terms of light-sheet tuning for different experimental needs, as well as not being limited to just sample scanning. Thus, the Altair-LSFM cannot compare to the original LLSM in terms of versatility, even if comparisons to the resolution provided by the square lattice are fair.

      We thank the reviewer for this comment. It is true that our discussion focused primarily on the square lattice implementation of LLSM. While this could be viewed as a subset of the system’s broader capabilities, we chose this focus intentionally, as the square lattice remains by far the most commonly used variant in practice. Even in the original LLSM publication, 16 out of 20 figure subpanels utilized the square lattice, with only one panel each representing the hexagonal lattice in SIM mode, a standard Bessel beam in incoherent SIM mode, a hex lattice in dithered mode, and a single Bessel in dithered mode. This usage pattern largely reflects the operational simplicity of the square lattice: it minimizes sidelobe growth and enables more straightforward alignment and data processing compared to hexagonal or structured illumination modes.

      In 2019, we performed an exhaustive accounting of published illumination modes in LLSM and found that the SIM mode had only been used in two additional peer-reviewed publications at that time. We will consider updating this table in the revised manuscript and will expand our discussion to acknowledge the broader flexibility of the LLSM platform—including its capacity for structured illumination and alternative light-sheet geometries. However, we will also emphasize that, despite these advanced capabilities, the square lattice remains the dominant mode used by the community and therefore serves as a fair and practical benchmark for comparison.

      (5) There is no demonstration of the system's live-imaging capabilities or temporal resolution, which is the main advantage of existing light-sheet systems.

      In the revised manuscript, we will include a demonstration of live-cell imaging to directly validate the system’s suitability for dynamic biological applications. We will also characterize the temporal resolution of the system. As a sample-scanning microscope, the imaging speed is primarily limited by the performance of the Z-piezo stage. For simplicity and reduced optoelectronic complexity, we currently power the piezo through the ASI Tiger Controller. We will expand the supplementary material to describe the design criteria behind this choice, including potential trade-offs, and provide data quantifying the achievable volume rates under typical operating conditions.
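
      To illustrate how stage performance bounds temporal resolution in a sample-scanning system, the following sketch uses a deliberately simple timing model: each plane costs one camera exposure plus one piezo step-and-settle interval, and each volume adds a flyback time. The model and all numbers are assumptions for illustration only; they are not measured Altair-LSFM values, and real acquisitions may overlap readout with stage motion.

```python
def volume_rate_hz(n_planes: int, exposure_ms: float,
                   step_settle_ms: float, flyback_ms: float = 0.0) -> float:
    """Volumes per second for a step-and-settle sample scan (simplified model)."""
    if n_planes <= 0:
        raise ValueError("n_planes must be positive")
    # Time per volume: every plane waits for the stage to settle, then exposes.
    volume_time_ms = n_planes * (exposure_ms + step_settle_ms) + flyback_ms
    return 1000.0 / volume_time_ms


# Hypothetical acquisition settings.
rate_hz = volume_rate_hz(n_planes=100, exposure_ms=10.0,
                         step_settle_ms=5.0, flyback_ms=50.0)
print(f"Estimated volume rate: {rate_hz:.3f} Hz")
```

      In this model, halving the settle time raises the volume rate more than a comparable reduction in flyback time, which is why the piezo and its driver dominate the achievable speed.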

      While the microscope is well designed and completely open source, it will require experience with optics, electronics, and microscopy to implement and align properly. Experience with custom machining or soliciting a machine shop is also necessary. Thus, in my opinion, it is unlikely to be implemented by a lab that has zero prior experience with custom optics and cannot hire someone who has it. Altair-LSFM may not be as easily adaptable or implementable as the authors describe or perceive in any lab that is interested, even if they can afford it. The authors indicate they will offer "workshops," but this does not necessarily remove the barrier to entry, or lower it as significantly as the authors describe.

      We appreciate the reviewer’s perspective and agree that building any high-performance custom microscope—Altair-LSFM included—requires a baseline familiarity with optics and instrumentation. Our goal is not to eliminate this requirement entirely, but to significantly reduce the technical and logistical barriers that typically accompany custom light-sheet microscope construction.

      Importantly, no machining experience or in-house fabrication capabilities are required—users can simply submit provided design files and specifications directly to the vendor. We will make this process as straightforward as possible by supplying detailed instructions, recommended materials, and vendor-ready files. Additionally, we draw encouragement from the success of related efforts such as mesoSPIM, which has seen over 30 successful implementations worldwide using a similar model of exhaustive online documentation, open-source control software, and community support through user meetings and workshops.

      We recognize that documentation alone is not always sufficient, and we are committed to further lowering barriers to adoption. To this end, we are actively working with commercial vendors to streamline procurement and reduce the logistical burden on end users. Additionally, Altair-LSFM is supported by a Biomedical Technology Development and Dissemination (BTDD) grant, which provides dedicated resources for hosting workshops, offering real-time community support, and generating supplementary materials such as narrated video tutorials. We will expand our discussion in the revised manuscript to better acknowledge these implementation challenges and outline our ongoing strategies for supporting a broad and diverse user base.

      There is a claim that this design is easily adaptable. However, the requirement of custom-machined baseplates and in silico optimization of the optical path basically means that each new instrument is a new design, even if the Navigate software can be used. It is unclear how Altair-LSFM demonstrates a modular design that reduces times from conception to optimization compared to previous implementations.

      We appreciate the reviewer’s comment and agree that our language regarding adaptability may have been too strong. It was not our intention to suggest that the system can be easily modified without prior experience. Meaningful adaptations of the optical or mechanical design would require users to have expertise in optical layout, optomechanical design, and alignment.

      That said, for labs with sufficient expertise, we aim to facilitate such modifications by providing comprehensive resources—including detailed Zemax simulations, CAD models, and alignment documentation. These materials are intended to reduce the development burden for those seeking to customize the platform for specific experimental needs.

      In the revised manuscript, we will clarify this point and explicitly state in the discussion what technical expertise is required to modify the system. We will also revise our language around adaptability to better reflect the intended audience and realistic scope of customization.

      Reviewer #3 (Public review):

      Summary:

      This manuscript introduces a high-resolution, open-source light-sheet fluorescence microscope optimized for sub-cellular imaging.

      The system is designed for ease of assembly and use, incorporating a custom-machined baseplate and in silico optimized optical paths to ensure robust alignment and performance. The authors demonstrate lateral and axial resolutions of ~235 nm and ~350 nm after deconvolution, enabling imaging of sub-diffraction structures in mammalian cells.

      The important feature of the microscope is the clever and elegant adaptation of simple Gaussian beams, smart beam shaping, galvo pivoting, and high-NA objectives to ensure a uniform thin light-sheet of around 400 nm in thickness over a 266-micron-wide field of view, pushing the axial resolution of the system beyond the regular diffraction-limited tradeoffs of light-sheet fluorescence microscopy.

      Compelling validation using fluorescent beads and multicolor cellular imaging highlights the system's performance and accessibility. Moreover, a very extensive and comprehensive manual of operation is provided in the form of supplementary materials. This provides a DIY blueprint for researchers who want to implement such a system.

      Strengths:

      (1) Strong and accessible technical innovation: With an elegant combination of beam shaping and optical modelling, the authors provide a high-resolution light-sheet system that overcomes the classical light-sheet tradeoff limit of a thin light-sheet and a small field of view. In addition, the integration of in silico modelling with a custom-machined baseplate is very practical and allows for ease of alignment procedures. Combining these features with the solid and super-extensive guide provided in the supplementary information, this provides a protocol for replicating the microscope in any other lab.

      (2) Impeccable optical performance and ease of mounting of samples: The system takes advantage of the same sample-holding method seen already in other implementations, but reduces the optical complexity. At the same time, the authors claim to achieve similar lateral and axial resolution to lattice light-sheet microscopy (although without a direct comparison; see the "Weaknesses" section below). The optical characterization of the system is comprehensive and well-detailed. Additionally, the authors validate the system by imaging sub-cellular structures in mammalian cells.

      (3) Transparency and comprehensiveness of documentation and resources: A very detailed protocol provides detailed documentation about the setup, the optical modeling, and the total cost.

      Weaknesses:

      (1) Limited quantitative comparisons: Although some qualitative comparison with previously published systems (diSPIM, lattice light-sheet) is provided throughout the manuscript, some side-by-side comparison would be of great benefit for the manuscript, even in the form of a theoretical simulation. While having a direct imaging comparison would be ideal, it's understandable that this goes beyond the interest of the paper; however, a table referencing image quality parameters (taken from the literature), such as signal-to-noise ratio, light-sheet thickness, and resolutions, would really enhance the features of the setup presented. Moreover, based also on the necessity for optical simplification, an additional comment on the importance/difference of dual objective/single objective light-sheet systems could really benefit the discussion.

      In the revised manuscript, we will expand our discussion to include a broader range of light-sheet microscope designs and imaging modes, including both single- and dual-objective configurations. We agree that highlighting the trade-offs between these approaches—such as working distance, sample geometry constraints, and alignment complexity—will enhance the overall context and utility of the manuscript.

      To further aid comparison, we will include a summary table referencing key image quality parameters such as lateral and axial resolution, and illumination beam NA for Altair-LSFM. Where available, we will reference values from published work—such as the axial resolution reported in Valm et al. (Nature, 2017)—to provide a clearer benchmark. Because such comparisons can be technically nuanced, especially when comparing across systems with different geometries and sample mounting constraints, we will also include a supplementary note outlining the assumptions and limitations of these comparisons.

      (2) Limitation to a fixed sample: In the manuscript, there is no mention of incubation temperature, CO₂ regulation, humidity control, or possible integration of commercial environmental control systems. This is a major limitation for an imaging technique that owes its popularity to fast, volumetric, live-cell imaging of biological samples.

      We thank the reviewer for highlighting this important consideration. In the revised manuscript, we will provide a detailed description of how temperature control can be implemented using flexible adhesive heating elements, a power supply, and a PID controller. Step-by-step assembly instructions and recommended components will be included to facilitate adoption by users interested in live-cell imaging. We also note that most light-sheet microscopy systems capable of sub-cellular resolution—including the original LLSM design, diSPIM, and ASLM—typically do not incorporate integrated CO<sub>2</sub> or humidity control. These systems often rely on HEPES-buffered media to maintain pH stability, which is generally sufficient for short- to intermediate-term imaging. While full environmental control may be necessary for extended time-lapse studies, it is not a prerequisite for high-resolution volumetric imaging in many applications. Nonetheless, we will include a discussion of the challenges associated with adding CO<sub>2</sub> and humidity control to open or semi-enclosed architectures like Altair-LSFM, and outline potential future paths for integration with commercial incubation systems.
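
      As a rough illustration of the control principle behind such a temperature regulator, the sketch below simulates a discrete PID loop driving a heater on a first-order thermal model of a sample chamber. This is not the authors' implementation: the controller gains, the thermal time constant, the heater gain, and the class and function names are all hypothetical and would need to be tuned against real hardware.

```python
class PIDController:
    """Discrete PID controller with output clamping and simple anti-windup."""

    def __init__(self, kp, ki, kd, out_min=0.0, out_max=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measured, dt):
        error = setpoint - measured
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        candidate_integral = self.integral + error * dt
        raw = self.kp * error + self.ki * candidate_integral + self.kd * derivative
        # Anti-windup: keep the new integral only when the output is not saturated.
        if self.out_min < raw < self.out_max:
            self.integral = candidate_integral
        self.prev_error = error
        return min(max(raw, self.out_min), self.out_max)


def simulate_chamber(setpoint_c=37.0, ambient_c=22.0, heater_gain_c=30.0,
                     tau_s=60.0, dt_s=0.5, duration_s=1200.0):
    """Euler simulation of an assumed first-order chamber under PID control."""
    pid = PIDController(kp=0.5, ki=0.02, kd=0.0)  # hypothetical gains
    temp_c = ambient_c
    for _ in range(int(duration_s / dt_s)):
        duty = pid.update(setpoint_c, temp_c, dt_s)  # heater duty cycle, 0 to 1
        # Heating drives the chamber up; losses pull it back toward ambient.
        temp_c += dt_s * (heater_gain_c * duty - (temp_c - ambient_c)) / tau_s
    return temp_c


final_temp_c = simulate_chamber()
print(f"Chamber temperature after 20 min: {final_temp_c:.2f} °C")
```

      The integral term is what removes the steady-state offset between the chamber and the setpoint; a purely proportional controller would settle below 37 °C in this model.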

      (3) System cost and data storage cost: While the system presented has the advantage of being open-source, it remains relatively expensive (considering the 150k without laser source and optical table, for example). The manuscript could benefit from a more direct comparison of the performance/cost ratio of existing systems, considering academic settings with budgets that most of the time would not allow for expensive architectures. Moreover, it would also be beneficial to discuss the adaptability of the system in case a 30k objective is not feasible. Will this system work with different optics (with the obvious limitations that come with a lower-NA objective)? The adaptability of the system to lower budgets or more cost-effective choices, depending on the needs, could be an interesting point of discussion.

      We thank the reviewer for raising this important point. First, we would like to clarify that the quoted $150k cost estimate includes the optical table and laser source. We apologize for any confusion and will communicate this more effectively in the revised manuscript.

      We agree that adaptability is a key concern, especially in academic settings with limited budgets. The detection path can be readily altered depending on experimental needs and cost constraints. For example, in our discussion of alternatives to the 5 mm coverslip geometry, we will describe how switching to a Zeiss W Plan-Apochromat 20x/1.0 in combination with a compatible excitation objective allows high-resolution imaging while accommodating more conventional sample formats. We will expand this to include cost-effective alternatives as well.

      We will also expand our discussion on cost-reduction strategies and the associated trade-offs. These include replacing motorized stages with manual ones, omitting the filter wheel in favor of a multi-band emission filter, or using industrial-grade cameras in place of scientific CMOS detectors. While each change entails some loss in functionality or sensitivity, such modifications allow users to tailor the system to their specific budget and application.

      Finally, we recognize the challenge in communicating exact costs of commercial systems due to variability in configuration and pricing. Nonetheless, we will include approximate figures where possible and note that comparable commercial systems—such as LLSM platforms from 3i and Zeiss—are several-fold more expensive than the system presented here.

      Last, not much is said about the need for data storage. Light-sheet microscopy's bottleneck is the creation of increasingly large datasets, and it could be beneficial to discuss more about the storage needs and the quantity of data generated.

      Data storage is indeed a critical consideration in light-sheet microscopy. In the revised manuscript, we will provide a note outlining typical volume dimensions for live-cell imaging experiments along with the associated data overhead. This will include estimates for voxel counts, bit depth, time-lapse acquisitions, and multi-channel datasets to help users anticipate storage needs. We will also briefly discuss strategies for managing large datasets, file types and compression formats.
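
      A back-of-the-envelope estimate of this kind can be sketched as follows. The calculation multiplies voxel count, channels, timepoints, and bytes per voxel to give the uncompressed size of a time-lapse acquisition. The frame size, plane count, channel count, and number of timepoints are hypothetical example values, not figures from the manuscript.

```python
def dataset_size_bytes(x_px: int, y_px: int, z_planes: int,
                       channels: int, timepoints: int, bit_depth: int) -> int:
    """Uncompressed dataset size; bit depths are rounded up to whole bytes."""
    bytes_per_voxel = (bit_depth + 7) // 8  # e.g. 12-bit data stored in 2 bytes
    return x_px * y_px * z_planes * channels * timepoints * bytes_per_voxel


# Hypothetical acquisition: 2048 x 2048 frames, 200 planes, 2 channels,
# 100 timepoints, 16-bit pixels.
raw_bytes = dataset_size_bytes(2048, 2048, 200, 2, 100, 16)
raw_gib = raw_bytes / 2**30

print(f"Uncompressed size: {raw_bytes:,} bytes ({raw_gib:.1f} GiB)")
```

      For these assumed settings a single time-lapse already exceeds 300 GiB before compression, which shows why cropping the camera region of interest and choosing a compressed, chunked file format matter in practice.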

      Conclusion:

      Altair-LSFM represents a well-engineered and accessible light-sheet system that addresses a longstanding need for high-resolution, reproducible, and affordable sub-cellular light-sheet imaging. While some aspects (comparative benchmarking and validation, the limitation to fixed samples) would benefit from further development, the manuscript makes a compelling case for Altair-LSFM as a valuable contribution to the open microscopy scientific community.

      References

      (1) Moore, R. P. et al. A multi-functional microfluidic device compatible with widefield and light sheet microscopy. Lab Chip 22, 136-147 (2021). https://doi.org/10.1039/d1lc00600b

      (2) Lamb, J. R., Mestre, M. C., Lancaster, M. & Manton, J. D. Direct-view oblique plane microscopy. Optica 12, 469-472 (2025). https://doi.org/10.1364/OPTICA.558420

      (3) Liu, T. L. et al. Observing the cell in its native state: Imaging subcellular dynamics in multicellular organisms. Science 360 (2018). https://doi.org/10.1126/science.aaq1392

      (4) Sapoznik, E. et al. A versatile oblique plane microscope for large-scale and high-resolution imaging of subcellular dynamics. eLife 9 (2020). https://doi.org/10.7554/eLife.57681

      (5) Huisken, J. & Stainier, D. Y. Even fluorescence excitation by multidirectional selective plane illumination microscopy (mSPIM). Opt Lett 32, 2608-2610 (2007). https://doi.org/10.1364/ol.32.002608

      (6) Ricci, P. et al. Removing striping artifacts in light-sheet fluorescence microscopy: a review. Prog Biophys Mol Biol 168, 52-65 (2022). https://doi.org/10.1016/j.pbiomolbio.2021.07.003

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Mollá-Albaladejo et al. investigate the neurons downstream of Gr64f and Gr66a, called G2Ns. They identify downstream neurons using trans-Tango labeling with RFP and then perform bulk RNA-seq on the RFP-sorted cells. Gene expression is up- or downregulated between the cell populations and between fed and starved states. They specifically identify Leucokinin as a neuropeptide that is upregulated in starved Gr66a cells. Leucokinin cells, identified by a GAL4 line, indeed show higher expression when starved, especially in the SEZ. Furthermore, Leucokinin cells colocalize with the trans-Tango signal from downstream neurons of both GRs. This connection is confirmed with GRASP. According to EM data, Leucokinin cells in the SEZ receive a lot of input and connect to many downstream neurons. In behavior experiments performed with flies lacking Leucokinin neurons, flies show reduced responsiveness to sugar and bitter mixtures when starved. The authors suggest that Leucokinin neurons integrate bitter and sugar tastes and that their output is modified by a hunger state.

      Strengths:

      The authors use a multitude of tools to identify SELK neurons downstream of taste sensory neurons and as starvation-sensitive cells. This study provides an example of how genetic labeling, RNA-seq, and EM analysis can be combined to investigate neural circuits.

      Weaknesses:

      The authors do not show a functional connection between sensory neurons and SELK neurons. Additionally, data from RNA seq, anatomical studies, and EM analysis are sometimes contradictory in terms of connectivity. GRASP signal is not foolproof that cells are synaptically connected.

      We appreciate the reviewer’s comments. Unfortunately, we have not successfully demonstrated a functional response of SELK neurons using in vivo calcium imaging with UAS-GCaMP7 (we tried the f, m, and s versions), primarily due to challenges in obtaining stable signals. We stimulated GRNs using sucrose, caffeine, or a mixture of both; it is possible that the concentrations, although high, were insufficient to induce a response.

      Regarding GRASP, we acknowledge its limitations as a standalone technique for establishing genuine synaptic connections between neurons, as some signals may reflect false positives resulting from the mere proximity of the candidate neurons. To strengthen our findings, we complemented these results by demonstrating the positive colocalization of the Leucokinin antibody signal over the Gr66a-Gal4>trans-TANGO and Gr64f-Gal4>trans-TANGO (Figure 4), confirming that Leucokinin neurons are indeed postsynaptic to both sweet and bitter GRNs. Moreover, we incorporated BacTrace data to highlight the direct connectivity between sweet and bitter GRNs (now Figure 5E).

      In the revised manuscript, we have introduced the active-GRASP technique (Macpherson et al., 2015). In this version of GRASP, the presynaptic half of GFP (GFP 1-10) is fused to synaptobrevin, which becomes accessible in the membrane of the presynaptic neuron within the synaptic cleft upon presynaptic stimulation (in our case, by stimulating the sweet Gr64f<sup>GRNs</sup> with sucrose and the bitter Gr66a<sup>GRNs</sup> with caffeine). Utilizing this technique, we successfully demonstrated (see new Figure 5B and 5D) that when presented with water, no signal was detected in the Gr66a-LexA, Lk-Gal4 > active-GRASP, or Gr64f-LexA, Lk-Gal4 > active-GRASP transgene flies. However, in the presence of caffeine, Gr66a-LexA, Lk-Gal4 > active-GRASP transgene flies exhibited a clear signal in the SEZ, and similarly, sucrose presentation to Gr64f-LexA, Lk-Gal4 > active-GRASP transgene flies yielded a detectable signal. The results obtained from active-GRASP provide additional evidence supporting the connectivity between SELK neurons and both Gr64f<sup>GRNs</sup> and Gr66a<sup>GRNs</sup>, further indicating the functional connectivity of the GRNs and SELK neurons.

      The authors describe a behavioral phenotype when flies are starved, however, they do not use a specific driver for the described cell type, thus they should also tone down their claims.

      We agree with the reviewer that the Lk-Gal4 driver line used labels SELK, LHLK, and ABLK neurons. The behavior examined in this paper, the Proboscis Extension Response (PER), measures the initiation of feeding. Although the neural circuit involved in this behavior is primarily confined to the SEZ where SELK neurons are located, we cannot rule out the possibility that other Lk neurons may also play a role in the process. To restrict expression of the Tetanus Toxin, we have utilized the tsh-Gal80 (Clyne et al., 2008) transgene in combination with the Lk-Gal4>UAS-TNT and Lk-Gal4>UAS-TNT<sup>imp</sup> constructs to prevent the expression of the Tetanus Toxin in ABLK neurons, thereby restricting its expression to the SELK and LHLK neurons in the central brain. The new results (Sup Figure 7A) indicate that ABLK neurons do not play a role in integrating sweet and bitter information. However, we acknowledge the reviewer's point that we are still silencing LHLK neurons, so we have adjusted our claims to align more closely with our data.

      Generally, the authors do not provide a big advancement to the field and some of the results are contradictory with previous publications.

      We believe our work does not contradict previous findings, nor does it invalidate the role of ABLK neurons in water homeostasis or the role of LHLK neurons in regulating sleep via starvation. We provide additional information on the possible role of SELK neurons in integrating gustatory information. The location of SELK neurons in the SEZ suggests that they may play a role in feeding behavior, and we have demonstrated that these neurons are indeed involved in integrating gustatory information to influence feeding decisions. We believe we have contributed by highlighting a new role for the Leucokinin neuropeptide in feeding behavior.

      Reviewer #2 (Public review):

      Summary:

      A core task of the brain is processing sensory cues from the environment. The neural mechanisms by which sensory information is transmitted from peripheral sense organs and subsequently processed in defined brain centers remain an important topic in neuroscience. The taste system assesses the palatability of food by evaluating its chemical composition and nutrient content while integrating the current need for energy by assessing the satiation level of the organism. The current manuscript provides insights into the early circuits of gustatory coding using the fruit fly as a model. By combining trans-Tango and FACS-based bulk RNAseq to assess the target neurons of sweet-sensing (using Gr64f-Gal4) and bitter-sensing (using Gr66a-Gal4) neurons, in a first set of experiments the authors investigate genes that are differentially expressed or co-expressed in normal and starved conditions. With a focus on neuropeptides and neurotransmitters, differential expression across conditions was assessed, resulting in the identification of Leucokinin as a potentially interesting gene. This notion is further supported by RNAseq of Lk-Gal4>mCD8:GFP sorted cells and immunostainings. GRASP and BacTrace experiments further support that the two Lk-expressing cells in the SEZ should indeed be postsynaptic to both types of sensory neurons. Using EM-based connectomics data (based on a previous publication by Engert et al.), the authors also look for downstream targets of the bitter versus sweet gustatory neurons to identify the Lk-neurons. Based on the morphology, they identify candidates and further depict the potential downstream neurons in the connectome, which appears largely in agreement with GRASP experiments. Finally, silencing the Lk-neurons shows an increased PER response in starved flies (when combined with bitter compounds) as well as increased feeding in a FlyPad assay.

      Strengths:

      Overall this is an intriguing manuscript, which provides insight into the organization of 2nd order gustatory neurons. It specifically provides strong evidence for the Lk-neurons as a target of sweet and bitter GRNs and provides evidence for their role in regulating sweet vs bitter-based behavioral responses. Particularly the integration of different techniques and datasets in an elegant fashion is a strong side of the manuscript. Moreover to put the known LK-neurons into the context of 2nd order gustatory signalling is strengthening the knowledge about this pathway.

      Weaknesses:

      I do not see any major weakness in the current manuscript. Novelty is to some degree lessened by the fact, that the RNAseq approach did not identify new neurons but rather put the known LK-neurons as major findings. Similarly, the final behavioral section is not very deep and to some degree corroborates the previous publication by the Keene and Nässel labs - that said, the model they propose is indeed novel (but lacks depth in analyses; e.g. there is no physiology that would support the modulation of Lk neurons by either type of GRN). The connectomic section appears a bit out of place and after reading it it's not really clear what one should make of the potential downstream neurons (particularly since the Lk-receptor expression has been previously analyzed); here it might have been interesting to address if/how Lk-neurons may signal directly via a classical neurotransmitter (an information that might be found easily in the adult brain single-cell data).

      We thank the reviewer for the comment. Indeed, we attempted in vivo Ca imaging but were unsuccessful. We have rewritten the connectomic section to better integrate it with the rest of the text and have reanalyzed the data obtained. We considered gathering data from the single-cell adult dataset, but this dataset includes the entire adult fly brain, encompassing SELK and LHLK neurons, making it impossible to differentiate between the two types of Lk neurons. Any further analysis will require transcriptomic analysis of SELK via scRNAseq under the different metabolic conditions tested in this study.

      Reviewer #3 (Public review):

      Summary:

      To make feeding decisions, animals need to process three types of information: positive cues like sweetness, negative cues like bitterness, and internal states such as hunger or satiety. This study aims to identify where the information is integrated into the fruit fly brain. The authors applied RNA sequencing on second-order gustatory neurons responsible for sweet and bitter processing, under fed and starved conditions. The sequencing data reveal significant changes in gene expression across sweet vs. bitter pathways and fed vs. starved states. The authors focus on the neuropeptide Leucokinin (Lk), whose expression is dependent on the starvation state. They identify a pair of neurons, named SELK neurons, which express Lk and receive direct input from both sweet and bitter gustatory neurons. These SELK neurons are ideal candidates to integrate gustatory and internal state information. Behavioral experiments show that blocking these neurons in starved flies alters their tolerance to bitter substances during feeding.

      Strengths:

      (1) The study employs a well-designed approach, targeting specific neuronal populations, which is more efficient and precise compared to traditional large-scale genetic screening methods.

      (2) The RNAseq results provide valuable data that can be utilized in future studies to explore other molecules beyond Lk.

      (3) The identification of SELK neurons offers a promising avenue for future research into how these neurons integrate conflicting gustatory signals and internal state information.

      Weaknesses:

      (1) Unfortunately, due to technical challenges, the authors were unable to directly image the functional activity of SELK neurons.

      (2) In the behavioral experiments, tetanus toxin was used to block SELK neurons. Since these neurons may release multiple neurotransmitters or neuropeptides, the results do not specifically demonstrate that Leucokinin (Lk) is the critical factor, as suggested in Figure 8. To address this, I recommend using RNAi to inhibit Lk expression in SELK neurons and comparing the outcomes to wild-type controls via the PER assay.

      We appreciate the reviewer's comments and suggestions. As noted, Tetanus Toxin silences the neuron’s activity, affecting the functioning of various neurotransmitters and neuropeptides released by the targeted neuron. In response to the reviewer's recommendation, we employed an RNAi line specifically designed to silence Leucokinin production in Lk-expressing neurons.

      The results presented in Supplementary Figure 7B demonstrate that knocking down Leucokinin in Lk neurons significantly reduces the flies' tolerance to caffeine in sweet food.

      It is crucial to highlight that the sucrose concentration used in Figure 7C was 50mM, whereas in Supplementary Figure 7B, it was increased to 100mM. This adjustment was necessary because the Lk-Gal4, UAS-RNAi, and Lk-Gal4>UAS-RNAi transgenic lines exhibited reduced sensitivity to sucrose compared to the Lk-Gal4>UAS-TNT or Lk-Gal4>UAS-TNT<sup>imp</sup> lines. We aimed to establish a sucrose concentration that would elicit a 50% Proboscis Extension Response (PER) without adding any other compound, thereby allowing us to evaluate the additional effect of caffeine in the food.

      However, according to the data derived from the connectome, SELK neurons might be cholinergic, and this neurotransmitter might also be involved in controlling the behavior of the flies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      To get more evidence for connections between sensory cells and SELK neurons, could the authors also analyze a second available EM data set? Would setting a different threshold (>5 synapses) reveal connections to both sensories? Comparisons between SELK in- and outputs from EM data and Tango labeling also seem to differ quite a lot based on provided images - can the authors count cell bodies in the stainings? Further proof would be to provide functional imaging data that shows that SELK neurons respond to sugar and bitter compounds.

      In this study, we utilized the recently published EM dataset for the Drosophila central brain connectome (Dorkenwald et al., 2024; Flywire.ai). Changing the number of synapses affects the counts of pre- and postsynaptic neurons. We set a threshold of more than five synapses, as recommended by Flywire, to avoid false positives (Dorkenwald et al., 2024). This threshold has been widely used in recent papers (Engert et al., 2022; Shiu et al., 2022; Walker et al., 2025).

      The neuron counts in the connectomic data differ from those in the trans- and retro-TANGO experiments. In our initial trans-TANGO experiment, which labeled postsynaptic neurons in the Gr64f-Gal4 and Gr66a-Gal4 transgenic lines, we counted the labeled neurons (see Supplementary Figure 1C) and observed considerable variability between different brains. Because we anticipated similar variability, we did not count the neurons labeled by the trans-TANGO and retro-TANGO techniques in the Leucokinin neurons. Furthermore, neither technique labels all postsynaptic or presynaptic neurons, respectively. A recent study on the retro-TANGO technique (Sorkac et al., 2023) found a minimum threshold: the presynaptic neuron must form a certain number of synapses with the neuron of interest to be adequately labeled. According to this paper, the established threshold is 17 synapses. It is likely that the trans-TANGO technique also has a threshold, so that the number of labeled neurons depends on the synapse count. This would explain the discrepancy between the two results.

      Unfortunately, we have not been able to provide functional data pointing to the activation of SELK neurons by sucrose or caffeine. However, our active-GRASP data indicates that the connectivity between Gr64f<sup>GRNs</sup> and Gr66a<sup>GRNs</sup> with SELK neurons is present and functional.

      How many Leucokinin-positive cells are in the SEZ? Does the RNA-seq data provide further information about the SELK neurons? Potential receptor candidates for how they integrate hunger signals? AMPKa was described to be required in LHLK neurons.

      There are two SELK neurons in the SEZ. Due to the nature of our bulk RNA sequencing (RNAseq), we cannot link any additional gene expressions detected in our transcriptomic analysis specifically to the SELK neurons regarding the integration of various signaling processes. Furthermore, the single-cell RNA sequencing (scRNAseq) data available from the Drosophila brain, as reported by Li et al. (2022), does not allow accurate differentiation between SELK and LHLK neurons. To understand how these neurons integrate both metabolic and sensory information, it is crucial to conduct a focused RNAseq study specifically on the SELK neurons. This targeted analysis would provide the necessary insights to better elucidate their functional roles. However, according to the data derived from the connectome, SELK neurons might be cholinergic, and this neurotransmitter might also be involved in controlling the behavior of the flies.

      According to previous studies (Yurgel et al., 2019), the Lk-GAL4 line is also expressed in the VNC, thus the authors could make use of the tsh-GAL80 tool to clean up the line. This study also performed GCaMP imaging in fed and 24h starved animals in SELK and couldn't find a difference, can the authors explain this discrepancy?

      We thank the reviewer for this suggestion. We have now added a new piece of data using the tsh-Gal80 transgene in our PER experiments (Supplementary Figure 7A). Blocking the expression of TNT in the ABLK neurons does not affect the main conclusion of the behavioral results. As stated previously, we were unable to obtain in vivo Ca imaging responses in SELK neurons upon exposure to sucrose, caffeine, or mixtures of sucrose and caffeine. We do not believe this is a discrepancy with previous works like Yurgel et al., 2019. It is likely that we faced technical issues regarding expression stability and that the stimulation was possibly too weak to detect changes in GFP levels.

      Reviewer #2 (Recommendations for the authors):

      As mentioned above I do not have any major comments on the manuscript, but there are a few points that I feel should be considered:

      (1) The identification of the Lk-candidate neurons in the connectome remains a bit mysterious. In the method sections, this reads as follows: "manual and visual criteria were applied to identify the neurons of interest". a) What precisely was done to get to the candidates? b) Are there alternative candidates that may be Lk-neurons? c) How would another neuron affect the conclusion of the downstream analysis?

      We thank the reviewer for this comment. We have now modified and added new information in the connectomic section, reinforcing our conclusions and correcting the results obtained.

      Our GRASP, BacTrace, and immunohistochemistry experiments pointed to SELK neurons as postsynaptic to both Gr64f<sup>GRNs</sup> (sweet) and Gr66a<sup>GRNs</sup> (bitter). To identify which neurons in the connectome could be the SELK neurons, we utilized a previously described set of GRNs already identified in the connectome (Shiu et al., 2022). We extracted all neurons postsynaptic to the identified sweet and bitter GRNs and intersected both datasets, retaining only those candidate hits receiving simultaneous input from sweet and bitter GRNs. This process yielded a total of 333 hits. Through visual inspection, we discarded all hits that were merely neuronal fragments or neurons that clearly were not our candidates. We narrowed the list down to a final set of 17 candidate neurons whose arborization was located in the SEZ. From this list, we reduced the candidates to two final entries: ID 720575940623529610 (GNG.276) and ID 720575940630808827 (GNG.685). The GNG.276 neuron had a counterpart in the SEZ identified as GNG.246. Both of these neurons were annotated as DNg70 in the Flywire database. GNG.685 had a counterpart identified as GNG.595, and these two neurons were classified as DNg68. In both cases, the neuronal candidates, DNg70 and DNg68, were classified as descending neurons, a characteristic of previously described SELK neurons (Nässel et al., 2021). In our initial analysis, published in bioRxiv and sent for review, we identified DNg70 as potentially the SELK neurons based solely on the morphology of the neurons via visual inspection. However, we have since employed a better method to determine which candidate is more likely to be the SELK neurons, concluding that DNg68, rather than DNg70, represents the SELK neurons. Briefly, we performed immunohistochemistry for GFP in Lk-Gal4>UAS-CD8:GFP flies. We aligned the resulting image to a Drosophila reference brain (JRC2018 U) using the CMTK Registration plugin in ImageJ. The aligned image was skeletonized using the Simple Neurite Tracer plugin in ImageJ and later uploaded to the Flywire Gateway platform to compare the structure of the aligned and skeletonized SELK neurons to our candidates. This comparison clearly indicated that the DNg68 neurons, rather than DNg70, are the best candidates for representing the SELK neurons. We have updated the text, Figure 6, and Supplementary Figure 6 to reflect the new results. These new results do not alter the conclusions of the paper.
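
The thresholding and intersection steps described above can be sketched in a few lines. This is an illustration only, not the authors' analysis code: the edge list, the neuron labels, and the function names `postsynaptic_partners` and `shared_targets` are hypothetical, and a real analysis would read connections from a FlyWire data release instead of a hand-written list.

```python
def postsynaptic_partners(edges, sources, threshold=5):
    """Targets that receive MORE than `threshold` synapses from a source neuron.

    edges: iterable of (pre_id, post_id, synapse_count) tuples (toy format).
    """
    sources = set(sources)
    return {post for pre, post, n in edges if pre in sources and n > threshold}


def shared_targets(edges, sweet_grns, bitter_grns, threshold=5):
    """Neurons that pass the threshold for both sweet and bitter input."""
    sweet = postsynaptic_partners(edges, sweet_grns, threshold)
    bitter = postsynaptic_partners(edges, bitter_grns, threshold)
    return sweet & bitter


# Toy connectivity; all identifiers are made up for the example.
edges = [
    ("sweet_1", "cand_A", 12),
    ("sweet_2", "cand_B", 3),    # below threshold, ignored
    ("bitter_1", "cand_A", 8),
    ("bitter_1", "cand_B", 20),
    ("bitter_2", "cand_C", 9),
    ("sweet_1", "cand_C", 5),    # exactly 5 is not "more than five"
]

hits = shared_targets(edges, ["sweet_1", "sweet_2"], ["bitter_1", "bitter_2"])
print(sorted(hits))  # ['cand_A']
```

Raising or lowering `threshold` changes which partners survive, which is why the neuron counts depend on the synapse cutoff that is chosen.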

      (2) In the transcriptomic experiments it seems that the raw transcripts are reported, rather than normalised data. Why?

      All transcriptomic data is normalized. In Figure 1, the differential expression was calculated using DESeq2 normalized counts. In Figure 2, Transcripts Per Million (TPM) were calculated using the Salmon package and normalized for gene length.
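
For readers unfamiliar with the TPM unit, the arithmetic behind it can be sketched as follows. This is a simplified illustration with invented gene names and counts; Salmon computes TPM internally (from estimated read counts and effective transcript lengths), so the function `tpm` below is a hypothetical stand-in and not Salmon's implementation.

```python
def tpm(counts, lengths):
    """Transcripts Per Million from read counts and transcript lengths.

    counts:  dict mapping gene -> number of reads assigned to it
    lengths: dict mapping gene -> transcript length in bases
    """
    # Step 1: length-normalize, giving reads per base for each gene.
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    # Step 2: rescale so that all values in the sample sum to one million.
    total = sum(rates.values())
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Invented example: geneC has 3x the reads of geneA but is 3x longer,
# so both end up with the same TPM.
counts = {"geneA": 100, "geneB": 100, "geneC": 300}
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 3000}

values = tpm(counts, lengths)
for gene, value in values.items():
    print(gene, round(value))
```

Because every sample sums to one million, TPM values are comparable across genes within a sample, whereas raw counts are not.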

      (3) The expression of nAChRbeta1 in the transcriptomic data is rather striking. However, this remains currently not addressed: is this expression real?

      We have not confirmed the upregulation or downregulation in expression of any gene other than Leucokinin, which is our main interest. We found the presence of nAChRbeta1 interesting, as GRNs are cholinergic (Jaeger et al., 2018), suggesting that it would make sense to find cholinergic receptors in G2Ns. However, it is possible that these receptors are expressed in all G2Ns and serve as a common means of communication.

      (4) The description of the behavioural experiments in the results section is rather brief. I had a hard time following it since the genotypes are not repeated nor is it stated what is different in the experimental group vs control (but instead simply what changes in the experimental group, in a rather discussion-like fashion).

      We thank the reviewer for the comment. We have rewritten this section to improve its clarity.

      (5) If I understand the genetics for the behavioural experiments correctly, it addresses the entire Lk-Gal4-expressing population; thus it is not possible to describe the role of the two SEZ neurons, but rather of Lk-Gal4 neurons. This should be clarified.

      We thank the reviewer for this comment. Indeed, the Lk-Gal4 driver we used drives expression in all Leucokinin neurons, making it impossible to distinguish between the SELK, LHLK, or ABLK neurons. We have added a new piece of behavioral data by using the tsh-Gal80 transgene to prevent the expression of TNT in the ABLK neurons (Supplementary Figure 7A), but still we cannot distinguish between SELK and LHLK. We have rewritten the text to clarify this fact.

      Reviewer #3 (Recommendations for the authors):

      Overall, the manuscript is well-written, I only have one minor suggestion for improvement. In Figure 8C, please clarify the use of TNT to block Lk release.

      We thank the reviewer for the comment. We have clarified the use of TNT in the text.

      References

      Clyne, J. D. & Miesenböck, G. Sex-Specific Control and Tuning of the Pattern Generator for Courtship Song in Drosophila. Cell 133, 354–363 (2008).

      Dorkenwald, S. et al. Neuronal wiring diagram of an adult brain. Nature 634, 124–138 (2024).

      Engert, S., Sterne, G. R., Bock, D. D. & Scott, K. Drosophila gustatory projections are segregated by taste modality and connectivity. eLife 11, e78110 (2022).

      Jaeger, A. H. et al. A complex peripheral code for salt taste in Drosophila. eLife 7, e37167 (2018).

      Macpherson, L. J. et al. Dynamic labelling of neural connections in multiple colours by trans-synaptic fluorescence complementation. Nat Commun 6, 10024 (2015).

      Nässel, D. R. Leucokinin and Associated Neuropeptides Regulate Multiple Aspects of Physiology and Behavior in Drosophila. Int J Mol Sci 22, 1940 (2021).

      Shiu, P. K., Sterne, G. R., Engert, S., Dickson, B. J. & Scott, K. Taste quality and hunger interactions in a feeding sensorimotor circuit. eLife 11, e79887 (2022).

      Walker, S. R., Peña-Garcia, M. & Devineni, A. V. Connectomic analysis of taste circuits in Drosophila. Sci. Rep. 15, 5278 (2025).

    1. Author response:

      Reviewer #1:

      As this code was developed for use with a 4096 electrode array, it is important to be aware of double-counting neurons across the many electrodes. I understand that there are ways within the code to ensure that this does not happen, but care must be taken in two key areas. Firstly, action potentials traveling down axons will exhibit a triphasic waveform that is different from the biphasic waveform that appears near the cell body, but these two signals will still be from the same neuron (for example, see Litke et al., 2004 "What does the eye tell the brain: Development of a System for the Large-Scale Recording of Retinal Output Activity"; figure 14). I did not see anything that would directly address this situation, so it might be something for you to consider in updated versions of the code.

      We thank the reviewer for this insightful comment. We agree that signals from the same neuron may be collected by adjacent channels. To address this concern in our software, we plan to add a routine to SpikeMAP that allows users to discard nearby channels where spike count correlations exceed a pre-determined threshold. Because there is no ground truth to map individual cells to specific channels on the hd-MEA, a statistical approach is warranted.
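
A routine of the kind described above could look roughly like the following sketch. It is not SpikeMAP code: the function name `drop_redundant_channels`, the distance radius, the correlation threshold, and the toy spike counts are all assumptions chosen for illustration. The idea is to visit channels from most to least active and to discard any channel that is both close to an already-kept channel and highly correlated with it.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def drop_redundant_channels(spike_counts, positions, max_dist=1.5, r_thresh=0.9):
    """Keep one channel per group of nearby, highly correlated channels.

    spike_counts: dict channel -> list of binned spike counts
    positions:    dict channel -> (row, col) on the array, in electrode units
    """
    # Most active channels first, so the strongest copy of a unit is kept.
    order = sorted(spike_counts, key=lambda ch: sum(spike_counts[ch]), reverse=True)
    kept = []
    for ch in order:
        duplicate = any(
            math.dist(positions[ch], positions[k]) <= max_dist
            and pearson(spike_counts[ch], spike_counts[k]) > r_thresh
            for k in kept
        )
        if not duplicate:
            kept.append(ch)
    return kept


# Toy data: ch2 mirrors its neighbour ch1; ch4 matches ch1 but is far away.
spike_counts = {
    "ch1": [5, 0, 7, 1, 6, 0],
    "ch2": [4, 0, 6, 1, 5, 0],
    "ch3": [0, 3, 1, 4, 0, 2],
    "ch4": [5, 0, 7, 1, 6, 0],
}
positions = {"ch1": (0, 0), "ch2": (0, 1), "ch3": (1, 0), "ch4": (10, 10)}

kept = drop_redundant_channels(spike_counts, positions)
print(kept)
```

Restricting the comparison to nearby channels keeps distant neurons that happen to fire together, which a purely correlation-based rule would wrongly merge.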

      Secondly, spike shapes are known to change when firing rates are high, like in bursting neurons (Harris, K.D., Hirase, H., Leinekugel, X., Henze, D.A. & Buzsáki, G. Temporal interaction between single spikes and complex spike bursts in hippocampal pyramidal cells. Neuron 32, 141-149 (2001)). I did not see this addressed in the present version of the manuscript.

      This is a valid concern. To ensure that firing rates are relatively constant over the duration of a recording, we will plot average spike rates using rolling windows of a fixed duration. We expect that population firing rates will remain relatively stable across the duration of recordings.
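
As a rough illustration of the rolling-window check mentioned above, the sketch below computes spike rates in overlapping windows of fixed duration. The function name `rolling_rate`, the window and step sizes, and the synthetic spike train are assumptions for the example and are not taken from SpikeMAP.

```python
def rolling_rate(spike_times, duration, window=10.0, step=5.0):
    """Firing rate (Hz) in sliding windows that lie fully inside [0, duration].

    spike_times: spike timestamps in seconds
    Returns a list of (window_start, rate_hz) pairs.
    """
    rates = []
    start = 0.0
    while start + window <= duration:
        n_spikes = sum(1 for t in spike_times if start <= t < start + window)
        rates.append((start, n_spikes / window))
        start += step
    return rates


# Synthetic, perfectly regular train: one spike every 0.5 s for 60 s.
spike_times = [i * 0.5 for i in range(120)]
rates = rolling_rate(spike_times, duration=60.0)

for start, rate in rates:
    print(f"{start:5.1f} s  {rate:.2f} Hz")
```

On real recordings, a drift or a burst-dominated epoch would show up as windows whose rate departs clearly from the rest, and such epochs could then be inspected separately.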

      Another area for possible improvement would be to build on the excellent validation experiments you have already conducted with parvalbumin interneurons. Although it would take more work, similar experiments could be conducted for somatostatin and vasoactive intestinal peptide neurons against a background of excitatory neurons. These may have different spike profiles, but your success in distinguishing them can only be known if you validate against ground truth, like you did for the PV interneurons.

      We agree that further cycles of experiments could be performed with SOM, VIP, and other neuronal subtypes, and we hope that researchers will take advantage of SpikeMAP to do so. We will clarify this possibility in the Discussion section of the manuscript.

      Reviewer #2:

      Summary:

      While I find that the paper is nicely written and easy to follow, I find that the algorithmic part of the paper is not really new and should have been more carefully compared to existing solutions. While the GT recordings to assess the possibilities of a spike sorting tool to distinguish properly between excitatory and inhibitory neurons are interesting, spikeMAP does not seem to bring anything new to state-of-the-art solutions, and/or, at least, it would deserve to be properly benchmarked. I would suggest that the authors perform a more intensive comparison with existing spike sorters.

      We thank the reviewer for this comment. As detailed in Table 1, SpikeMAP is the only method that performs E/I sorting on large-scale multielectrodes, hence a comparison to competing methods is not currently possible. That being said, many of the pre-processing steps of SpikeMAP (Figure 1) involve methods that are already well-established in the literature and available under different packages. To highlight the contribution of our work more clearly and to facilitate the adoption of SpikeMAP, we plan to provide a “modular” portion of SpikeMAP that is specialized in performing E/I sorting and can be added to the pipeline of other packages such as KiloSort. This modularized version of the code will be shared freely along with the more complete version already available.

      Weaknesses:

      (1) The global workflow of spikeMAP, described in Figure 1, seems to be very similar to that of Hilgen et al. 2020 (10.1016/j.celrep.2017.02.038). Therefore, the first question is what is the rationale of reinventing the wheel, and not using tools that are doing something very similar (as mentioned by the authors themselves). I have a hard time, in general, believing that spikeMAP has something particularly special, given its Methods, compared to state-of-the-art spike sorters.

      We agree with the reviewer that there are indeed similarities between our work and the Hilgen et al. paper. However, while the latter employs optogenetics to stimulate neurons on a large-scale array, their technique does not specifically target inhibitory (e.g., PV) neurons as described in our work. We will clarify our paper accordingly.

      This is why, at the very least, the title of the paper is misleading, because it lets the reader think that the core of the paper will be about a new spike sorting pipeline. If this is the main message the authors want to convey, then I think that numerous validations/benchmarks are missing to assess first how good spikeMAP is, with reference to spike sorting in general, before deciding if this is indeed the right tool to discriminate excitatory vs inhibitory cells. The GT validation, while interesting, is not enough to entirely validate the paper. The details are a bit too scarce for me, or would deserve to be better explained (see other comments after).

      The title of our work will be edited to make it clear that while elements of the pipeline are well-established and available from other packages, we are the first to extend this pipeline to E/I sorting on large-scale arrays.

      (2) Regarding the putative location of the spikes, it has been shown that the center of mass, while easy to compute, is not the most accurate solution [Scopin et al, 2024, 10.1016/j.jneumeth.2024.110297]. For example, it has an intrinsic bias for finding positions within the boundaries of the electrodes, while some other methods, such as monopolar triangulation or grid-based convolution, might have better performances. Can the authors comment on the choice of the Center of Mass as a unique way to triangulate the sources?

      We agree with the reviewer and will point out limits of the center-of-mass algorithm based on the article of Scopin et al (2024). Further, we will augment the existing code library to include monopolar triangulation or grid-based convolution as options available to end-users.
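
The bias raised by the reviewer follows directly from how a center of mass is computed, as the minimal sketch below shows. The electrode layout, the amplitudes, and the function name `center_of_mass` are illustrative assumptions and are not taken from the SpikeMAP code.

```python
def center_of_mass(positions, amplitudes):
    """Amplitude-weighted mean position of a spike across electrodes.

    positions:  list of (x, y) electrode coordinates
    amplitudes: list of spike amplitudes (sign is ignored)
    """
    weights = [abs(a) for a in amplitudes]
    total = sum(weights)
    x = sum(w * p[0] for w, p in zip(weights, positions)) / total
    y = sum(w * p[1] for w, p in zip(weights, positions)) / total
    return x, y


# Four electrodes on a unit square; the spike is largest at (0, 0).
positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
amplitudes = [-80.0, -40.0, -40.0, -20.0]  # microvolts, invented values

x, y = center_of_mass(positions, amplitudes)
print(round(x, 3), round(y, 3))
```

Because the weights are non-negative and sum to one after normalization, the estimate is a convex combination of the electrode positions. It can therefore never fall outside the area spanned by the electrodes used, even when the true source lies beyond the edge of the array.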

      (3) Still in Figure 1, I am not sure I really see the point of Spline Interpolation. I see the point of such a smoothing, but the authors should demonstrate that it has a key impact on the distinction of Excitatory vs. Inhibitory cells. What is special about the value of 90kHz for a signal recorded at 18kHz? What is the gain with spline enhancement compared to without? Does such a value depend on the sampling rate, or is it a global optimum found by the authors?

      We will clarify these points. Specifically, the value of 90kHz was chosen because it provided a reasonable temporal characterization of spikes; this value, however, can be adjusted within the software based on user preference.

      (4) Figure 2 is not really clear, especially panel B. The choice of the time scale for the B panel might not be the most appropriate, and the legend filtered/unfiltered with a dot is not clear to me in Bii.

      We will re-check Fig. 2B, which seems to have a rendering error, likely due to conversion from its original format.

      In panel E, the authors are making two clusters with PCA projections on single waveforms. Does this mean that the PCA is only applied to the main waveforms, i.e. the ones obtained where the amplitudes are peaking the most? This is not really clear from the methods, but if this is the case, then this approach is a bit simplistic and does not really match state-of-the-art solutions. Spike waveforms are quite often, especially with such high-density arrays, covering multiple channels at once, and thus the extracellular patterns triggered by the single units on the MEA are spatio-temporal motifs occurring on several channels. This is why, in modern spike sorters, the information in a local neighbourhood is often kept to be projected, via PCA, on the lower-dimensional space before clustering. Information on a single channel only might not be informative enough to disambiguate sources. Can the authors comment on that, and what is the exact spatial resolution of the 3Brain device? The way the authors are performing the SVD should be clarified in the methods section. Is it on a single channel, and/or on multiple channels in a local neighbourhood?

      Here, the reviewer is suggesting that it may be better to perform PCA on several channels at once, since spikes can occur on several channels at the same time. To address this concern, a small routine will be written that allows users to choose how many nearby channels are included in the PCA.

      (5) About the isolation of the single units, here again, I think the manuscript lacks some technical details. The authors are saying that they are using a k-means cluster analysis with k=2. This means that the authors are explicitly looking for 2 clusters per electrode? If so, this is a really strong assumption that should not be held in the context of spike sorting, because, since it is a blind source separation technique, one cannot pre-determine in advance how many sources are present in the vicinity of a given electrode. While the illustration in Figure 2E is ok, there is no guarantee that one cannot find more clusters, so why this choice of k=2? Again, this is why most modern spike sorting pipelines do not rely on k-means, to avoid any hard-coded number of clusters. Can the authors comment on that?

      It is true that k=2 is a pre-determined choice in our software. In practice, we found that k>2 leads to poorly defined clusters. However, we will ensure that this parameter can be adjusted in the software. Furthermore, if the user chooses not to pre-define this value, we will provide the option to use a Calinski-Harabasz criterion to select k.
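
To make the proposed criterion concrete, the sketch below computes the Calinski-Harabasz index for candidate clusterings and picks the one with the highest score. It is an illustration under stated assumptions: the points stand in for spikes projected onto two principal components, the label assignments are hand-made instead of coming from k-means, and the function name `calinski_harabasz` is hypothetical.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz index: between-cluster over within-cluster dispersion.

    CH = [B / (k - 1)] / [W / (n - k)], where higher means better separation.
    """
    n, dim = len(points), len(points[0])
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need at least 2 clusters and fewer clusters than points")

    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = within = 0.0
    for members in clusters.values():
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * sum((c - o) ** 2 for c, o in zip(centroid, overall))
        within += sum(sum((v - c) ** 2 for v, c in zip(p, centroid)) for p in members)
    return (between / (k - 1)) / (within / (n - k))


# Toy 2-D projections forming three well-separated groups.
points = [(0, 0), (0, 1), (1, 0),
          (10, 0), (10, 1), (11, 0),
          (5, 9), (5, 10), (6, 9)]
candidates = {
    2: [0, 0, 0, 0, 0, 0, 1, 1, 1],  # first two groups merged
    3: [0, 0, 0, 1, 1, 1, 2, 2, 2],  # one label per group
}

scores = {k: calinski_harabasz(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

In practice the candidate labels would come from running the clustering step once per value of k, and the index would be compared across those runs.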

      (6) I'm surprised by the linear decay of the maximal amplitude as a function of the distance from the soma, as shown in Figure 2H. Is it really what should be expected? Based on the properties of the extracellular media, shouldn't we expect a power law for the decay of the amplitude? This is strange that up to 100um away from the soma, the max amplitude only dropped from 260 to 240 uV. Can the authors comment on that? It would be interesting to plot that for all neurons recorded, in a normed manner V/max(V) as function of distances, to see what the curve looks like.

      We share the reviewer’s concern and will add results that include a population of neurons to assess the robustness of this phenomenon.

      (7) In Figure 3A, it seems that the total number of cells is rather low for such a large number of electrodes. What are the quality criteria that are used to keep these cells? Did the authors exclude some cells from the analysis, and if yes, what are the quality criteria that are used to keep cells? If no criteria are used (because none are mentioned in the Methods), then how come so few cells are detected, and can the authors convince us that these neurons are indeed "clean" units (RPVs, SNRs, ...)?

      We applied stringent criteria to exclude cells, and we will revise the main text to be clear about these criteria, which include a minimum spike rate and the use of LDA to separate out PCA clusters. For the cells that were retained, we will include SNR estimates.
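
The response does not specify how the SNR estimates will be computed. One common definition, shown here purely as an assumption, divides the peak amplitude of a unit's mean waveform by a robust estimate of the noise standard deviation obtained from the median absolute deviation (MAD). The function names and the toy signal are hypothetical.

```python
import statistics


def mad_noise_sigma(trace):
    """Robust noise estimate: median absolute deviation scaled by 0.6745.

    For Gaussian noise this approximates the standard deviation while being
    much less sensitive to the spikes themselves than a plain std would be.
    """
    center = statistics.median(trace)
    deviations = [abs(v - center) for v in trace]
    return statistics.median(deviations) / 0.6745


def unit_snr(mean_waveform, trace):
    """Peak absolute amplitude of the mean waveform over the noise estimate."""
    peak = max(abs(v) for v in mean_waveform)
    return peak / mad_noise_sigma(trace)


# Toy signal: a repeating low-amplitude background and one mean waveform.
trace = [-2, -1, 0, 1, 2] * 20          # background, arbitrary units
mean_waveform = [0, -15, -60, -20, 10, 5, 0]

snr = unit_snr(mean_waveform, trace)
print(round(snr, 2))
```

A threshold on this value could then serve as one of the explicit inclusion criteria, alongside the minimum spike rate mentioned above.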

      (8) Still in Figure 3A, it looks like there is a bias to find inhibitory cells at the borders, since they do not appear to be uniformly distributed over the MEA. Can the authors comment on that? What would be the explanation for such a behaviour? It would be interesting to see some macroscopic quantities on Excitatory/Inhibitory cells, such as mean firing rates, averaged SNRs... Because again, in Figure 3C, it is not clear to me that the firing rates of inhibitory cells are higher than Excitatory ones, whilst they should be in theory.       

      We will include a comparison of firing rates for E and I neurons. It is possible that I cells are located at the border of the MEA due to the site of injections of the viral vector, and not because of an anatomical clustering of I cells per se. We will clarify the text accordingly.

      (9) For Figure 3 in general, I would have performed an exhaustive comparison of putative cells found by spikeMAP and other sorters. More precisely, I think that to prove the point that spikeMAP is indeed bringing something new to the field of spike sorting, the authors should have compared the performances of various spike sorters to discriminate Exc vs Inh cells based on their ground truth recordings. For example, either using Kilosort [Pachitariu et al, 2024, 10.1038/s41592-024-02232-7], or some other sorters that might be working with such large high-density data [Yger et al, 2018, 10.7554/eLife.34518].

      As mentioned previously, Kilosort and related approaches do not address the problem of E/I identification (see Table 1). However, they do have pre-processing steps in common with SpikeMAP. We will add some specific comparison points – for instance, the use of k-means and PCA (which is more common across packages) and the use of cubic spline interpolation (which is less common). Further, we will provide a stand-alone E/I sorting module that can be added to the pipeline of other packages, so that users can use this functionality without having to migrate their entire analysis.

      (10) Figure 4 has a big issue, and I guess the panels A and B should be redrawn. I don't understand what the red rectangle is displaying.

      We apologize for this issue. It seems there was a rendering problem when converting the figure from its original format. We will address this issue in the revised version of the manuscript.

      (11) I understand that Figure 4 is only one example, but I have a hard time understanding from the manuscript how many slices/mice were used to obtain the GT data? I guess the manuscript could be enhanced by turning the data into an open-access dataset, but then some clarification is needed. How many flashes/animals/slices are we talking about? Maybe this should be illustrated in Figure 4, if this figure is devoted to the introduction of the GT data.

      We will mention how many flashes/animals/slices were employed in the GT data and provide open access to these data.

      (12) While there is no doubt that GT data as the ones recorded here by the authors are the most interesting data from a validation point of view, the pretty low yield of such experiments should not discourage the use of artificially generated recordings such as the ones made in [Buccino et al, 2020, 10.1007/s12021-020-09467-7] or even recently in [Laquitaine et al, 2024, 10.1101/2024.12.04.626805v1]. In these papers, the authors have putative waveforms/firing rate patterns for excitatory and inhibitory cells, and thus, the authors could test how good they are in discriminating the two subtypes.

      We thank the reviewer for the suggestion that SpikeMAP could be tested on artificially generated spike trains and will add the citation of the two papers mentioned. We hope future efforts will employ SpikeMAP on both synthetic and experimental data to explore the neural dynamics of E and I neurons in healthy and pathological circuits of the brain.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The Authors investigated the anatomical features of the excitatory synaptic boutons in layer 1 of the human temporal neocortex. They examined the size of the synapse, the macular or the perforated appearance and the size of the synaptic active zone, the number and volume of the mitochondria, the number of the synaptic and the dense core vesicles, also differentiating between the readily releasable, the recycling and the resting pool of synaptic vesicles. The coverage of the synapse by astrocytic processes was also assessed, and all the above parameters were compared to other layers of the human temporal neocortex. The Authors conclude that the subcellular morphology of the layer 1 synapses is suitable for the functions of the neocortical layer, i.e. the synaptic integration within the cortical column. The low glial coverage of the synapses might allow the glutamate spillover from the synapses enhancing synaptic crosstalk within this cortical layer.

      Strengths:

      The strengths of this paper are the abundant and very precious data about the fine structure of the human neocortical layer 1. Quantitative electron microscopy data (especially those derived from the human brain) are very valuable, since this is highly time- and energy-consuming work. The techniques used to obtain the data, as well as the analyses and the statistics performed by the Authors, are all solid, strengthen this manuscript, and support the conclusions drawn in the discussion.

      Comments on latest version:

      The third version of this paper has been substantially improved. The English is significantly better; there are only a few paragraphs and sentences which are hard to understand (see my comments and suggestions below). Almost all of my suggestions were incorporated.

      We would like to thank the reviewer for the comments and have incorporated the suggestions in the latest version of the manuscript.

      Remaining minor concerns:

      About epileptic and non-epileptic (non-affected) tissue. I am aware that temporal lobe neocortical tissue derived from epileptic patients is regarded as non-affected by many groups, and they are quite similar to the cortex of non-epileptic (tumour) patients in their electrophysiological properties and synaptic physiology. But please, note, that one paper you cited did not use samples from epileptic patients, but only tissue from non-epileptic tumor patients (Molnár et al. PLOS 2008).

      When you look deeper, and make thorough comparison of tissues derived from epileptic and non-epileptic patients, there are differences in the fine structure, as well as in several electrophysiological features. See for example Tóth et al., J Physiol, 2018, where higher density of excitatory synapses were found in L2 of neocortical samples derived from epileptic patients compared to non-epileptic (tumor) patients. Furthermore, the appearance of population bursts is similar, but their occurrence is more frequent and their amplitude is higher in tissue from epileptic compared to non-epileptic patients. So, I still cannot agree, that temporal neocortex of epileptic patients with the seizure focus in the hippocampus would be non-affected. Therefore I suggested to use the term biopsy tissue.

      We are thankful for this comment on using non-epileptic tissue also by others. We are also aware that Molnár et al. 2008 worked with tumor tissue.

      It is still not emphasized in the first paragraph of the Discussion, that only excitatory axon terminals were investigated.

      We now mention in the first paragraph of the Discussion that only excitatory synaptic boutons were investigated.

      The text in the Results and the Discussion are somewhat inconsistent.

      The last two paragraphs of the Results section ends with several sentences which should be part of the discussion, such as line 328: This finding strongly supports multivesicular release... or line 344: --- pointing towards a layer-specific regulation of the putative RRP. Moreover, the results suggest that... and line 370: ... it is most likely... Please, correct this.

      We disagree with the reviewer on these points because these sentences summarize the findings.

      The first paragraph of the Discussion summarizes the work of the quantitative EM work and gives one conclusion about the astrocytic coverage. This last sentence is inconsistent with the other parts of the paragraph. I would either write that "astrocytic coverage was also investigated" (or something similar), or move this sentence to the paragraph which discusses the astrocytic coverage.

      Results line 180-183. "Special connections" between astrocytic processes and synaptic boutons are mentioned, but not shown. Either show these (but then prove with staining!), or leave out this paragraph.

      We deleted this paragraph as suggested.

      Reviewer #2 (Public review):

      Summary:

      The study of Rollenhagen et al examines the ultrastructural features of Layer 1 of human temporal cortex. The tissue was derived from drug-resistant epileptic patients undergoing surgery, and was selected as further from the epilepsy focus, and as such considered to be non-epileptic. The analysis included 4 patients of different age, sex, medication and onset of epilepsy. The manuscript is a follow-on study to 3 previous publications from the same authors on different layers of the temporal cortex:

      Layer 4 - Yakoubi et al 2019 eLife

      Layer 5 - Yakoubi et al 2019 Cerebral Cortex,

      Layer 6 - Schmuhl-Giesen et al 2022 Cerebral Cortex

      They find that the L1 synaptic boutons mainly have a single active zone and a very large pool of synaptic vesicles, and are mostly devoid of astrocytic coverage.

      Strengths:

      The MS is well written and easy to read. The Results section gives a detailed set of figures showing many morphological parameters of synaptic boutons and surrounding glial elements. The authors provide comparative data of all the layers examined by them so far in the Discussion. Given that anatomical data in human brain are still very limited, the current MS has substantial relevance. The work appears to be generally well done, and the EM and EM tomography images are of very good quality. The analysis is clear and precise.

      Weaknesses:

      The authors made all the corrections required and answered all of my concerns, included additional data sets, and clarified statements where needed.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor suggestions:

      Synaptic density, lines 189-193. If you say "comparatively" high, then compare to something (cite your own work for the other layers, and tell the approximative values for the other layers). Same in line 194 comparably high to what? Other option: say "relatively high".

      We corrected the sentences as suggested by the reviewer.

      Line 206: When present, mitochondria (comma missing)

      Corrected as suggested by the reviewer.

      Line 265: Dot is missing at the end of the sentence (after Shapira et al. 2003)

      Corrected as suggested by the reviewer.

      Lines 300-301: Check the English for this sentence: significant difference BETWEEN TWO sublaminae and not significant difference for both sublaminae.

      Corrected as suggested by the reviewer.

      Lines 304-305: Check the sentence, please, it is not understandable without the text in parenthesis.

      Corrected as suggested by the reviewer.

      Line 354 Dot missing at the end of the sentence (after Figure 6A, B)

      Corrected as suggested by the reviewer.

      Line 354-358: Please rephrase this sentence (too complicated, not understandable). I do not understand why results of the L4, L5, L6 are described here. What does it mean "Astrocytes and their fine processes formed a relatively dense, but a comparably loose network within the neuropil in L1"? Dense or loose?

      In the experiment measuring the volume fraction of astrocytic processes (Figure 6C), all six cortical layers were analyzed, thus we compared the values obtained for L1 with the results for L4, L5 and L6. For more clarity, we rephrased the sentence: “Astrocytes and their fine processes formed a relatively dense network in L4 and L5, but a comparably loose one within the neuropil in L1…” We also rephrased other sentences in this paragraph (as also suggested below).

      Lines 359-369: Please rephrase this paragraph. The sentences are too complicated, have too many parentheses, and are not understandable. I suggest to write first how many synapses were examined in L1 and L4, then how many of them were on spine and on dendrites (either n or %). Then give the values how many (n or %) of them were "tripartite synapses", out of spine synapses and of dendritic synapses in both layers. How many of them were partially covered in both layers. Please, write the data in a systematic way. The best would be to give the values in a table as well. This way it will be more understandable (now, it is chaotic, hard to follow).

      We rephrased the paragraph and added a new table (3).

      Line 383: Dot missing from the end of the sentence.

      Corrected as suggested by the reviewer.

      Line 436: Reconsider "comparably low compared to". The comparably means what in this case? The whole paragraph is hard to understand, please, check and review for improvements to the use of English or use chatGPT to check it.

      We corrected the sentence according to the reviewer’s suggestion.

      Line 487: Same thing again: "The comparably largest size of the RP in L1 when compared..." What would you like to say with "comparably"? Check the meaning of this word in a dictionary, please. I have the feeling that you are using this word instead of "relatively".

      Corrected as suggested by the reviewer.

      Line 488 "and TO that found fot L4 and L5 in rodents..."

      Corrected as suggested by the reviewer.

      Line 493-495: Same again, comparably when compared, correct, please.

      Corrected as suggested by the reviewer.

      Supplemental figures: Now I do understand why Hu-01 and Hu-02 are twice, and I think, 3 patients were examined for L1a and three for L1b. But which side is which on the subfigures? Left side (Hu-01, 02 03) was used for L1a, or L1b? Could you write this in the legend, or mark on the figure (at least at one subfigure), please?

      We implemented a comment for clarity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Concerning the grounding in experimental phenomenology, it would be beneficial to identify specific experiments to strengthen the model. In particular, what evidence supports reversible beta cell inactivation? This could potentially be tested in mice, for instance, by using an inducible beta cell reporter, treating the animals with high glucose levels, and then measuring the phenotype of the marked cells. Such experiments, if they exist, would make the motivation for the model more compelling.

      There is some direct evidence of reversible beta cell inactivation in rodent / in vitro models. We had already mentioned this in the discussion, but we have added some text emphasizing / clarifying the role of this evidence (lines 359–362).

      Others have also argued that some analyses of insulin treatment in conventional T2D, which has a stronger effect in patients with higher glucose before treatment, provide indirect evidence of reversal of glucotoxicity. We have also mentioned this in the revised paper (lines 284–285).

      For quantitative experiments, the authors should be more specific about the features of beta cell dysfunction in KPD. Does the dysfunction manifest in fasting glucose, glycemic responses, or both? Is there a “pre-KPD” condition? What is known about the disease’s timescale?

      The answers to some of these questions are not entirely clear—patients present with very high glucose, and thus must be treated immediately. Due to a lack of antecedent data it is not entirely clear what the pre-KPD condition is, but there is some evidence that KPD is at least not preceded by diabetes symptoms. This point is already noted in the introduction of the paper and Table 1. However, we have added a small note clarifying that this does not rule out mild hyperglycemia, as in prediabetes (and indeed, as our model might predict) (lines 76–77). Similarly, due to the necessity of immediate insulin treatment, it is not clear from existing data whether the disorder manifests more strongly in fasting glucose or glucose response, although it is likely in both. (We might infer this since continuous insulin treatment does not produce fasting hypoglycemia, and the complete lack of insulin response to glucose shortly after presentation should produce a strong effect in glycemic response.) We believe our existing description of KPD lists all of the relevant timescales; however, we have also slightly clarified this description in response to the first referee’s comments (lines 66–73, 83).

      The authors should also consider whether their model could apply to other conditions besides KPD. For example, the phenomenology seems similar to the “honeymoon” phase of T1D. Making a strong case for the model in this scenario would be fascinating.

      This is an excellent idea, which had not occurred to us. We have briefly discussed this possibility in the revision (lines 281–291), but plan to analyze it in more detail in a future manuscript.

      Reviewer #1 (Recommendations for the author):

      Whenever simulation results are presented, parameter values should be specified right there in the figure captions.

      We have added the values of glucotoxicity parameters to the caption of Figure 2. In other figures, we have explicitly mentioned which panel of Figure 2 the parameters are taken from. Description of the non-glucotoxicity parameters is a bit cumbersome (there are a lot of them, but our model of fast dynamics is slightly different from Topp et al. so it does not suffice to simply say we took their parameters) so we have referred the reader to the Materials and Methods for those.

      I was confused by the language in Figure 4. Could the authors clarify whether they argue that: (1) the observed KPD behaviour is the result of the system switching from one stable state to another when perturbed with high glucose intake? (2) the observed KPD behaviour is the result of one of the steady states disappearing with high glucose intake?

      What we mean to say is that during a period of high sugar intake or exogenous insulin treatment, one of the fixed points is temporarily removed—it is still a fixed point of the “normal” dynamics, but not a fixed point of the dynamics with the external condition added. Since when glucose (insulin) intake is high enough, only the low (high)-β fixed point is present, under one of these conditions the dynamics flow toward that fixed point. When the external influx of glucose/insulin is turned off, both fixed points are present again—but if the dynamics have moved sufficiently far during the external forcing, the fixed point they end up in will have switched from one fixed point to the other. We have edited the text to make this clearer (lines 153–185). Do note, however, that in response to both referees’ comments (see below), Figures 3 and 4 have been replaced with more illuminating ones. This specific point is now addressed by the new Figure 3.
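
The switching mechanism described in this response can be illustrated with a toy one-variable system. The sketch below is not the model from the manuscript: the rate law, the glucose relation, the Hill-type curve and every parameter value are assumptions, chosen only so that the unforced system has two stable fixed points. Here x stands in for the active fraction of β-cells, and `sugar` and `insulin` are hypothetical external forcing terms.

```python
# Toy illustration only: all functional forms and numbers are assumptions,
# not the equations or parameters of the manuscript under review.

def hill(G, K=3.0, n=4):
    """Assumed sigmoidal inactivation curve, rising with glucose G."""
    return G**n / (K**n + G**n)

def glucose(x, sugar=0.0, insulin=0.0):
    """Assumed glucose level: rises with sugar intake, falls with the
    active fraction x and with externally supplied insulin."""
    return (1.0 + sugar) / (0.1 + x + insulin)

def dxdt(x, sugar=0.0, insulin=0.0, k_in=10.0, k_re=1.0):
    """Reactivation at rate k_re minus glucose-driven inactivation at rate k_in."""
    return k_re * (1.0 - x) - k_in * hill(glucose(x, sugar, insulin)) * x

def run(x, duration, dt=0.01, **forcing):
    """Forward-Euler integration for `duration` time units under fixed forcing."""
    for _ in range(int(round(duration / dt))):
        x += dt * dxdt(x, **forcing)
    return x

# Relax to the high-x fixed point of the unforced dynamics.
x_start = run(0.9, 20.0)
# A sugar pulse removes the high-x fixed point; once the pulse ends the
# system relaxes to the low-x fixed point instead of returning.
x_after_sugar = run(run(x_start, 5.0, sugar=2.0), 20.0)
# An insulin pulse removes the low-x fixed point and restores the high-x state.
x_after_insulin = run(run(x_after_sugar, 5.0, insulin=1.0), 20.0)

print(x_start, x_after_sugar, x_after_insulin)
```

In this sketch the forcing is switched off before each final relaxation, so the state that persists afterwards reflects only which basin of attraction the pulse left the system in.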

      The adaptation of the prefactor ‘c’ was confusing to me. I think I understood it in the end, but it sounded like, “here’s a complication, but we don’t explain it because it doesn’t really matter”. I think the authors can explain this better (or potentially leave out the complication with ‘c’ altogether?).

      Indeed, the existence of an adaptation mechanism is important for our overall picture of diabetes pathogenesis, but not for many of our analyses, which assume prediabetes. Nonetheless, we agree that the current explanation of its role is confusing because of its vagueness. We have elaborated the explanation of the type of dynamics we assume for c, adding an equation for its dynamics to the “Model” section of the Materials and methods, explained in lines 456–465. We have also amended Figure 1 to note this compensation.

      I expect the main impact of this work will be to get clinical practitioners and biomedical researchers interested in the intermediate timescale dynamics of β-cells and take seriously the possibility that reversible inactive states might exist. But this impact will only be achieved when the results are clearly and easily understandable by an audience that is not familiar with mathematical modelling. I personally found it difficult to understand what I was supposed to see in the figures at first glance. Yes, the subtle points are indeed explained in the figure captions, but it might be advantageous to make the points visually so clear that a caption is barely needed. For example, when claiming that a change in parameters leads to bistability, why not plot the steady state values as a function of that parameter instead of showing curves from which one has to infer a steady state?

      I would advise the authors to reconsider their visual presentation by, e.g., presenting the figures to clinical practitioners or biomedical researchers with just a caption title to test whether such an audience can decipher the point of the figure! This is of course merely a personal suggestion that the authors may decide to ignore. I am making this suggestion only because I believe in the quality of this work and that improving the clarity of the figures and the ease with which one can understand the main points would potentially lead to a much larger impact on the presented results.

      This is a very good point. We have made several changes. Firstly, we have added smaller panels showing the dynamics of β to Figure 2; previously, the reader had to infer what was happening to β from G(t). Secondly, we have completely replaced the two figures showing dβ/dt, and requiring the reader to infer the fixed points of β, with bifurcation diagrams that simply show the fixed points of G and β. The new figures show through bifurcation diagrams how there are multiple fixed points in KPD, how glucose or insulin treatment force the switching of fixed points, and how the presence of bistability depends on the rate of glucotoxicity. (These new figures are Fig. 3–5 in the revised manuscript.)
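
As a rough illustration of what a bifurcation diagram of fixed points conveys, the sketch below counts the fixed points of a toy rate law as an inactivation rate is varied. The rate law, its constants and the names `net_rate` and `fixed_points` are hypothetical and are not taken from the manuscript; the only point is that a single parameter can move such a system from one fixed point to three and back to one.

```python
# Toy illustration only: the rate law and all constants are assumptions.

def net_rate(x, k_in, k_re=1.0, K=3.0, n=4):
    """Assumed net rate of change of the active fraction x."""
    G = 1.0 / (0.1 + x)              # assumed glucose, falling as x rises
    g = G**n / (K**n + G**n)         # assumed sigmoidal inactivation curve
    return k_re * (1.0 - x) - k_in * g * x

def fixed_points(k_in, grid=2000):
    """Find zeros of net_rate on [0, 1]: scan for sign changes, then bisect."""
    roots = []
    xs = [i / grid for i in range(grid + 1)]
    for lo, hi in zip(xs, xs[1:]):
        if net_rate(lo, k_in) * net_rate(hi, k_in) < 0:
            for _ in range(50):
                mid = 0.5 * (lo + hi)
                if net_rate(lo, k_in) * net_rate(mid, k_in) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots

# One row per value of the inactivation rate: the list of fixed points of x.
diagram = {k_in: fixed_points(k_in) for k_in in (5.0, 10.0, 20.0)}
for k_in, roots in diagram.items():
    print(k_in, [round(r, 3) for r in roots])
```

Plotting the returned roots against `k_in` on a finer set of values would give the kind of diagram in which the bistable window is visible directly, without reading steady states off time courses.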

      Could the authors explicitly point out what could be learned from their work for the clinic? At the moment treatment consists of giving insulin to patients. If I understand correctly, nothing about the current treatment would change if the model is correct. Is there maybe something more subtle that could be relevant to devising an optimal treatment for KPD patients?

      This is another very good point. We have added a new figure (Fig. 7) in our results section showing how this model, or one like it, can be analyzed to suggest an insulin treatment schedule (once parameters for an individual patient can be measured), and added some discussion of this point (lines 224–240) as well as lifestyle changes our model might suggest for KPD patients to the discussion (lines 413–425).

      Similarly, could the authors explicitly point out how their model could be experimentally tested? For example, are the functions f(G) and g(G) experimentally accessible? Related to that, presumably the shape of those functions matters to reproduce the observed behaviour. Could the authors comment on that / analyze how reproducing the observed behaviour puts constraints on the shape of the used functions and chosen parameter values?

      g(G) has not been carefully measured in cellular data; however, it could be in more quantitative versions of existing experiments. Further, our model indeed requires some general features for the forms of f(G) and g(G) to produce KPD-like phenomena. We have added some comment on this to the discussion section of the revised manuscript (lines 367–372).

      Could the authors explicitly spell out which parameters they think differ between individual KPD patients, and which parameters differ between KPD patients and ‘regular’ type 2 diabetics?

      In general we expect all parameters should vary both among KPD patients and between KPD / “conventional” T2D. The primary parameter determining whether KPD or conventional T2D is seen, however, is the ratio kIN/kRE. We have elaborated on both these points in the revised manuscript. (Lines 186–192, 250–257.)

      I was confused about the timescale of remission. At one point the authors write “KPD patients can often achieve partial remission: after a few weeks or months of treatment with insulin” but later the authors state that “the duration of the remission varies from 6 months to 10 years”.

      The former timescale is the typical timescale to achieve remission. After remission is reached, however, it may or may not last—patients may experience a relapse, where their condition worsens and they again require insulin. We have edited the text to clarify this distinction (lines 66–73).

      When the authors talk about intermediate timescales in the main text could they specify an actual unit of time, such as days, weeks, or months as it would relate to the rate constants in their model for those transitions?

      We have done so (lines 86–87, figure 1 caption, figure 2 caption). Getting KPD-like behavior requires (at high glucose) the deactivation process to be somewhat faster than the reactivation process, so the relevant scales are between weeks (reactivation) and days (deactivation at high G).

      The authors state “Our simple model of β-cell adaptation also neglects the known hyperglycemia-induced leftward shift in the insulin secretion curve (f(G) in Eq. (2))”. This seems an important consideration. Could the authors comment on why they did not model this shift, and/or explicitly discuss how including it is expected to change the model dynamics?

      We agree that this process seems potentially relevant, as it seems to happen on a relatively fast timescale compared to glucose-induced β-cell death. It is, however, not so well characterized quantitatively that including it is a simple matter of putting in known values—we would be making assumptions that would complicate the interpretation of our results.

      It is clear that this effect will need to be considered when quantitatively modelling real patient data. However, it is also straightforward to argue that this effect by itself cannot produce KPD-like symptoms, and will only tend to reduce the rate of glucotoxicity necessary to produce bistability. We have added a discussion of this in the revisions (lines 307–315). We have also, in general, expanded the discussion of the effects that each neglected detail we have mentioned is expected to have (lines 292–315).

      The authors end with a statement that their results may “contribute to explanation of other observations that involve rapid onset or remission of diabetes-like phenomena, such as during pregnancy or for patients on very low calorie diets.” Could the authors spell out exactly how their model potentially relates to these phenomena?

      Our thinking is that, even when another direct cause, such as loss of insulin resistance, is implicated in reversal of diabetes, some portion of the effect may be explained by reversal of glucotoxicity. This is indeed at this point just a hypothesis, but we have expanded on it briefly in the revision. (Lines 281–291.)

      Minor typos:

      In Figure 2.D the last zero of 200 on the axis was cut off.

      Line 359 - there is a missing word “in the analysis”.

      We have fixed these typos, thanks.

      Reviewer #2 (Recommendations for the author):

      The manuscript could be significantly improved in two key areas: the presentation of the analysis, and the relation with experimental phenomenology.

      Regarding the analysis presentation, the figures could be substantially enhanced with minimal effort from the authors. At present, they are sparse, lack legends, and offer only basic analysis. The authors should consider presenting, for example, a bifurcation diagram for beta cell mass and fasting glucose levels as a function of kIN, and how insulin sensitivity and average meal intake modulate this relationship. The goal should be to present clear, testable predictions in an intuitive manner. Currently, the specific testable predictions of the model are unclear.

      The response to this question is copied from the responses to related questions from the first referee.

      This is a very good point. We have made several changes. Firstly, we have added smaller panels showing the dynamics of β to Figure 2; previously, the reader had to infer what was happening to β from G(t). Secondly, we have completely replaced the two figures showing dβ/dt, and requiring the reader to infer the fixed points of β, with bifurcation diagrams that simply show the fixed points of G and β. The new figures show through bifurcation diagrams how there are multiple fixed points in KPD, how glucose or insulin treatment force the switching of fixed points, and how the presence of bistability depends on the rate of glucotoxicity. We have also supplemented our phase diagram that shows the effects of SI and the total beta cell population with bifurcation diagrams showing β as SI and βTOT are varied. (These new figures are Fig. 3–5 in the present manuscript.) Finally, we have added another figure analyzing the model’s predictions for the optimal insulin treatment and the resulting time needed to achieve remission (Fig. 7).

    1. Author response:

      Reviewer #1 (Public review):

      The manuscript titled "The distinct role of human PIT in attention control" by Huang et al. investigates the role of the human posterior inferotemporal cortex (hPIT) in spatial attention. Using fMRI experiments and resting-state connectivity analyses, the authors present compelling evidence that hPIT is not merely an object-processing area, but also functions as an attentional priority map, integrating both top-down and bottom-up attentional processes. This challenges the traditional view that attentional control is localized primarily in frontoparietal networks.

      The manuscript is strong and of high potential interest to the cognitive neuroscience community. Below, I raise questions and suggestions to help with the reliability, methodology, and interpretation of the findings.

      Thank you for a nice summary of the key points of our study. Below you will find our responses to your questions.

      (1) The authors argue that hPIT satisfies the criteria for a priority map, but a clearer justification would strengthen this claim. For example, how does hPIT meet all four widely recognized criteria, such as spatial selectivity, attentional modulation, feature invariance, and input integration, when compared to classical regions such as LIP or FEF? A more systematic summary of how hPIT meets these benchmarks would be helpful. Additionally, to what extent are the observed attentional modulations in hPIT independent of general task difficulty or behavioral performance?

      Great suggestions! For the first suggestion, we will include a clearer justification in the revised manuscript. For the second one, all participants received task practice prior to scanning, and task accuracy exceeded 90% (we will explicitly report the accuracy rate in revision), suggesting the tasks were not overly demanding. Although ceiling effects limit the interpretability of behavioral-performance correlations, we argue that higher task demands would likely require greater attentional effort, leading to stronger modulation in hPIT, which aligns with our findings when we manipulated the attentional load.

      (2) The authors report that hPIT modulation is invariant to stimulus category, but there appear to be subtle category-related effects in the data. Were the face, scene, and scrambled images matched not only in terms of luminance and spatial frequency, but also in terms of factors such as semantic familiarity and emotional salience? This may influence attentional engagement and bias interpretation.

      The response of hPIT is generally insensitive to stimulus category; however, the reviewer is correct in noticing that attentional modulation in hPIT is slightly stronger for faces than for scenes and scrambled images. Although the faces used in the task had neutral expressions and the scene pictures were also neutral, it is indeed possible that semantic familiarity or emotional salience contributes to the subtle category-related effects in the results of Experiment 3. This point will be noted in the revised manuscript.

      (3) The result that attentional load modulates hPIT is important and adds depth to the main conclusions. However, some clarifications would help with the interpretation. For example, were there observable individual differences in the strength of attentional modulation? How consistent were these effects across participants?

      Yes, individual differences exist. In the revised manuscript, we will include individual subject data points in Figure 6B.

      (4) The resting-state data reveal strong connections between hPIT and both dorsal and ventral attention networks. However, the analysis is correlational. Are there any complementary insights from task-based functional connectivity or latency analyses that support a directional flow of information involving hPIT? In addition, do the authors interpret hPIT primarily as a convergence hub receiving input from both DAN and VAN, or as a potential control node capable of influencing activity in these networks? Also, were there any notable differences between hemispheres in either the connectivity patterns or attentional modulation?

      We agree that besides resting-state connection, task-based functional connectivity analyses would have the potential to provide additional information about whether hPIT serves as a convergence node or a control hub. While fMRI data are not the best to generate directional flow of information due to the low temporal resolution, we will conduct task-based functional connectivity analyses.

      We also observed modest hemispheric asymmetries in connectivity—for instance, both left and right hPIT showed stronger connectivity with right-hemisphere attention nodes. This will be described in the revised supplement.

      (5) A few additional questions arise regarding the anatomical characteristics of hPIT: How consistent were its location and size across participants? Were there any cases where hPIT could not be reliably defined? Given the proximity of hPIT to FFA and LOp, how was overlap avoided in ROI definition? Were the functional boundaries confirmed using independent contrasts?

      The size and location of hPIT are generally consistent across subjects, as shown in Supplementary Figure 1. This consistency is also supported by Figure 4C. The hPIT was defined by conjunction maps across the three tasks and then manually delineated to avoid voxels overlapping with FFA and LOp. The FFA was defined using an independent contrast (Exp3 contrast [face-scene]), and the LOp location was defined by anatomical parcellation (Glasser et al., 2016).

      Reviewer #2 (Public review):

      Summary

      This study investigates the role of the human posterior inferotemporal cortex (hPIT) in attentional control, proposing that hPIT serves as an attentional priority map that integrates both top-down (endogenous) and bottom-up (exogenous) attentional processes. The authors conducted three types of fMRI experiments and collected resting-state data from 15 participants. In Experiment 1, using three different spatial attention tasks, they identified the hPIT region and demonstrated that this area is modulated by attention across tasks. In Experiment 2, by manipulating the presence or absence of visual stimuli, they showed that hPIT exhibits strong attentional modulation in both conditions, suggesting its involvement in both bottom-up and top-down attention. Experiment 3 examined the sensitivity of hPIT to stimulus features and attentional load, revealing that hPIT is insensitive to stimulus category but responsive to task load - further supporting its role as an attentional priority map. Finally, resting-state functional connectivity analyses showed that hPIT is connected to both dorsal and ventral attention networks, suggesting its potential role as a bridge between the two systems. These findings extend prior work on monkey PITd and provide new insights into the integration of endogenous and exogenous attention.

      Strengths

      (1) The study is innovative in its use of specially designed spatial attention tasks to localize and validate hPIT, and in exploring the region's role in integrating both endogenous and exogenous attention, as prior works focus primarily on its involvement in endogenous attention.

      (2) The authors provided very comprehensive experiment designs with clear figures and detailed descriptions.

      (3) A broad range of analyses was conducted to support the hypothesis that hPIT functions as an attentional priority map -- including experiments of attentional modulation under both top-down and bottom-up conditions, sensitivity to stimulus features and task load, and resting-state functional connectivity. These analyses showed consistent results.

      (4) Multiple appropriate statistical analyses - including t-tests, ANOVAs, and post-hoc tests - were conducted, and the results are clearly reported.

      Thank you for a nice summary of the key points and strengths of our study.

      Weaknesses

      (1) The sample size is relatively small (n = 15), and inter-subject variability is big in Figures 5 and 6, as seen in the spread of individual data points and error bars. The analysis of attention-modulated voxel map intersections appears to be influenced by multiple outliers.

      We agree that the sample size (n = 15) is not ideal, and we acknowledge that some data points in Figures 5 and 6 appear to be potential outliers. However, according to conventional outlier detection criteria, all data points are within three standard deviations of the group mean and were therefore retained for analysis. Moreover, the attention-modulated voxel intersection map shown in Figure 4C is insensitive to outliers, because the intersection map plotted is based on the number of subjects.
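
      The three-standard-deviation criterion mentioned above can be sketched as follows. This is a minimal illustration only: the function name and the example values are hypothetical and are not taken from the study's data or analysis code.

```python
from statistics import mean, stdev


def flag_outliers(values, n_sd=3.0):
    """Return one boolean per value: True if it lies more than
    n_sd sample standard deviations from the group mean."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [abs(v - m) > n_sd * s for v in values]


# Hypothetical example with 15 "subjects": 14 identical values and one extreme value.
example = [0.0] * 14 + [100.0]
flags = flag_outliers(example)
print(flags.count(True), "value(s) flagged")
```

      Note that with n = 15 the largest possible deviation of a single point is about 3.6 sample standard deviations, so this criterion flags only very extreme values in a sample of this size.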

      (2) The authors acknowledge important limitations, including the lack of exploration of feature-based attention and the temporal constraints inherent to fMRI.

      Yes, we hope to address these limitations in future studies.

      (3) Prior research has established that regions such as the prefrontal cortex (PFC) and posterior parietal cortex (PPC) are involved in both endogenous and exogenous attention and have been proposed as attentional priority maps. It remains unclear what is uniquely contributed by hPIT, how it functionally interacts with these classical attentional hubs, and whether its role is complementary or redundant. The study would benefit from more direct comparisons with these regions.

      In this study, we defined the ROI based on the intersection across three different types of spatial attention tasks, and the hPIT stands out in showing spatial attentional modulation across tasks. This could be due to the weak lateralized responses in PFC/PPC. To evaluate whether a region qualifies as a priority map, we applied four criteria (as mentioned in the Introduction). While dorsal and ventral attention network (DAN and VAN) regions can be considered important components of the priority map system, our findings suggest that among the regions tested, hPIT meets all four criteria. In Experiment 2, we included regions such as VFC (as part of PFC) and IPS (as part of PPC), and our findings suggest these areas are more involved in top-down attention. We agree with the reviewer’s suggestion and will perform additional analyses on PPC and PFC.

      (4) The functional connectivity analysis is only performed on resting-state data, and this approach does not capture context-dependent interactions. Task-based data analysis can provide stronger evidence.

      We acknowledge that resting-state FC is limited in assessing task-specific communication. To further investigate the role of hPIT, we plan to conduct task-based functional connectivity analyses.

      (5) The study does not report whether attentional modulation in hPIT is consistent across the two hemispheres. A comparison of hemispheric effects could provide important insight into lateralization and inter-individual variability, especially given the bilateral localization of hPIT.

      We thank the reviewer for this suggestion. hPIT was localized bilaterally using the same intersection-based method in Experiment 1. We have now performed an additional analysis and found that, in Experiment 3, the difference in attentional modulation between high and low load conditions was significant in the right hPIT but not in the left. This result will be reported in the revised manuscript.

    1. Author response:

      Below, we will address point by point any and all concerns of the reviewers.

      Reviewer #1:

      There are no major concerns, but some material could be added for clarity and to make the work more accessible to a more general scientific audience.

      We will add text for clarity and to make the work more accessible to a general audience per this comment and similar suggestions of the other reviewers.

      (1.1) A figure clearly showing the habituation protocol and the use of the dishabituators would be a good addition, even if the procedure has been done before and is cited. There can always be readers who are seeing this for the first time.

      We agree that this is a good idea, particularly because the time scales of the experiment will also be clearly marked, and we plan to include such a figure in the revised manuscript.

      (1.2) It would also be nice to comment on other ways dishabituation can happen (for example, when the stimulus is removed for a short time and returns) and what their time scales are.

      If the stimulus is withheld, spontaneous recovery occurs, a process distinct from dishabituation and worth exploring on its own. In a previous publication (Semelidou et al. eLife 2018;7:e39569), we have shown that in this habituation paradigm, with 4 min exposure to either the aversive Octanol or the attractive Ethyl Acetate, spontaneous recovery occurs 6 minutes or more after the habituated stimulus is withheld. This contrasts with the immediate effect of the single dishabituating stimulus, delivered for a few seconds at the end of exposure to the habituator. Given that, per Thompson (Neurobiol Learn Mem. 2009), spontaneous recovery is a characteristic of habituation, we will work this point into the text.

      (1.3) And more generally, the paper could perhaps improve by making a stronger case for why the results are important not just for flies but for neuroscience in general.

      Thank you for the encouragement. We will try to rationally generalize our findings.

      Reviewer #2:

      (2.1) However, the claim that this represents a fundamental difference between homosensory and heterosensory pathways for dishabituation is overstated.

      We had no intention of stating more than the fact that the footshock and yeast odor dishabituators are relayed to the mushroom bodies via distinct dopaminergic neurons, so that distinct dishabituating stimuli are differentiated via the mechanosensory (footshock) and olfactory (yeast odor) modalities as they engage the mushroom bodies. As the reviewer suggests, we will use more measured and specific language to state the above.

      (2.2) The introductory section does not adequately present current broad models for habituation and dishabituation.

      This was not done intentionally, but rather because we aimed for a shorter introductory section, which evidently resulted in a brief and possibly inadequate presentation of current habituation models. We will present a much more detailed introduction to habituation and dishabituation models in the revised manuscript (also see the reply to point 3.5 below).

      (2.3) There are many different time scales, even for Drosophila olfactory habituation. These, as well as potential underlying mechanistic differences, need to be acknowledged; any claim should be specifically qualified for the time scales being studied here.

      We understand and appreciate the reviewer's point and its significance, and we will address it both in the revised text and in the paradigm figure we will add as stated above (point 1.1), where the time scales will be explicitly included and emphasized.

      (2.4) Additionally, there are several unclear, vague, and inaccurate sections and statements. A more careful, precise, and considered presentation of current views, as well as more measured claims of the impact of the findings, would substantially enhance my enthusiasm.

      We will of course address these concerns, though having the specific offending passages pointed out would help ensure that we address them thoroughly. As stated above, we will incorporate current views in the introduction and when discussing our results and their impact.

      Reviewer #3:

      (3.1) The key issue is that the main concepts of this manuscript appear to be based on a misunderstanding/misinterpretation of the literature. As the authors set out to settle the debate "whether the novel dishabituating stimulus elicits sensitization of the habituated circuits, or it engages distinct neuronal routes to bypass habituation reinstating the naïve response", it seems that the authors based their investigation on the premise that "sensitization" is mediated by a facilitatory process within the S-R pathway, and "dishabituation" by a facilitatory process outside the S-R pathway. This is not the status quo in the field, particularly with the prevailing theory like the Dual-Process Theory.

      We appreciate the reviewer’s comment and the opportunity to clarify the conceptual framework of our work. Our intention was in fact to test the Groves and Thompson hypothesis (Neurobiol Learn Mem. 2009) in our olfactory habituation system. As such, dishabituation could have been the result of a facilitatory process within the S-R pathway, or of mechanisms outside of it. Our experimental design allowed us to distinguish these possibilities, and our results clearly show that dishabituation involves circuitry outside the S-R pathway. We thank the reviewer for pointing out that we have not articulated this intention clearly, and we will take care to communicate it effectively in the revised manuscript.

      (3.2) The original version of Dual-Process Theory (Groves and Thompson 1970, but also see Thompson 2008, Neurobiol Learn Mem) already hypothesized that habituation happens within the specific S-R pathway, and sensitization occurs separately in an "organism-wide" state system that modulates the output of all S-R pathways.

      As mentioned above, we are aware of the Dual-Process hypothesis. In fact, our data demonstrate that activity outside the olfactory S-R pathway, engaging novel neuronal circuits, mediates dishabituation. Unlike habituation, the circuits mediating dishabituation include, at minimum, the mushroom bodies, the dopaminergic system and the APL neurons. In our view, this does not support an “organism-wide state” system, but rather particular circuits that, in agreement with the Groves and Thompson hypothesis, are outside the S-R pathway and modulate its behavioral output. We will work these concepts into the discussion section of the revised manuscript.

      (3.3) Dishabituation is recognized by the Dual-Process Theory as sensitization (organism-wide facilitation) manifested on top of existing habituation (depressed S-R pathway). This notion has been supported by a wide range of studies, including cat spinal cord reflex (e.g. Spencer et al. 1966) and work in Aplysia on heterosynaptic facilitation for both sensitization and dishabituation. Therefore, simply showing that the newly identified facilitatory pathways are outside the S-R habituation pathway is insufficient to demonstrate dishabituation.

      We respectfully disagree with the concluding sentence here. In all of our experiments, we observe a clear recovery of olfactory avoidance after exposure to the footshock, or yeast odor dishabituators. Moreover, the dishabituators are emulated by (photo)activation of particular neuronal circuits and the recovery of olfactory avoidance is blocked when these circuits are silenced. Regardless of whether this recovery is classified as dishabituation via sensitization or another facilitatory process, the key point is that the habituated response is reliably reinstated contingent upon the dishabituating stimulus. We believe this meets the established criteria for dishabituation.

      (3.4) As behavioral facilitation of a habituated response can be achieved by dishabituating (specific recovery of the S-R pathway) and/or superimposed sensitizing (organism-wide) processes, dishabituation and sensitization of this olfactory response must be first dissociated; however, the study provided no evidence for the dissociation. Without this piece of evidence, the claim of this paper that the newly identified pathways mediate dishabituation is not fully supported.

      We agree with the reviewer that we have not provided specific evidence dissociating dishabituation and sensitization of the particular olfactory response beyond the evidence implicating particular circuitry in the outcome of facilitation of the olfactory response.

      It should be noted that upon photoactivation of the implicated circuitries in naïve flies, we do not observe enhanced octanol avoidance, suggesting that activation of these circuits alone does not induce sensitization. Moreover, our results show that neither footshock nor yeast odor drives an organism-wide sensitization, as silencing specific circuits was sufficient to block dishabituation, something that would not be expected if a global sensitization process were responsible for reinstating the olfactory response.

      Nonetheless, we will also attempt to dissociate sensitization from dishabituation using mutants previously reported to be deficient in sensitization (Duerr and Quinn, PNAS 1982), assuming these mutants retain normal olfactory habituation. We will also try sensitization protocols in the case of within-modal dishabituation to further clarify the underlying mechanisms. In principle, this includes using diluted octanol as the habituating stimulus and attempting dishabituation with concentrated octanol.

      (3.5) The literature review of this manuscript has some discrepancies. In the introduction, the authors wrote "initial studies in Aplysia were consistent with the "dual-process theory" (Groves and Thompson 1979), where response recovery due to dishabituation appeared to result from sensitization superimposed on habituation, thus driving reversal of the attenuated response (Carew, Castellucci et al. 1971, Hochner, Klein et al. 1986, Marcus, Nolen et al. 1988, Ghirardi, Braha et al. 1992, Cohen, Kaplan et al. 1997, Antonov, Kandel et al. 1999, Hawkins, Cohen et al. 2006)." Hochner 1986 and Marcus 1988 in fact indicated otherwise. Hochner 1986 suggests that dishabituation and sensitization involve different molecular processes, while Marcus 1988 showed that dishabituation and sensitization have different behavioral characteristics. Therefore, the authors' statement is not supported by the cited literature.

      We are grateful to the reviewer for pointing out these significant discrepancies, which resulted from multiple rounds of edits followed by our own oversight. These publications, which are important for this manuscript, will be referenced properly in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This manuscript describes the role of PRDM16 in modulating BMP response during choroid plexus (ChP) development. The authors combine PRDM16 knockout mice and cultured PRDM16 KO primary neural stem cells (NSCs) to determine the interactions between BMP signaling and PRDM16 in ChP differentiation.

      They show PRDM16 KO affects ChP development in vivo and BMP4 response in vitro. They determine genes regulated by BMP and PRDM16 by ChIP-seq or CUT&TAG for PRDM16, pSMAD1/5/8, and SMAD4. They then measure gene activity in primary NSCs through H3K4me3 and find more genes are co-repressed than co-activated by BMP signaling and PRDM16. They focus on the 31 genes found to be co-repressed by BMP and PRDM16. Wnt7b is in this set and the authors then provide evidence that PRDM16 and BMP signaling together repress Wnt activity in the developing choroid plexus.

      Strengths:

      Understanding context-dependent responses to cell signals during development is an important problem. The authors use a powerful combination of in vivo and in vitro systems to dissect how PRDM16 may modulate BMP response in early brain development.

      We thank the reviewer for the thoughtful summary and positive feedback. We appreciate the recognition of our integrative in vivo and in vitro approach. We're glad the reviewer found our findings on context-dependent gene regulation and developmental signaling valuable.

      Main weaknesses of the experimental setup:

      (1) Because the authors state that primary NSCs cultured in vitro lose endogenous Prdm16 expression, they drive expression by a constitutive promoter. However, this means the expression levels are very different from endogenous levels (as explicitly shown in Supplementary Figure 2B) and the effect of many transcription factors is strongly dose-dependent, likely creating differences between the PRDM16-dependent transcriptional response in the in vitro system and in vivo.

      We acknowledge that our in vitro experiments may not ideally replicate the in vivo situation, a common limitation of such experiments. Our primary aim was to explore the molecular relationship between PRDM16 and BMP signaling in gene regulation. Such molecular investigations are challenging to conduct using in vivo tissues. In vitro NSCs treated with BMP4 have been used as a model to investigate NSC proliferation and quiescence in previous studies (e.g., Helena Mira, 2010; Marlen Knobloch, 2017). Crucially, to ensure the relevance of our in vitro findings to the in vivo context, we confirmed that cultured cells could indeed be induced into quiescence by BMP4, and that this induction required the presence of PRDM16. Furthermore, upon identifying target genes co-regulated by PRDM16 and SMADs, we validated PRDM16's regulatory role on a subset of these genes in the developing Choroid Plexus (ChP) (Fig. 7 and Suppl.Fig7-8). Only by combining evidence from both in vitro and in vivo experiments could we confidently conclude that PRDM16 serves as an essential co-factor for BMP signaling in restricting NSC proliferation.

      (2) It seems that the authors compare Prdm16_KO cells to Prdm16 WT cells overexpressing flag_Prdm16. Aside from the possible expression of endogenous Prdm16, other cell differences may have arisen between these cell lines. A properly controlled experiment would compare Prdm16_KO ctrl (possibly infected with a control vector without Prdm16) to Prdm16_KO_E (i.e. the Prdm16_KO cells with and without Prdm16 overexpression.)

      We agree that Prdm16 KO cells carrying the Prdm16-expressing vector would be a good comparison with those with KO_vector. However, despite more than 10 attempts with various optimization conditions, we were unable to establish a viable cell line after infecting Prdm16 KO cells with the Prdm16-expressing vector. The overall survival rate for primary NSCs after viral infection is low, and we observed that KO cells were particularly sensitive to infection treatment when the viral vector was large (the Prdm16 ORF is more than 3kb).

      As an alternative way to assess vector effects, we instead included two other control cell lines, wt and KO cells infected with the 3xNLS_Flag-tag viral vector, and presented the results in Supplementary Fig 2. When we compared the responses of the four lines (wt, KO, wt infected with the Flag vector, and KO infected with the Flag vector) to the addition and removal of BMP4, we confirmed that the viral infection itself has no significant impact on the responses of these cells to these treatments in terms of cell proliferation and Ttr induction.

      Given that wt cells and the KO cells, with or without viral backbone infection behave quite similarly in terms of cell proliferation, we speculate that even if we were successful in obtaining a cell line with Prdm16-expressing vector in the KO cells, it may not exhibit substantial differences compared to wt cells infected with Prdm16-expressing vector.

      Other experimental weaknesses that make the evidence less convincing:

      (1) The authors show in Figure 2E that Ttr is not upregulated by BMP4 in PRDM16_KO NSCs. Does this appear inconsistent with the presence of Ttr expression in the PRDM16_KO brain in Figure 1C?

      The reviewer’s point is that there was no significant increase in Ttr expression in Prdm16_KO cells after BMP4 treatment (Fig. 2E), yet residual Ttr mRNA signal remained in the Prdm16 mutant ChP (Fig. 1C). We think the difference lies in the measurable level of Ttr expression induced by BMP4 in NSC culture versus that in the ChP. This is based on our immunostaining experiment, in which we tried to detect Ttr using a Ttr antibody. This antibody could not detect the Ttr protein in BMP4-treated Prdm16_expressing NSCs but clearly showed Ttr signal in the wt ChP. This means that although Ttr expression can be significantly increased by BMP4 in vitro to a level measurable by RT-qPCR, its absolute quantity, even in the Prdm16_expressing condition, is much lower than that in vivo. Our results in Fig 1C and Fig 2E, as well as Fig 7B, all consistently show that Prdm16 depletion significantly reduces Ttr expression both in vitro and in vivo.

      (2) Figure 3: The authors use H3K4me3 to measure gene activity. This is however, very indirect, with bulk RNA-seq providing the most direct readout and polymerase binding (ChIP-seq) another more direct readout. Transcription can be regulated without expected changes in histone methylation, see e.g. papers from Josh Brickman. They verify their H3K4me3 predictions with qPCR for a select number of genes, all related to the kinetochore, but it is not clear why these genes were picked, and one could worry whether these are representative.

      H3K4me3 has been widely used as an indicator of active transcription and is a mark for cell identity genes. It has also been demonstrated that H3K4me3 has a direct function in regulating transcription at the step of RNA Pol II pause release. As stated in the text, there are advantages and disadvantages to using H3K4me3 compared with RNA-seq. RNA-seq profiles all gene products, whose levels are affected by transcription as well as RNA stability and turnover. In contrast, H3K4me3 levels at gene promoters reflect transcriptional activity. In our case, we aimed to identify differential gene expression between the proliferation and quiescence states. The transition between these two states is fast and dynamic. RNA-seq may therefore fail to identify functionally relevant genes and is more likely to produce false-positive and false-negative results. For these reasons, we chose H3K4me3 profiling.

      We agree that transcription may change without histone methylation changes. This may cause an under-estimation of the number of changed genes between the conditions. 

      We validated 7 out of 31 genes (Wnt7b, Id3, Mybl2, Spc24, Spc25, Ndc80 and Nuf2). We chose these genes based on two criteria: 1) their function is implicated in cell proliferation and cell-cycle regulation based on gene ontology analysis; 2) their gene products are detectable in the developing ChP based on the scRNA-seq data. Three of these genes (Wnt7b, Id3, Mybl2) are not related to the kinetochore. We have now clarified this description in the revised text.

      (3) Line 256: The overlap of 31 genes between 184 BMP-repressed genes and 240 PRDM16-repressed genes seems quite small.

      This result indicates that, in addition to co-repressing cell-cycle genes, BMP and PRDM16 have independent functions. For example, it was reported that BMP regulates neuronal and astrocyte differentiation (Katada, S. 2021), while our previous work demonstrated that Prdm16 controls the temporal identity of NSCs (He, L. 2021).

      (4) The Wnt7b H3K4me3 track in Fig. 3G is not discussed in the text but it shows H3K4me3 high in _KO and low in _E regardless of BMP4. This seems to contradict the heatmap of H3K4me3 in Figure 3E which shows H3K4me3 high in _E no BMP4 and low in _E BMP4 while omitting _KO no BMP4. Meanwhile CDKN1A, the other gene shown in 3G, is missing from 3E.

      The track in Fig 3G shows the absolute signal of H3K4me3 after mapping the sequencing reads to the genome and normalizing them to library size. Comparing the signal in Prdm16_E with BMP4 and that in Prdm16_E without BMP4, the one with BMP4 has a lower peak. The same trend can be seen for the pair of Prdm16_KO cells with or without BMP4. The heatmap in Fig. 3E shows the relative level of H3K4me3 in three conditions. The Prdm16_E cells with BMP4 have the lowest level, while the other two conditions (Prdm16_KO with BMP4 and Prdm16_E without BMP4) display higher levels. These two graphs show a consistent trend of H3K4me3 changes at the Wnt7b promoter across these conditions. Figure 3E only includes genes that are co-repressed by PRDM16 and BMP. CDKN1A’s H3K4me3 signals are consistent between the conditions, and thus it is not a PRDM16- or BMP-regulated gene. We use it as a negative control.

      (5) The authors use PRDM16 CUT&TAG on dissected dorsal midline tissues to determine if their 31 identified PRDM16-BMP4 co-repressed genes are regulated directly by PRDM16 in vivo. By manual inspection, they find that "most" of these show a PRDM16 peak. How many is most? If using the same parameters for determining peaks, how many genes in an appropriately chosen negative control set of genes would show peaks? Can the authors rigorously establish the statistical significance of this observation? And why wasn't the same experiment performed on the NSCs in which the other experiments are done so one can directly compare the results? Instead, as far as I could tell, there is only ChIP-qPCR for two genes in NSCs in Supplementary Figure 4D.

      In our text, we indicated the genes containing PRDM16 binding peaks in the figures and described them as “Text in black in Fig. 6A and Supplementary Fig. 5A”. We will add the precise number “25 of these genes” to the main text to clarify this. As a negative control set, we used the 153 genes repressed by BMP only (184 - 31 = 153, i.e., excluding the PRDM16-BMP4 co-repressed genes). By computationally assigning each PRDM16 peak to its nearest TSS, we identified PRDM16 peaks in the E12.5 ChP data at 24/31 co-repressed genes and 84/153 BMP-only-repressed genes. Fisher’s Exact Test comparing these proportions yields a P-value of 0.015.
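
      For readers who wish to reproduce this kind of comparison, the following is a minimal sketch of a two-sided Fisher's exact test on the 2x2 table implied by the counts above (24 of 31 co-repressed genes with a peak versus 84 of 153 BMP-only-repressed genes with a peak). The function name is hypothetical, and this is not the code used for the reported analysis; because the response does not state whether the reported test was one- or two-sided, the value computed here need not match the reported P-value exactly.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that x of the col1 "with peak" genes fall in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Rows: co-repressed genes, BMP-only-repressed genes.
# Columns: with PRDM16 peak, without PRDM16 peak.
p_value = fisher_exact_two_sided(24, 31 - 24, 84, 153 - 84)
print(f"two-sided P = {p_value:.4f}")
```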

      We are confused by the second part of the comment: “And why wasn't the same experiment performed on the NSCs in which the other experiments are done so one can directly compare the results? Instead, as far as I could tell, there is only ChIP-qPCR for two genes in NSCs in Supplementary Figure 4D.” If the reviewer is asking why we did not sequence the material from sequential ChIP or validate more target genes, the reason is the limited amount of material. Sequential ChIP requires a large quantity of antibody and yields little material, barely sufficient for a few qPCR reactions after the second round of IP. The amount obtained was far below the minimum required for library construction. The PRDM16 antibody was a gift, and the quantity we had was very limited. We made considerable efforts to optimize all available commercial antibodies for ChIP and Cut&Tag, but none of them worked in these assays.

      (6) In comparing RNA in situ between WT and PRDM16 KO in Figure 7, the authors state they use the Wnt2b signal to identify the border between CH and neocortex. However, the Wnt2b signal is shown in grey and it is impossible for this reviewer to see clear Wnt2b expression or where the boundaries are in Figure 7A. The authors also do not show where they placed the boundaries in their analysis. Furthermore, Figure 7B only shows insets for one of the regions being compared making it difficult to see differences from the other region. Finally, the authors do not show an example of their spot segmentation to judge whether their spot counting is reliable. Overall, this makes it difficult to judge whether the quantification in Figure 7C can be trusted.

      In the revised manuscript, we have included an individual channel for Wnt2b and marked the boundaries. We also provide full-view images and examples of spot segmentation in the new Supplementary Figure 8.

      (7) The correlation between mKi67 and Axin2 in Figure 7 is interesting but does not convincingly show that Wnt downstream of PRDM16 and BMP is responsible for the increased proliferation in PRDM16 mutants.

      We agree that this result (the correlation between mKi67 and Axin2) alone only suggests that Wnt signaling is related to the proliferation defect in the Prdm16 mutant, and does not necessarily mean that Wnt is downstream of PRDM16 and BMP. Our conclusion is backed up by two additional lines of evidence: the Cut&Tag data, in which PRDM16 binds to regulatory regions of Wnt7b and Wnt3a, and the finding that BMP and PRDM16 co-repress Wnt7b in vitro.

      An ideal result would be that down-regulating Wnt signaling in the Prdm16 mutant rescues the mutant phenotype. Such an experiment is technically challenging. Wnt plays diverse and essential roles in NSC regulation, and one would need a cell-type- and stage-specific tool to down-regulate Wnt in the Prdm16 mutant background. Moreover, Wnt genes are not the only targets regulated by PRDM16 in these cells, and down-regulating Wnt may not be sufficient to rescue the phenotype.

      Weaknesses of the presentation:

      Overall, the manuscript is not easy to read. This can cause confusion.

      We have revised the text to improve clarity.

      Reviewer #1 (Recommendations for the authors):

      (1) Overall, the manuscript is not easy to read. Here are some causes of confusion for which the presentation could be cleaned up:

      We are grateful for the reviewer’s suggestion. In the revised manuscript, we have made efforts to improve the clarity of the text.

      (a) Part of the first section is confusing in that some statements seem contradictory, in particular:

      "there is no overall patterning defect of ChP and CH in the Prdm16 mutant" (line 125)

      "Prdm16 depletion disrupted the transition from neural progenitors into ChP epithelia" (line 144)

      It would be helpful if the authors could reformulate this more clearly.

      We modified the text to clarify that while the BMP-patterned domain is not affected, the transition of NSCs into ChP epithelial cells is compromised in the Prdm16 mutant.

      (b) Flag_PRDM16, PRDM16_expressing, PRDM16_E, PRDM16 OE all seem to refer to the same PRDM16 overexpressing cells, which is very confusing. The authors should use consistent naming. Moreover, it would be good if they renamed these all to PRDM16_OE to indicate expression is not endogenous but driven by a constitutive promoter.

      We appreciate the comment and agree that the use of multiple terms to refer to the same PRDM16-overexpressing condition was confusing. Our original intention in using Prdm16_E was to distinguish cells expressing PRDM16 from the two other groups: wild-type cells and Prdm16_KO cells, which both lack PRDM16 protein expression. However, we acknowledge that Prdm16_E could be misinterpreted as indicating expression from the endogenous Prdm16 promoter. To avoid this confusion and ensure consistency, we have now standardized the terminology and refer to this condition as Prdm16_OE, indicating Flag-tagged PRDM16 expression driven by a constitutive promoter.

      (c) Line 179 states "generated a cell line by infecting Prdm16_KO cells with the same viral vector, expressing 3xNSL_Flag". Do the authors mean 3xNLS_Flag_Prdm16, so these are the Prdm16_KO_E cells by the notation suggested above? Or is this a control vector with Flag only? The following paragraph refers to Supplementary Figure 2C-F where the same construct is called KO_CDH, suggesting this was an empty CDH vector, without Flag, or Prdm16. This is confusing.

      We appreciate the reviewer’s careful reading and helpful comment. We acknowledge the confusion caused by the inconsistent terminology. To clarify: in line 179, we intended to describe an attempt to generate a Prdm16_KO cell line expressing 3xNLS_Flag_Prdm16, not a control vector with Flag only. However, despite repeated attempts, we were unable to establish this line due to low viral efficiency and the vulnerability of Prdm16_KO cells to infection with the large construct. Therefore, these cells were not included in the subsequent analyses.

      The term KO_CDH refers to Prdm16_KO cells infected with the empty CDH control vector, which lacks both Flag and Prdm16. This is the line used in the experiments shown in Supplementary Fig. 2C–F. We have revised the text throughout the manuscript to ensure consistent use of terminology and to avoid this confusion.

      (2) The introductory statements on lines 53-54 could use more references.

      Thanks for the suggestion. We have now included more references.

      (3) It would be helpful if all structures described in the introduction and first section were annotated in Figure 1, or otherwise, if a cartoon were included. For example, the cortical hem, and fourth ventricle.

      Thanks for the suggestion. We have now indicated the structures, ChP, CH and the fourth ventricle, in the images in Figure 1 and Supplementary Figure 1.

      (4) In line 115, "as previously shown.." - to keep the paper self-contained a figure illustrating the genetics of the KO allele would be helpful.

      Thanks for the suggestion. We have now included an illustration of the Prdm16 cGT allele in Figure 1B.

      (5) In Figure 1D as costain for a ChP marker would be helpful because it is hard to identify morphologically in the Prdm16 KO.

      We apologize for the lack of clarity. The KO allele contains a b-geo reporter driven by the endogenous Prdm16 promoter. The samples were co-stained for EdU, b-Gal and DAPI. To distinguish the ChP domain from the CH, we used the presence of b-Gal as a marker. We indicated this in the figure legend, and have now also clarified it in the revised text.

      (6) The details in Figure 1E are hard to see, a zoomed-in inset would help.

      A zoomed-in inset is now included in the figure.

      (7) Supplementary Figure 2A does not convincingly show that PRDM16 protein is undetectable since endogenous expression may be very low compared to the overexpression PRDM16_E cells so if the contrast is scaled together it could appear black like the KO.

      We appreciate the reviewer’s point and have carefully considered this concern. We concluded that PRDM16 protein is effectively undetectable in cultured wild-type NSCs based on direct comparison with brain tissue. Both cultured NSCs and brain sections were processed under similar immunostaining and imaging conditions. While PRDM16 showed robust and specific nuclear localization in embryonic brain sections (Fig. 1B and Supplementary Fig. 1A), only a small subset of cultured NSCs exhibited PRDM16 signal, primarily in the cytoplasm (middle panel of Fig. 2A). This stark contrast supports our conclusion that endogenous PRDM16 protein is either absent or significantly downregulated in vitro. Because of this limitation, we turned to over-expressing Prdm16 in NSC culture using a constitutive promoter. 

      (9) Line 182 "Following the washout step" - no such step had been described, maybe replace by "After washout of BMP".

      Yes, we have revised the text.

      (8) Line 214: "indicating a modest level" - what defines modest? Compared to what? Why is a few thousand moderate rather than low? Does it go to zero with inhibitors for pathways?

      Here, a modest level means a level lower than that after adding BMP4. To clarify this, we revised the description to “indicating endogenous levels of …”

      (9) The way qPCR data are displayed makes it difficult to appreciate the magnitude of changes, e.g. in Supplementary Figure 2B where a gap is introduced on the scale. Displaying log fold change / relative CT values would be more informative.

      We used a segmented Y-axis in Supplementary Figure 2B because the Prdm16 overexpression samples exhibited much higher expression levels than the other conditions. In response to this suggestion, we explored alternative ways to present the result, including plotting log-transformed values and log fold changes. However, these methods did not enhance the clarity of the differences; in fact, log scaling made the magnitude of change appear less apparent. To address this, we now present the overexpression samples in a separate graph, thereby eliminating the need for a broken Y-axis and improving the overall readability of the data.
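
      As an illustration of the display issue discussed in this exchange, the following minimal sketch computes a relative expression value and its log2 fold change from qPCR Ct values using the common 2^-ΔΔCt calculation. The Ct values and the function name are hypothetical, the calculation assumes 100% amplification efficiency, and nothing here is taken from the manuscript's own analysis.

```python
import math


def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression by the 2^-ddCt calculation.

    Assumes 100% amplification efficiency (one doubling per cycle).
    All arguments are Ct values; the names are illustrative only.
    """
    delta_sample = ct_target - ct_ref          # target normalized to reference gene
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl  # same normalization in the control
    ddct = delta_sample - delta_ctrl
    return 2.0 ** (-ddct)


# Hypothetical Ct values: overexpression sample versus control sample.
fold = relative_expression(ct_target=18.0, ct_ref=16.0,
                           ct_target_ctrl=28.0, ct_ref_ctrl=16.0)
log2_fold = math.log2(fold)

print(f"linear fold change: {fold:.0f}")
print(f"log2 fold change:   {log2_fold:.1f}")
```

      On a linear axis, a value of this size dwarfs the other conditions, whereas on a log2 axis it is compressed into a few units. This is the trade-off between the two displays described above.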

      (10) Writing out "3 days" instead of 3D in Figure 2A would improve clarity. It would be good if the used time interval is repeated in other figures throughout the paper so it is still clear the comparison is between 0 and 3 days.

      We have changed “3D” to “3 days”. All BMP4 treatments in this study were 3 days.

      (11) Line 290: "we found that over 50% of SMAD4 and pSMAD1/5/8 binding peaks were consistent in Prdm16_E and Prdm16_KO cells, indicating that deletion of Prdm16 does not affect the general genomic binding ability of these proteins" - this only makes sense to state with appropriate controls because 50% seems like a big difference, what is the sample to sample variability for the same condition? Moreover, the next paragraph seems to contradict this, ending with "This result suggests that SMAD binding to these sites depends on PRDM16". The authors should probably clarify the writing.

      We appreciate the reviewer’s comment and agree that clarification was needed. Our point was that SMAD4 and pSMAD1/5/8 retain the ability to bind DNA broadly in the Prdm16 KO cells, with more than half of the original binding sites still occupied. This suggests that deletion of Prdm16 does not globally impair SMAD genomic binding. However, our primary interest lies in the subset of sites that show differential SMAD binding between wt and Prdm16 KO conditions, as these are likely to be PRDM16-dependent.

      In the following paragraph, we focused specifically on describing SMAD and PRDM16 co-bound sites. At these loci, SMAD4 and pSMAD1/5/8 showed reduced enrichment in the absence of PRDM16, suggesting PRDM16 facilitates SMAD binding at these particular regions. We have revised the text in the manuscript to more clearly distinguish between global SMAD binding and PRDM16-dependent sites.
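
      The kind of peak-overlap figure discussed in this exchange can be computed as sketched below. This is a minimal illustration that assumes peaks are given as (chromosome, start, end) tuples with half-open coordinates and that any overlap of at least one base counts as shared. The peak lists and the function name are hypothetical and do not reproduce the authors' pipeline or thresholds.

```python
def fraction_overlapping(peaks_a, peaks_b):
    """Fraction of peaks in peaks_a that overlap at least one peak in peaks_b.

    Peaks are (chrom, start, end) tuples with half-open [start, end) coordinates.
    """
    # Group the second peak set by chromosome for lookup.
    by_chrom = {}
    for chrom, start, end in peaks_b:
        by_chrom.setdefault(chrom, []).append((start, end))

    hits = 0
    for chrom, start, end in peaks_a:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < e and s < end for s, e in by_chrom.get(chrom, [])):
            hits += 1
    return hits / len(peaks_a) if peaks_a else 0.0


# Hypothetical peak sets for two conditions.
peaks_oe = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150), ("chr3", 10, 20)]
peaks_ko = [("chr1", 150, 250), ("chr2", 140, 300), ("chr1", 600, 700)]

shared = fraction_overlapping(peaks_oe, peaks_ko)
print(f"fraction of peaks shared: {shared:.2f}")
```

      Applying the same function to two replicates of one condition would give the sample-to-sample baseline the reviewer asks about, against which a between-condition overlap can be judged.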

      (12) Much more convincing than ChIP-qPCR for c-FOS for two loci in Figures 5F-G would be a global analysis of c-FOS ChIP-seq data.

      We agree that a global c-FOS ChIP-seq analysis would provide a more comprehensive view of c-FOS binding patterns. However, the primary focus of this study is the interaction between BMP signaling and PRDM16. The enrichment of AP-1 motifs at ectopic SMAD4 binding sites was an unexpected finding, which we validated using c-FOS ChIP-qPCR at selected loci. While a genome-wide analysis would be valuable, it falls beyond the current scope. We agree that future studies exploring the interplay among SMAD4/pSMAD, PRDM16, and AP-1 will be important and informative.

      (13) Figure 6A is hard to read. A heatmap would make it much easier to see differences in expression. Furthermore, if the point is to see the difference between ChP and CH, why not combine the different subclusters belonging to those structures? Finally, why are there 28 genes total when it is said the authors are evaluating a list of 31 genes and also displaying 6 genes that are not expressed (so the difference isn't that unexpressed genes are omitted)?

      For the scRNA-seq data, we chose violin plots because they display both gene expression levels and the number of cells that express each gene. However, we agree that the labels in Figure 6A were too small and difficult to read. We have revised the figure by increasing the font size and moving genes with low expression to Supplementary Figure 5A. Figure 6A now includes the 17 more highly expressed genes together with three markers, and Supplementary Figure 5A contains the 13 lowly expressed genes. One gene, Mrtfb, is missing from the scRNA-seq data and is thus not included. We have revised the description of the result in the main text and figure legends.

      Reviewer #2 (Public review):

      Summary:

      This article investigates the role of PRDM16 in regulating cell proliferation and differentiation during choroid plexus (ChP) development in mice. The study finds that PRDM16 acts as a corepressor in the BMP signaling pathway, which is crucial for ChP formation.

      The key findings of the study are:

      (1) PRDM16 promotes cell cycle exit in neural epithelial cells at the ChP primordium.

      (2) PRDM16 and BMP signaling work together to induce neural stem cell (NSC) quiescence in vitro.

      (3) BMP signaling and PRDM16 cooperatively repress proliferation genes.

      (4) PRDM16 assists genomic binding of SMAD4 and pSMAD1/5/8.

      (5) Genes co-regulated by SMADs and PRDM16 in NSCs are repressed in the developing ChP.

      (6) PRDM16 represses Wnt7b and Wnt activity in the developing ChP.

      (7) Levels of Wnt activity correlate with cell proliferation in the developing ChP and CH.

      In summary, this study identifies PRDM16 as a key regulator of the balance between BMP and Wnt signaling during ChP development. PRDM16 facilitates the repressive function of BMP signaling on cell proliferation while simultaneously suppressing Wnt signaling. This interplay between signaling pathways and PRDM16 is essential for the proper specification and differentiation of ChP epithelial cells. This study provides new insights into the molecular mechanisms governing ChP development and may have implications for understanding the pathogenesis of ChP tumors and other related diseases.

      Strengths:

      (1) Combining in vitro and in vivo experiments to provide a comprehensive understanding of PRDM16 function in ChP development.

      (2) Uses of a variety of techniques, including immunostaining, RNA in situ hybridization, RT-qPCR, CUT&Tag, ChIP-seq, and SCRINSHOT.

      (3) Identifying a novel role for PRDM16 in regulating the balance between BMP and Wnt signaling.

      (4) Providing a mechanistic explanation for how PRDM16 enhances the repressive function of BMP signaling. The identification of SMAD palindromic motifs as preferred binding sites for the SMAD/PRDM16 complex suggests a specific mechanism for PRDM16-mediated gene repression.

      (5) Highlighting the potential clinical relevance of PRDM16 in the context of ChP tumors and other related diseases. By demonstrating the crucial role of PRDM16 in controlling ChP development, the study suggests that dysregulation of PRDM16 may contribute to the pathogenesis of these conditions.

      We thank the reviewer for the thorough and thoughtful summary of our study. We’re glad the key findings and significance of our work were clearly conveyed, particularly regarding the role of PRDM16 in coordinating BMP and Wnt signaling during ChP development. We also appreciate the recognition of our integrated approach and the potential implications for understanding ChP-related diseases.

      Weaknesses:

      (1) Limited investigation of the mechanism controlling PRDM16 protein stability and nuclear localization in vivo. The study observed that PRDM16 protein became nearly undetectable in NSCs cultured in vitro, despite high mRNA levels. While the authors speculate that post-translational modifications might regulate PRDM16 in NSCs similar to brown adipocytes, further investigation is needed to confirm this and understand the precise mechanism controlling PRDM16 protein levels in vivo.

      While the mechanisms controlling PRDM16 protein stability and nuclear localization in the developing brain are interesting, the scope of this paper is to reveal the function of PRDM16 in the choroid plexus and its interaction with BMP signaling. We will be happy to pursue this direction in our next study.

      (2) Reliance on overexpression of PRDM16 in NSC cultures. To study PRDM16 function in vitro, the authors used a lentiviral construct to constitutively express PRDM16 in NSCs. While this approach allowed them to overcome the issue of low PRDM16 protein levels in vitro, it is important to consider that overexpressing PRDM16 may not fully recapitulate its physiological role in regulating gene expression and cell behavior.

      As stated above, we acknowledge that findings from cultured NSCs may not directly apply to ChP cells in vivo, and we are cautious with our statements. The cell culture work aimed to identify potential mechanisms by which PRDM16 and SMADs interact to regulate gene expression, and to identify target genes co-regulated by these factors. We expect that not all targets from cell culture are regulated by PRDM16 and SMADs in the ChP, so we validated expression changes of several target genes in the developing ChP and have now included the new data in Fig. 7 and Supplementary Fig. 7. Out of the 31 genes identified from cultured cells, four cell cycle regulators including Wnt7b, Id3, Spc24/25/nuf2 and Mybl2, showed de-repression in Prdm16 mutant ChP. These genes may be relevant downstream genes in the ChP, whereas other target genes may be cortical NSC-specific or less dependent on Prdm16 in vivo.

      (3) Lack of direct evidence for AP1 as the co-factor responsible for SMAD relocation in the absence of PRDM16. While the study identified the AP1 motif as enriched in SMAD binding sites in Prdm16 knockout cells, they only provided ChIP-qPCR validation for c-FOS binding at two specific loci (Wnt7b and Id3). Further investigation is needed to confirm the direct interaction between AP1 and SMAD proteins in the absence of PRDM16 and to rule out other potential co-factors.

      We agree that the enrichment of the AP1 motif at the PRDM16 and SMAD co-binding regions in Prdm16 KO cells only indirectly suggests AP1 as a co-factor for SMAD relocation. That is why we used ChIP-qPCR to examine the presence of c-FOS at these sites. Although we only validated two targets, the result confirms that c-FOS binds to these sites in the Prdm16 KO cells but not in Prdm16_expressing cells, suggesting AP1 is a co-factor. Our results cannot rule out the presence of other co-factors.

      Reviewer #2 (Recommendations for the authors):

      Minor typo: [7, page 3] "sicne" should be "since".

      We appreciate the reviewer’s careful reading. We have now corrected the typo and revised parts of the text to improve clarity.

      Reviewer #3 (Public review):

      Summary:

      Bone morphogenetic protein (BMP) signaling instructs multiple processes during development including cell proliferation and differentiation. The authors set out to understand the role of PRDM16 in these various functions of BMP signaling. They find that PRDM16 and BMP co-operate to repress stem cell proliferation by regulating the genomic distribution of BMP pathway transcription factors. They additionally show that PRDM16 impacts choroid plexus epithelial cell specification. The authors provide evidence for a regulatory circuit (constituting of BMP, PRDM16, and Wnt) that influences stem cell proliferation/differentiation.

      Strengths:

      I find the topics studied by the authors in this study of general interest to the field, the experiments well-controlled and the analysis in the paper sound.

      We thank the reviewer for their positive feedback and thoughtful summary. We appreciate the recognition of our efforts to define the role of PRDM16 in BMP signaling and stem cell regulation, as well as the soundness of our experimental design and analysis.

      Weaknesses:

      I have no major scientific concerns. I have some minor recommendations that will help improve the paper (regarding the discussion).

      We have revised the discussion according to the suggestions.

      Reviewer #3 (Recommendations for the authors):

      Specific minor recommendations:

      Page 18. Line 526: In a footnote, the authors point out a recent report which in parallel was investigating the link between PRDM16 and SMAD4. There is substantial non-overlap between these two papers. To aid the reader, I would encourage the authors to discuss that paper in the discussion section of the manuscript itself, highlighting any similarities/differences in the topic/results.

      Thanks for the suggestion. We have now included the comparison in the discussion. One conclusion is consistent between our study and this publication: PRDM16 functions as a co-repressor of SMAD4. However, the mechanisms are different. Our data suggest a model in which PRDM16 facilitates SMAD4/pSMAD binding to repress proliferation genes under high BMP conditions. The other report, in contrast, suggests that SMAD4 steadily binds to the Prdm16 promoter and switches regulatory functions depending on the co-factors. Together with PRDM16, SMAD4 represses gene expression, while with SMAD3 in response to high levels of TGF-b1, it activates gene expression. These differences could be due to different signaling pathways (BMP versus TGF-b) or contexts (NSCs versus pancreatic cancers), among other factors.

      Page 3. Line 65: typo 'since'

      We appreciate the reviewer’s careful reading. We have now corrected the typo and revised the text to improve clarity.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript describes a series of experiments documenting trophic egg production in a species of harvester ant, Pogonomyrmex rugosus. In brief, queens are the primary trophic egg producers, there is seasonality and periodicity to trophic egg production, trophic eggs differ in many basic dimensions and contents relative to reproductive eggs, and diets supplemented with trophic eggs had an effect on the queen/worker ratio produced (increasing worker production).

      The manuscript is very well prepared and the methods are sufficient. The outcomes are interesting and help fill gaps in knowledge, both on ants as well as insects, more generally. More context could enrich the study and flow could be improved.

      We thank the reviewer for these comments. We agree that the paper would benefit from more context. We have therefore greatly extended the introduction.

      Reviewer #2 (Public Review):

      The manuscript by Genzoni et al. provides evidence that trophic eggs laid by the queen in the ant Pogonomyrmex rugosus have an inhibitory effect on queen development. The authors also compare a number of features of trophic eggs, including protein, DNA, RNA, and miRNA content, to reproductive eggs. To support their argument that trophic eggs have an inhibitory effect on queen development, the authors show that trophic eggs have a lower content of protein, triglycerides, glycogen, and glucose than reproductive eggs, and that their miRNA distributions are different relative to reproductive eggs. Although the finding of an inhibitory influence of trophic eggs on queen development is indeed arresting, the egg cross-fostering experiment that supports this finding can be effectively boiled down to a single figure (Figure 6). The rest of the data are supplementary and correlative in nature (and can be combined), especially the miRNA differences shown between trophic and reproductive eggs. This means that the authors have not yet identified the mechanism through which the inhibitory effect on queen development is occurring. To this reviewer, this finding is more appropriate as a short report and not a research article. A full research article would be warranted if the authors had identified the mechanism underlying the inhibitory effect on queen development. Furthermore, the article is written poorly and lacks much background information necessary for the general reader to properly evaluate the robustness of the conclusions and to appreciate the significance of the findings.

      We thank the reviewer for these comments. We agree that the paper would benefit by having more background information and more discussion. We have followed this advice in the revision.

      Reviewer #3 (Public Review):

      In "Trophic eggs affect caste determination in the ant Pogonomyrmex rugosus" Genzoni et al. probe a fundamental question in sociobiology, what are the molecular and developmental processes governing caste determination? In many social insect lineages, caste determination is a major ontogenetic milestone that establishes the discrete queen and worker life histories that make up the fundamental units of their colonies. Over the last century, mechanisms of caste determination, particularly regulators of caste during development, have remained relatively elusive. Here, Genzoni et al. discovered an unexpected role for trophic eggs in suppressing queen development - where bi-potential larvae fed trophic eggs become significantly more likely to develop into workers instead of gynes (new queens). These results are unexpected, and potentially paradigm-shifting, given that previously trophic eggs have been hypothesized to evolve to act as an additional intracolony resource for colonies in potentially competitive environments or during specific times in colony ontogeny (colony foundation), where additional food sources independent of foraging would be beneficial. While the evidence and methods used are compelling (e.g., the sequence of reproductive vs. trophic egg deposition by single queens, which highlights that the production of trophic eggs is tightly regulated), the connective tissue linking many experiments is missing and the downstream mechanism is speculative (e.g., whether miRNA, proteins, triglycerides, glycogen levels in trophic eggs is what suppresses queen development). Overall, this research elevates the importance of trophic eggs in regulating queen and worker development but how this is achieved remains unknown.

      We thank the reviewer for these comments and agree that future work should focus on identifying the substances in trophic eggs that are responsible for caste determination.  

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      Introduction:

      The context for this study is insufficiently developed in the introduction - it would be nice to have a more detailed survey of what is known about trophic eggs in insects, especially social insects. The end of the introduction nicely sets up the hypothesis through the prior work described by Helms Cahan et al. (2011) where they found JH supplementation increased trophic egg production and also increased worker size. I think that the introduction could give more context about egg production in Pogonomyrmex and other ants, including what is known about worker reproduction. For example, Suni et al. 2007 and Smith et al. 2007 both describe the absence of male production by workers in two different harvester ants. Workers tend to have underdeveloped ovaries when in the presence of the queen. Other species of ants are known to have worker reproduction seemingly for the purpose of nutrition (see Heinze and Hölldobler 1995 and subsequent studies on Crematogaster smithi). Because some ants, including Pogonomyrmex, lack trophallaxis, it has been hypothesized that they distribute nutrients throughout the nest via trophic eggs as is seen in at least one other ant (Gobin and Ito 2000). Interestingly, Smith and Suarez (2009) speculated that the difference in nutrition of developing sexual versus worker larvae (as seen in their pupal stable isotope values) was due to trophic egg provisioning - they predicted the opposite of what was found in this study, but their prediction was in line with that of Helms Cahan et al. (2011). This is all to say that there is a lot of context that could go into developing the ideas tested in this paper that is completely overlooked. The inclusion of more of what is known already would greatly enrich the introduction.

      We agree that it would be useful to provide a larger context for the study. We now provide more information on the life history of ants and explain under what situations queens and workers may produce trophic eggs. We also mention that some ants, such as Crematogaster smithi, have a special caste of “large workers” which are morphologically intermediate between winged queens and small workers and appear to be specialized in the production of unfertilized eggs. We now also mention the study of Gobin and Ito (2000), in which the authors show that trophic eggs may play an important role in food distribution within the colony, in particular in species where trophallaxis is rare or absent.

      Methods:

      L49: What lineage is represented in the colonies used? The collection location is near where both dependent-lineage (genetic caste determining) P. rugosus and "H" lineage exist. This is important to know. Further, depending on what these are, the authors should note whether this has relevance to the study. Not mentioning genetic caste determination in a paper that examines caste determination is problematic.

      This is a good point. We now state at the very beginning of the Material and Method section that the queens were collected in populations known not to have dependent-lineage (genetic caste determining) mechanisms of caste determination.

      L63 and throughout: It would be more efficient to have a paragraph that cites R (must be done) and RStudio once as the tool for all analyses. It also seems that most model construction and testing was done using lme4 - so just lay this out once instead of over and over.

      We agree and have updated the manuscript accordingly.

      L95: 'lenght' needs to be 'length' in the formula.

      Thanks, corrected.

      L151: A PCA was used but not described in the methods. This should be covered here. And while a Mantel test is used, I might consider a permANOVA as this more intuitively (for me, at least) goes along with the PCA.

      We added the PCA description in the Material and Method section.

      Results:

      I love Fig. 3! Super cool.

      Thanks for this positive comment.

      Discussion:

      It would be good to have more on egg cannibalism. This is reasonably well-studied and could be good extra context.

      We have added a paragraph in the discussion to mention that egg cannibalism is ubiquitous in ants.

      Supp Table 1: P. badius is missing and citations are incorrectly attributed to P. barbatus.

      P. badius was present in the Table but not with the other Pogonomyrmex species. For some genera the species were also not listed in alphabetic order. This has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      Comments on introduction:

      The introduction is missing information about caste determination in ants generally and Pogonomyrmex rugosus specifically. This is important because some colonies of Pogonomyrmex rugosus have been shown to undergo genetic caste determination, in which case the main result would be rendered insignificant. What is the evidence that caste determination in the lineages/colonies used is largely environmentally influenced and in what contexts/environmental factors? All of this should be made clear.

      This is a good point. We have expanded the introduction to discuss previous work on caste determination in Pogonomyrmex species with environmental caste determination and now also provide evidence at the beginning of the Material and Method section that the two populations studied do not have a system of genetic caste determination.

      Line 32 and throughout the paper: What is meant exactly by 'reproductive eggs'? Are these eggs that develop specifically into reproductives (i.e., queens/males) or all eggs that are non-trophic? If the latter, then it is best to refer to these eggs as 'viable' in order to prevent confusion.

      We agree and have updated the manuscript accordingly.

      Figure 1/Supp Table 1: It is surprising how few species are known to lay trophic eggs. Do the authors think this is an informative representation of the distribution of trophic egg production across subfamilies, or due to lack of study? Furthermore, the branches show ant subfamilies, not families. What does the question mark indicate? Also, the information in the table next to the phylogeny is not easy to understand. Having in the branches that information, in categories, shown in color for example, could be better and more informative. Finally, having the 'none' column with only one entry is confusing - discuss that only one species has been shown to definitely not lay trophic eggs in the text, but it does not add much to the figure.

      Trophic eggs are probably very common in ants, but this has not been very well studied. We added a sentence in the manuscript to make this clear.

      Thanks for noticing the family/subfamily error. This has been corrected in Figure 1 and Supplementary Table 1.

      The question mark indicates uncertainty about whether queens also contribute to the production of trophic eggs in one species (Lasius niger). We have now added information on that in the Figure legend.

      We agree with the reviewer that it would be easier to have the information on whether queens and workers produce trophic eggs on the branches of the tree. However, having the information on the branches would suggest that the “trait” evolved on this part of the tree. As we do not know exactly when worker or queen production of trophic eggs evolved, we prefer to keep the figure as it is.

      Finally, we have also removed the 'none' column from the figure, as suggested by the reviewer, and discussed in the manuscript the fact that the absence of trophic eggs has been reported in only one ant species (Amblyopone silvestrii: Masuko 2003).

      Comments on materials and methods:

      Why did they settle on three trophic eggs per larva for their experimental setup?

      We used three trophic eggs because under natural conditions 50-65% of the eggs are trophic. The ratio of trophic eggs to viable eggs (larvae) was thus similar to that under natural conditions.

      Line 50: In what kind of setup were the ants kept? Plaster nests? Plastic boxes? Tubes? Was the setup dry or moist? I think this information is important to know in the context of trophic eggs.

      We now explain that colonies were maintained in plastic boxes with water tubes.

      Line 60: Were all the 43 queens isolated only once, or multiple times?

      Each of the 43 queens was isolated for 8 hours every day for 2 weeks, once before and once after hibernation (so they were isolated multiple times). We have changed the text to make clear that this was done for each of the 43 queens.

      Could isolating the queen away from workers/brood have had an effect on the type of eggs laid?

      This cannot be completely ruled out. However, it is possible to reliably determine the proportion of viable and trophic eggs only by isolating queens. Importantly, the main aim of these experiments was not to precisely determine the proportion of viable and trophic eggs, but to show that this proportion changes before and after hibernation and that queens do not lay viable and trophic eggs in a random sequence.

      Since it was established that only queens lay trophic eggs why was the isolation necessary?

      Yes, this was necessary because eggs are fragile and very difficult to collect in colonies with workers (as soon as eggs are laid they are piled up, and as soon as we disturb the nest a worker takes them all and runs away with them). Moreover, it is possible that workers preferentially eat one type of egg, which would have required removing eggs as soon as the queens laid them. This would have been a huge disturbance for the colonies.

      Line 61: Is this hibernation natural or lab induced? What is the purpose of it? How long was the hibernation and at what temperature? Where are the references for the requirement of a diapause and its length?

      The hibernation was lab induced. We hibernated the queens because we previously showed that hibernation is important to trigger the production of gynes in P. rugosus colonies in the laboratory (Schwander et al 2008; Libbrecht et al 2013). Hibernation conditions were as described in Libbrecht et al (2013).  

      Line 73: If the queen is disturbed several times for three weeks, which effect does it have on its egg-laying rate and on the eggs laid? Were the eggs equally distributed in time in the recipient colonies with and without trophic eggs to avoid possible effects?

      It is difficult to say what effect the disturbance had on the number and type of eggs laid. But again, our aim was not to precisely determine these values but to determine whether there was an effect of hibernation on the proportion of trophic eggs. The recipient colonies with and without trophic eggs were formed in exactly the same way. No viable eggs were introduced into these colonies, but all first instar larvae were introduced in the same way, at the same time, and with random assignment. We have clarified this in the Material and Method section.

      Line 77: Before placing the freshly hatched larvae in recipient colonies, how long were the recipient colonies kept without eggs and how long were they fed before giving the eggs? Were they kept long enough without the queen to avoid possible effects of trophic eggs, or too long so that their behavior changed?

      The recipient colonies were created 7 to 10 days before receiving the first larvae and were fed ad libitum with grass seeds, flies and honey water from the beginning. Any trophic eggs left over from the source colony should have been eaten within the first few days after the recipient colonies were created. However, even if some trophic eggs had remained, this would not influence our conclusion that trophic eggs influence caste fate, given the fully randomized nature of our treatments and the considerable number of independent replicates. The same applies to potential changes in worker behavior following their isolation from the queen.

      Line 77: Is it known at what stage caste determination occurs in this species? Here first instar larvae were given trophic eggs or not. Does caste-determination occur at the first instar stage? If not, what effect could providing trophic eggs at other stages have on caste-determination?

      A previous study showed that there is a maternal effect on caste determination in the focal species (Schwander et al 2008). The mechanism underlying this maternal effect was hypothesized to be differential maternal provisioning of viable eggs. However, as we detail in the discussion, the new data presented in our study suggest that the mechanism is in fact a difference in the abundance of trophic eggs laid by queens. There is currently no information on when exactly caste determination occurs during development.

      Comments on results:

      Line 65: How does investigating the order of eggs laid help to "inform on the mechanisms of oogenesis"?

      We agree that the aim was not to study the mechanism of oogenesis. We have changed this sentence accordingly: “To assess whether viable and trophic eggs were laid in a random order, or whether eggs of a given type were laid in clusters, we isolated 11 queens for 10 hours, eight times over three weeks, and collected every hour the eggs laid”

      Figure 2: There is no description/discussion of data shown in panels B, C, E, and F in the main text.

      We have added information in the main text that while viable eggs showed embryonic development at 25 and 65 hours (Fig. 2B, C), there was no such development for trophic eggs (Fig. 2E, F).

      Line 172: Please explain hibernation details and its significance on colony development/life cycle.

      We have added this information in the Material and Method section.

      Figure 6: How is B plotted? How could 0% of gynes have 100% survival?

      The survival is given for the larvae without considering caste. We have changed the X axis of panel B and reworded the Figure legend to clarify this.

      Is reduced DNA content just an outcome of reduced cell number within trophic eggs, i.e., was this a difference in cell type or cell number? Or is it some other adaptive reason?

      It is likely to be due to a reduction in cell number (trophic eggs have maternal DNA in the chorion, while viable eggs have in addition the cells from the developing zygote) but we do not have data to make this point.

      Is there a logical sequence to the sequence of egg production? The authors showed that the sequence is non-random, but can they identify in what way? What would the biological significance be?

      We could not identify a logical sequence. Plausibly, the production of the two types of eggs implies some changes in the metabolic processes during egg production resulting in queens producing batches of either viable or trophic eggs. This would be an interesting question to study, but this is beyond the scope of this paper.

      Figure 6b is difficult to follow, and more generally, legends for all figures can be made clearer and more easy to follow.

      We agree. We have now improved the legends of Fig 6B and the other figures.

      Lines 172-174: "The percentage of eggs that were trophic was higher before hibernation...than after. This higher percentage was due to a reduced number of reproductive eggs, the number of trophic eggs laid remained stable" - are these data shown? It would be nice to see how the total egglaying rate changes after hibernation. Also, is the proportion of trophic eggs laid similar between individual queens?

      No, the data were not shown, and we do not have data of sufficient quality to make this point. We have therefore removed the sentence “This higher percentage was due to a reduced number of reproductive eggs, the number of trophic eggs laid remained stable” from the manuscript.

      Figure 6B: Do several colonies produce 100% gynes despite receiving trophic eggs? It would be interesting if the authors discussed why this might occur (e.g., the larvae are already fully determined to be queens and not responsive to whatever signal is in the trophic eggs).

      The reviewer is correct that 4 colonies produced 100% gynes despite receiving trophic eggs. However, the number of individuals produced in these four colonies was small (2, 1, 2 and 1; see Supplementary Table 2). So it is likely that these colonies produced only gynes just by chance.
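
      The chance argument can be illustrated with a minimal sketch. It assumes, purely for illustration, that each larva independently develops into a gyne with a fixed probability; the function name and the probability values below are hypothetical and are not estimates from the study. Only the colony sizes (2, 1, 2, 1) come from the response above.

```python
def prob_all_gynes(n_individuals: int, p_gyne: float) -> float:
    """Probability that every one of n individuals is a gyne, assuming each
    larva independently becomes a gyne with probability p_gyne (an
    illustrative assumption, not a fitted model)."""
    if n_individuals < 1:
        raise ValueError("n_individuals must be at least 1")
    if not 0.0 <= p_gyne <= 1.0:
        raise ValueError("p_gyne must lie between 0 and 1")
    return p_gyne ** n_individuals


# Numbers of individuals produced by the four all-gyne colonies (from the text).
colony_sizes = [2, 1, 2, 1]

# Hypothetical per-larva gyne probabilities, chosen only to show the trend.
for p in (0.2, 0.5):
    per_colony = [prob_all_gynes(n, p) for n in colony_sizes]
    print(f"p_gyne={p}: " + ", ".join(f"n={n}: {q:.2f}"
                                      for n, q in zip(colony_sizes, per_colony)))
```

      Under this assumption, a colony producing a single individual yields only gynes with probability equal to the per-larva probability itself, and a colony producing two individuals does so with the square of that probability (for example 0.25 when the per-larva probability is 0.5), so all-gyne outcomes are unsurprising when so few individuals are produced.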

      Figure 5: Why a separation by "size distribution variation of miRNA"? What is the relevance of looking at size distributions as opposed to levels?

      We did that because there are many different miRNA species, as reflected by the fact that there is not just one size peak but several. This is why we looked at the size distribution.

      Figure 2: The image of the viable embryo is not clear. If possible, redo the viable to show better quality images.

      Unfortunately, we no longer have colonies in the laboratory, so this is not possible.

      Comments on discussion:

      Lines 236-247: Can an explanation be provided as to why the effect of trophic eggs in P. rugosus is the opposite of those observed by studies referenced in this section? Could P. rugosus have any life history traits that might explain this observation?

      In the two mentioned studies there were other factors that co-varied with variation in the quantity of trophic eggs. We mentioned that and suggested that it would be useful to conduct experimental manipulation of the quantity of trophic eggs in the Argentine ant and P. barbatus (the two species where an effect of trophic eggs had been suggested).

      The discussion should include implications and future research of the discovery.

      We made some suggestions for experiments that should be performed in the future.

      The conclusion paragraph is too short and does not represent what was discussed.

      We added two sentences at the end of the paragraph suggesting future studies that could be performed.

      Lines 231 to 247: Drastically reduce and move this whole part to the introduction to substantiate the assumption that trophic eggs play a nutritional role.

      We moved most of this paragraph to the introduction, as suggested by the reviewer.

      Reviewer #3 (Recommendations For The Authors):

      I would like to commend the authors on their study. The main findings of the paper are individually solid and provide novel insight into caste determination and the nature of trophic eggs. However, the inferences made from much of the data and connections between independent lines of evidence often extend too far and are unsubstantiated.

      We thank the reviewer for the positive comment. We made many changes in the manuscript to improve the discussion of our results.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript submission by Zhao et al. entitled, "Cardiac neurons expressing a glucagon-like receptor mediate cardiac arrhythmia induced by high-fat diet in Drosophila" the authors assert that cardiac arrhythmias in Drosophila on a high-fat diet are due in part to adipokinetic hormone (Akh) signaling activation. High-fat diet induces Akh secretion from activated endocrine neurons, which activate AkhR in posterior cardiac neurons. Silencing or deletion of Akh or AkhR blocks arrhythmia in Drosophila on a high-fat diet. Elimination of one of two AkhR-expressing cardiac neurons results in arrhythmia similar to a high-fat diet.

      Strengths:

      The authors propose a novel mechanism for high-fat diet-induced arrhythmia utilizing the Akh signaling pathway that signals to cardiac neurons.

      Weaknesses:

      Major comments:

      (1) The authors state, "Arrhythmic pathology is rooted in the cardiac conduction system." This assertion is incorrect as a blanket statement on arrhythmias. There are certain arrhythmias that have been attributable to the conduction system, such as bradycardic rhythms, heart block, sinus node reentry, inappropriate sinus tachycardia, AV nodal reentrant tachycardia, bundle branch reentry, fascicular ventricular tachycardia, or idiopathic ventricular fibrillation to name a few. However the etiological mechanism of many atrial and ventricular arrhythmias, such as atrial fibrillation or substrate-based ventricular tachycardia, are not rooted in the conduction system. The introduction should be revised to reflect a clear focus (away from?) on atrial fibrillation (AF). In addition, AF susceptibility is known to be modulated by autonomic tone, which is topically relevant (irrelevant?) to this manuscript.

      Thank you for the helpful comment. We rephrased the sentence as “Arrhythmic pathology is often rooted in the cardiac conduction system”.

      (2) The authors state that "HFD led to increased heartbeat and an irregular rhythm." In representative examples shown, HFD resulted in pauses, slower heart rate, and increased irregularity in rhythm but not consistently increased heart rate (Figures 1B, 3A, and 4C). Based on the cited work by Ocorr et al (https://doi.org/10.1073/pnas.0609278104), Drosophila heart rate is highly variable with periods of fast and slow rates, which the authors attributed to neuronal and hormonal inputs. Ocorr et al then describe the use of "semi-intact" flies to remove autonomic input to normalize heart rate. Were semi-intact flies used? If not, how was heart rate variability controlled? And how was heart rate "increase" quantified in high-fat diet compared to normal-fat diet? Lastly, how does one measure "arrhythmia" when there is so much heart rate variability in normal intact flies?

      We also observed that fly heart rate is highly variable with periods of fast and slow rates. To control heart rate variability, Ocorr et al. used semi-intact flies to record the heartbeat (https://doi.org/10.1073/pnas.0609278104). We consider it a rigorous method to get highly consistent results with high-quality videos/images. Since our work focuses on the neuronal inputs to the heart, we did not use the semi-intact method. Our concern is that the dissection is likely to disrupt the neuronal processes. Using OCT, we recorded the heartbeat of intact flies in an 8 s time window, when the heartbeat was relatively stable. The different groups of flies, which were fed on a high-fat diet or a normal-fat diet, were recorded using the same method. Thus, we could compare the differences in heart rate.

      (3) The authors state, "to test whether the HFD-induced increase in Akh in the APC affects APC neuron activity, we used CaLexA (https://doi.org/10.3109/01677063.2011.642910)." According to the reference, CaLexA is a tool to map active neurons and would not indicate, as the authors state, whether Akh affects APC neuron activity specifically. It is equally possible that APC neurons may be activated by HFD and produce more Akh. Please clarify this language.

      Thank you for clarifying the calcium reporter, CaLexA. We rephrased this sentence to “to test whether HFD affects APC neuron activity, we used CaLexA”.

      (4) Are the AkhR+ neurons parasympathetic or sympathetic? Please provide additional experimentation that characterizes these neurons. The AkhR+ neurons appear to be anti-arrhythmic. Please expand the discussion to include a working hypothesis of the overall findings on Akh, AkhR, and AkhR+ neurons.

      Noyes et al. showed that Akh treatment increases heartbeat (Noyes, B. E., F. N. Katz, and M. H. Schaffer. 1995. “Identification and Expression of the Drosophila Adipokinetic Hormone Gene.” Molecular and Cellular Endocrinology 109 (2): 133–41.), suggesting that AkhR+ neurons are sympathetic. We showed that a high-fat diet induced Akh expression and secretion, which led to stimulation of AkhR+ neurons and increased heart rate, supporting the sympathetic role of the AkhR+ neurons. Additional explanation of the sympathetic and anti-arrhythmic role of Akh, AkhR, and AkhR+ neurons was added to the discussion.

      (5) The authors state, "Heart function is dependent on glucose as an energy source." However, the heart's main energy source is fatty acids with minimal use of glucose (doi: 10.1016/j.cbpa.2006.09.014). Glucose becomes more utilized by cardiomyocytes under heart failure conditions. Please amend/revise this statement.

      Thank you for pointing this out and providing the reference. We rephrased this sentence as “Heart function is dependent on continuous ATP production. Cardiac ATP in Drosophila might come from fatty acids, glucose, and lactate (Kodde et al., 2007), as well as trehalose.”

      Reviewer #2 (Public Review):

      This manuscript explores mechanisms underlying heart contractility problems in metabolic disease using Drosophila as a model. They confirm, as others have demonstrated, that a high-fat diet (HFD) induces cardiac problems in flies. They showed that a high-fat diet increased Akh mRNA levels and calcium levels in the Akh-producing cells (APC), suggesting there is increased production and release of this hormone in a HFD context. When they knock down Akh production in the APCs using RNAi they see that cardiac contractility problems are abolished. They similarly show that levels of the Akh receptor (Akhr) are increased on a HFD and that loss of Akhr also rescues contractility problems on a HFD.

      One highlight of the paper was the identification of a pair of neurons that express a receptor for the metabolic hormone Akh, and showing initial data that these neurons innervate the cardiac muscle. They then overexpress cell death gene reaper (rpr) in all Akhr-positive cells with Akhr-GAL4 and see that cardiac contractility becomes abnormal.

      However, this paper contains several findings that have been reported elsewhere and it contains key flaws in both experimental design and data interpretation. There is some rationale for doing the experiments, and the data and images are of good quality. However, others have shown that HFD induces cardiac contractility problems (Birse 2010), that Akh mRNA levels are changed with HFD (Liao 2021) that Akh modulates cardiac rhythms (Noyes 1995), so Figures 1-4 are largely a confirmation of what is already known. This limits the overall magnitude of the advances presented in these figures. Overall, the stated concerns limit the impact of the manuscript in advancing our understanding of heart contractility.

      We thank the reviewer for the positive comments and the instructive suggestions. Birse 2010 (PMID: 21035763) was cited in our manuscript. Liao 2021 showed that Akh mRNA levels are changed with HFD. We added the reference to the revised manuscript and modified the text as: “Consistent with a previous work (Liao et al., 2020), we showed that the expression of Akh was significantly up-regulated in the flies fed a HFD, compared to NFD-fed flies (Figure 2B)”. Our qPCR verified Liao’s results. On top of this, we investigated the calcium levels in the Akh-producing cells (APCs) and showed elevated calcium levels in the APCs of HFD-fed flies. In the revised version, we added more data to show that Akh protein levels were increased with HFD (Figure 2E-F). In line with Noyes' discovery, which showed that Akh injection caused cardioacceleration in prepupae, we showed that genetic manipulation of Akh expression affected heartbeat in adults.

      Reviewer #3 (Public Review):

      Zhao et al. provide new insights into the mechanism by which a high-fat diet (HFD) induces cardiac arrhythmia employing Drosophila as a model. HFD induces cardiac arrhythmia in both mammals and Drosophila. Both glucagon and its functional equivalent in Drosophila Akh are known to induce arrhythmia. The study demonstrates that Akh mRNA levels are increased by HFD and both Akh and its receptor are necessary for high-fat diet-induced cardiac arrhythmia, elucidating a novel link. Notably, Zhao et al. identify a pair of AKH receptor-expressing neurons located at the posterior of the heart tube. Interestingly, these neurons innervate the heart muscle and form synaptic connections, implying their roles in controlling the heart muscle. The study presented by Zhao et al. is intriguing, and the rigorous characterization of the AKH receptor-expressing neurons would significantly enhance our understanding of the molecular mechanism underlying HFD-induced cardiac arrhythmia.

      Many experiments presented in the manuscript are appropriate for supporting the conclusions while additional controls and precise quantifications should help strengthen the authors' arguments. The key results obtained by loss of Akh (or AkhR) and genetic elimination of the identified AkhR-expressing cardiac neurons do not reconcile, complicating the overall interpretation.

      It is intriguing to see an increase in Akh mRNA levels in HFD-fed animals. This is a key result for linking HFD-induced arrhythmia to Akh. Thus, demonstrating that HFD also increases the Akh protein levels and Akh is secreted more should significantly strengthen the manuscript.

      Thank you for the positive comments and the instructive suggestions. We performed immunostaining to show that Akh protein levels increased, which is consistent with elevated Akh mRNA expression in HFD-fed flies. The data was added to Figure 2, panels E and F. Akh secretion from the APCs is regulated by APC activity (https://doi.org/10.1038/s41586-019-1675-4). We used a calcium reporter CaLexA (https://doi.org/10.3109/01677063.2011.642910) to monitor APC activity and showed that HFD increased APC activity (Figure 2, C-D).

      The experiments employing an AkhR null allele nicely demonstrate its requirement for HFD-induced cardiac arrhythmia. Depletion of Akh in Akh-expressing cells recapitulates the consequence of AkhR knockout, supporting that both Akh and its receptor are required for HFD-induced cardiac arrhythmia. Given that RNAi is associated with off-target effects and some RNAi reagents do not work, testing multiple independent RNAi lines is the standard procedure. It is also important to show the on-target effect of the RNAi reagents used in the study.

      Indeed, RNAi approaches can suffer from off-target effects. For Akh experiments, we used the RNAi line BL_34960, which was generated using artificial microRNA-based shRNAs (DOI: 10.1038/nmeth.1592). Compared with long-hairpin constructs, shRNA constructs are expected to be advantageous, e.g., more efficient and with fewer off-target effects. We performed immunostaining to determine Akh-Gal4>UAS-Akh-RNAi efficiency. We showed that anti-Akh fluorescence diminished in Akh-Gal4>UAS-Akh-RNAi APCs. The data were added to Figure 3-figure supplement 1.

      The most exciting result is the identification of AkhR-expressing neurons located at the posterior part of the heart tube (ACNs). The authors attempted to determine the function of ACNs by expressing rpr with AkhR-GAL4, which would induce cell death in all AkhR-expressing cells, including ACNs. The experiments presented in Figure 6 are not straightforward to interpret. Moreover, the conclusion contradicts the main hypothesis that elevated Akh is the basis of HFD-induced arrhythmia. The results suggest the importance of AkhR-expressing cells for normal heartbeat. However, elimination of Akh or AkhR restores normal rhythm in HFD-fed animals, suggesting that Akh and AkhR are not important for maintaining normal rhythms. If Akh signaling in ACNs is key for HFD-induced arrhythmia, genetic elimination of ACNs should leave normal rhythm unaltered and rescue the HFD-induced arrhythmia. An important caveat is that the experiments do not test the specific role of ACNs. ACNs should be just a small part of the cells expressing AkhR. The experiments presented in Figure 6 cannot justify the authors' conclusion. Specific manipulation of ACNs will significantly improve the study. Moreover, the main hypothesis suggests that HFD may alter the activity of ACNs in a manner dependent on Akh and AkhR. Testing how HFD changes calcium, possibly by CaLexA (Figure 2) and/or GCaMP, in wild-type and AkhR mutants could be a way to connect ACNs to HFD-induced arrhythmia. Moreover, optogenetic manipulation of ACNs will allow for specific manipulation of ACNs, which is crucial for studying the specific role of ACNs in controlling cardiac rhythms.

      Thank you for the insightful comments. We have been trying to find a way to target only the AkhR neurons using split-Gal4. So far, this has not been successful. Akh/AkhR signaling is expected to play a key role in the ACNs; however, we cannot rule out the possibility that ACNs also receive signals other than Akh in the modulation of heartbeat.

      Interestingly, expressing rpr with AkhR-GAL4 was insufficient to eliminate both ACNs. It is not clear why it didn't eliminate both ACNs. Given the incomplete penetrance, appropriate quantifications should be helpful. Additionally, the impact on other AkhR-expressing cells should be assessed. Adding more copies of UAS-rpr, AkhR-GAL4, or both may eliminate all ACNs and other AkhR-expressing cells. The authors could also try UAS-hid instead of UAS-rpr.

      We added more data to show that AkhR+ neurons are positive in anti-Akh staining, indicating the AkhR+ neurons indeed receive Akh.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Typo in line 765: "increased Akh section into the circulation." Section should be secretion.

      Thank you for finding the typo. We changed section to secretion.

      Reviewer #2 (Recommendations For The Authors):

      One interesting extension to our knowledge in Figures 3 & 4 is that loss of Akhr and loss of Akh both block the cardiac contractility defects that accompany a HFD. The main concern I have with the Akh finding is that the authors use only a GAL4 control and no UAS alone control. Metabolic phenotypes often show strain-specific effects, so to make conclusions it is essential that the authors include a UAS alone control alongside the other genotypes to be sure it does not rescue the cardiac contractility defects that accompany a HFD by itself.

      I am interested in the authors' identification of a pair of Akhr-positive neurons that innervate the cardiac muscle. I am not aware of any other studies identifying these neurons, or revealing their function. The contents of Figure 5 therefore represent the largest advance in the study. However, the characterization of these neurons is very superficial, and a lot more work to understand their regulation and function in a HFD context is needed to make conclusions about their role in any HFD-induced cardiac contractility problems. Or to determine how Akh influences the function of these specific neurons in an HFD context.

      The reason I say this is that the authors ablate all Akhr-positive cells in Figure 6 and show that this disturbs normal cardiac contractility. While studies on the one pair of Akhr-positive neurons would be really interesting, ablating all Akhr-positive cells, which includes the fat and many other cell types in the fly, is not a scientifically rigorous approach to answering this question. As a result, the authors are only able to make the claim that ablating many cell types throughout the animal disrupts cardiac contractility, which does not advance our understanding of mechanisms underlying heart contractility problems. In addition, because the experiments they designed did not test whether it was Akh binding to Akhr on those neurons that regulate cardiac contractility problems in a HFD context, their experiments do not support their model in Figure 7.

      The authors also make conclusions that are fairly speculative around Line 231 when describing their model in Figure 7. These claims are simply not supported by the data they present and must be removed. For example, the authors have not identified an endocrine-heart axis, they simply showed that changes in Akh can influence the heart, but this is not necessarily a direct effect on a specific cell type. They do not show data that Akh binds the newly identified Akhr-positive neuron pair to mediate the effects of HFD-induced contractility defects - they just ablate all Akhr-positive cells (fat, neurons, and other types) and show cardiac defects. If those neurons did mediate the abnormal cardiac rhythm promoted by Akh, then ablating those neurons (and not a large number of additional tissues) should rescue HFD-induced heart defects just like reducing Akhr or Akh did (but this is the opposite of what they see). Overall, concerns with experimental design, data interpretation, and relatively few findings that aren't reported elsewhere reduce the impact of this paper.

      We appreciate the positive comments and helpful suggestions. Indeed, it is important to get clean genetic access to the cardiac neurons. We intended to use the split-Gal4 system to target the AkhR cardiac neurons. We tried to build a split-Gal4 driver, AkhR-p65.AD. Two rounds of injection were carried out. However, we did not recover a transgenic line.

      In the revised version, we performed immunostaining using Akh antibodies to show that anti-Akh fluorescence was observed in AkhR neurons (Figure 5-figure supplement 1), indicating an endocrine-heart axis.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Duilio M. Potenza et al. explores the role of Arginase II in cardiac aging, majorly using whole-body arg-ii knock-out mice. In this work, the authors have found that Arg-II exerts non-cell-autonomous effects on aging cardiomyocytes, fibroblasts, and endothelial cells mediated by IL-1b from aging macrophages. The authors have used arg II KO mice and an in vitro culture system to study the role of Arg II. The authors have also reported the cell-autonomous effect of Arg-II through mitochondrial ROS in fibroblasts that contribute to cardiac aging. These findings are sufficiently novel in cardiac aging and provide interesting insights. While the phenotypic data seems strong, the mechanistic details are unclear. How Arg II regulates the IL-1b and modulates cardiac aging is still being determined. The authors still need to determine whether Arg II in fibroblasts and endothelial contributes to cardiac fibrosis and cell death. This study also lacks a comprehensive understanding of the pathways modulated by Arg II to regulate cardiac aging.

      We sincerely appreciate the valuable feedback provided by the reviewer. It is gratifying to hear that our work provided novel information on the role of arginase-II in cardiac aging, which is a complex process involving various cell types and mechanisms. We have devoted considerable effort to performing new experiments to address the reviewer's comments and to delineate more detailed mechanisms of Arg-II in cardiac aging. Please see below our specific answers to each of the reviewer's points.

      Strengths:

      This study provides interesting information on the role of Arg II in cardiac aging.

      The phenotypic data in the arg II KO mice is convincing, and the authors have assessed most of the aging-related changes.

      The data is supported by an in vitro cell culture system.

      We appreciate this reviewer’s positive assessment on the strength of our study.

      Weaknesses:

      The manuscript needs more mechanistic details on how Arg II regulates IL-1b and modulates cardiac aging.

      We made a great effort and performed new experiments in a human monocyte cell line (THP1), in which iNOS is not expressed and not inducible by LPS, and in which the arg-ii gene was knocked out by CRISPR technology. Moreover, murine bone-marrow-derived macrophages in which the inos gene was ablated were also used for this purpose. We found that in human THP1 monocytes, in which Arg-II but not iNOS is induced by LPS (100 ng/mL for 24 hours) (Suppl. Fig. 6A), mRNA and protein levels of the IL-1b precursor are markedly reduced in arg-ii knockout THP1<sup>arg-ii<sup>-/-</sup></sup> cells as compared to THP1<sup>wt</sup> cells (Suppl. Fig. 6B and 6C), further confirming that Arg-II promotes IL-1b production, as also shown in RAW264.7 macrophages (Suppl. Fig. 5A and 5C). Moreover, in mouse bone-marrow-derived macrophages, LPS-induced IL-1b production is inhibited by inos deficiency (BMDM<sup>inos-/-</sup> vs BMDM<sup>wt</sup>) (Suppl. Fig. 6D and 6E), while Arg-II levels are slightly enhanced in the BMDM<sup>inos-/-</sup> cells (Suppl. Fig. 6D and 6F). Altogether, these results suggest that iNOS slightly reduces Arg-II expression, that Arg-II and iNOS can be upregulated by LPS independently of each other, and that both Arg-II and iNOS are required for IL-1b production upon LPS stimulation, as illustrated in Suppl. Fig. 6G. For detailed results and discussion, please see our answers to comment points 2 and 6 raised by this reviewer.

      The authors used whole-body KO mice, and the role of macrophages in cardiac aging is not studied in this model. A macrophage-specific arg II Ko would be a better model.

      We fully agree with this comment. Unfortunately, a macrophage-specific arg-ii knockout animal model is not yet available. Future research should develop a macrophage-specific arg-ii<sup>-/-</sup> mouse model to confirm this conclusion in aging animals. Since Arg-II is also expressed in fibroblasts and endothelial cells and exerts cell-autonomous and paracrine functions, aging mouse models with conditional arg-ii knockout in specific cell types would be the next step toward elucidating the cell-specific functions of Arg-II in cardiac aging. We have pointed out this aspect for future research on page 19, lines 2 to 6.

      Experiments need to validate the deficiency of Arg II in cardiomyocytes.

      As pointed out by this reviewer in comment point 10, Arg-II was previously reported to be expressed in cardiomyocytes isolated from rats (PMID: 16537391). Unfortunately, negative controls, i.e., arg-ii<sup>-/-</sup> samples, were not included in that study to rule out possible background signals. We made a great effort to investigate whether Arg-II is present in cardiomyocytes from different species, including mice, rats and humans, and have included old arg-ii<sup>-/-</sup> mouse samples as a negative control. This allows us to validate antibody specificity and assess background noise beyond any reasonable doubt. The new experiments in Suppl. Fig. 4 confirm the specificity of the antibody against Arg-II in old mouse kidney, which is known to express Arg-II in the S3 proximal tubular cells (Huang J, et al. 2021). To exclude possible species-specific differences in Arg-II expression in cardiomyocytes, aged mouse and rat heart tissues were used for cellular localization of Arg-II by confocal immunofluorescence staining. As shown in Suppl. Fig. 4B and 4C, both species show Arg-II expression only in non-cardiomyocytes (cells between striated cardiomyocytes) (red arrows) but not in striated cardiomyocytes. Even in rat myocardial infarction tissues, Arg-II was not found in cardiomyocytes but in endocardial cells (Suppl. Fig. 4B). In isolated cardiomyocytes exposed to hypoxia, a well-known strong stimulus for Arg-II protein expression, no Arg-II signal could be detected, while fibroblasts from the same animals show elevated Arg-II levels under hypoxia (Fig. 5B). Furthermore, even RT-qPCR detected arg-ii mRNA only in non-cardiomyocytes and not in cardiomyocytes (Fig. 5C). Altogether, these results demonstrate that Arg-II is not expressed, or is expressed only at negligible levels, in cardiomyocytes, but is expressed in non-cardiomyocytes. These new experiments with rat heart are included in the method section on page 20, the 1st paragraph. The results are described on page 7, the 1st paragraph, and discussed on page 12, the 2nd paragraph. The legend to Suppl. Fig. 4 is included in the file “Suppl. figure legend_R”.

      The authors have never investigated the possibility of NO involvement in this mice model.

      As mentioned above, we made a great effort and performed new experiments in a human monocyte cell line (THP1), in which iNOS is not expressed and not inducible by LPS, and in which the arg-ii gene was knocked out by CRISPR technology. Moreover, murine bone-marrow-derived macrophages in which the inos gene was ablated were also used for this purpose. The results show that Arg-II and iNOS can be upregulated by LPS independently of each other and that iNOS slightly reduces Arg-II expression. However, both Arg-II and iNOS are required for IL-1b production upon LPS stimulation. For detailed results and discussion, please see our answers to comment points 2 and 6 raised by this reviewer.

      A co-culture system would be appropriate to understand the non-cell-autonomous functions of macrophages.

      We appreciate the suggestion by this reviewer regarding a co-culture system to test the non-cell-autonomous role of Arg-II. We think that our current model, which involves treating cells with conditioned media, is a well-established and effective method for demonstrating the non-cell-autonomous role of Arg-II. This approach allows us to observe the effects of Arg-II on surrounding cells through the factors released from macrophages into the conditioned media. A co-culture system could be considered if the released factor in the conditioned medium were not stable; this is, however, not the case. Therefore, we are confident that our experimental model with conditioned medium is sufficient to demonstrate a paracrine effect of cell-cell interaction (please also see our answer to comment point 16).

      The Myocardial infarction data shown in the mice model may not be directly linked to cardiac aging.

      As we have introduced and discussed in the manuscript, aging is a predominant risk factor for cardiovascular disease (CVD). Studies in experimental animal models and in humans provide evidence that the aging heart is more vulnerable to stressors such as ischemia/reperfusion injury and myocardial infarction than the heart of young individuals. Even in the heart of apparently healthy individuals of old age, chronic inflammation, cardiomyocyte senescence, cell apoptosis, interstitial/perivascular tissue fibrosis, endothelial dysfunction and endothelial-mesenchymal transition (EndMT), and cardiac dysfunction with either preserved or reduced ejection fraction are observed. Our study aims to investigate the role of Arg-II in the cardiac aging phenotype and in age-associated cardiac vulnerability to stressors. Therefore, cardiac functional changes and myocardial infarction in response to ischemia/reperfusion injury are suitable surrogate parameters for this purpose.

      Reviewer #2 (Public Review):

      Summary:

      The results from this study demonstrated a cell-specific role of mitochondrial enzyme arginase-II (Arg-II) in heart aging and revealed a non-cell-autonomous effect of Arg-II on cardiomyocytes, fibroblasts, and endothelial cells through the crosstalk with macrophages via inflammatory factors, such as by IL-1b, as well as a cell-autonomous effect of Arg-II through mtROS in fibroblasts contributing to cardiac aging phenotype. These findings highlight the significance of non-cardiomyocytes in the heart and bring new insights into the understanding of pathologies of cardiac aging. It also provides new evidence for the development of therapeutic strategies, such as targeting the ArgII activation in macrophages.

      We're grateful for the reviewer's positive feedback, acknowledging the significant findings of our study on the role of arginase-II (Arg-II) in cardiac aging. We appreciate this reviewer’s insight into the therapeutic potential of targeting Arg-II activation in macrophages and are excited about the implications for future interventions in age-related cardiac pathologies. Thank you for recognizing the importance of our work in advancing our understanding of cardiac aging and potential therapeutic strategies.

      Strengths:

      This study targets an important clinical challenge, and the results are interesting and innovative. The experimental design is rigorous, the results are solid, and the representation is clear. The conclusion is logical and justified.

      We thank this reviewer for the positive comment.

      Weaknesses:

      The discussion could be extended a little bit to improve the realm of the knowledge related to this study.

      We appreciate this comment and have added and revised our discussion on this aspect accordingly at the end of the discussion section on page 19.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have several critical concerns, specifically about the mechanism of how Arg-II plays a role in cardiac aging.

      My major concerns are:

      (1) The authors have shown non-cell-autonomous effects on aging cardiomyocytes, fibroblasts, and endothelial cells mediated by IL-1b from aging macrophages. A macrophage-specific Arg-II knock-out mouse model is a suitable and necessary control to establish claims.

      We fully agree with this comment. Unfortunately, a macrophage-specific arg-ii knockout animal model is not yet available. Future research should develop a macrophage-specific arg-ii<sup>-/-</sup> mouse model to confirm this conclusion in aging animals. Since Arg-II is also expressed in fibroblasts and endothelial cells and exerts cell-autonomous and paracrine functions, aging mouse models with conditional arg-ii knockout in specific cell types would be the next step toward elucidating the cell-specific functions of Arg-II in cardiac aging. We have pointed out this aspect for future research on page 19, lines 2 to 6.

      (2) This study suggests that Arg-II exerts its effect through IL-1b in cardiac ageing. However, all experiments performed to demonstrate the link between ArgII and IL-1β are correlative at best. The underlying molecular mechanism, including transcription factors involved in the regulation of IL-1β by arg-ii, has not been demonstrated.

      We sincerely appreciate this reviewer’s comment on this aspect. To make it clear, a causal role of Arg-II in promoting IL-1β production in macrophages is evidenced by the experimental results showing that the old arg-ii<sup>-/-</sup> mouse heart has lower IL-1β levels than the age-matched wt mouse heart (Fig. 6A to 6D). We further showed that cellular IL-1β protein levels and IL-1β release are reduced in old arg-ii<sup>-/-</sup> mouse splenic macrophages as compared to wt cells (Fig. 7A, 7C, and 7D). This result is further confirmed in the mouse macrophage cell line RAW264.7 (Suppl. Fig. 5A and 5C), in which we demonstrate that silencing arg-ii reduces LPS-stimulated IL-1β levels.

      In line with this reviewer’s comment (see comment point 6), we made a further effort to investigate the possible involvement of iNOS in Arg-II-regulated IL-1β production in macrophages stimulated with LPS. We performed new experiments in a human monocyte cell line (THP1), in which iNOS is not expressed and not inducible by LPS, and in which the arg-ii gene was knocked out by CRISPR technology.

      Moreover, murine bone-marrow-derived macrophages in which the inos gene was ablated were also used for this purpose. We found that in human THP1 monocytes, in which Arg-II but not iNOS is induced by LPS (100 ng/mL for 24 hours) (Suppl. Fig. 6A), mRNA and protein levels of IL-1b are markedly reduced in arg-ii knockout THP1<sup>arg-ii<sup>-/-</sup></sup> cells as compared to THP1<sup>wt</sup> cells (Suppl. Fig. 6B and 6C), further confirming that Arg-II promotes IL-1b production, as also shown in RAW264.7 macrophages (Suppl. Fig. 5A and 5C). The results suggest that Arg-II promotes IL-1b production independently of iNOS. Moreover, the role of iNOS in IL-1b production was also studied in mouse bone-marrow-derived macrophages in which the inos gene is ablated. The results demonstrate that LPS-induced IL-1b production is inhibited by inos deficiency (BMDM<sup>inos-/-</sup> vs BMDM<sup>wt</sup>) (Suppl. Fig. 6D and 6E), while Arg-II levels are slightly enhanced in the BMDM<sup>inos-/-</sup> cells (Suppl. Fig. 6D and 6F). Since arginase and iNOS share the same metabolic substrate, L-arginine, inos deficiency would be expected to increase IL-1b production. This is, however, not the case: a strong inhibition of IL-1β production is observed in inos<sup>-/-</sup> macrophages. These results imply that iNOS promotes IL-1β production independently of Arg-II, and that the inhibitory effect of inos deficiency on IL-1β is dominant and able to counteract the stimulating effect of Arg-II on IL-1β production. Hence, our results demonstrate that Arg-II promotes IL-1β production in macrophages independently of iNOS. Altogether, these results suggest that iNOS slightly reduces Arg-II expression, that Arg-II and iNOS can be upregulated by LPS independently of each other, and that both Arg-II and iNOS are required for IL-1b production upon LPS stimulation (this concept is illustrated in Suppl. Fig. 6G). The new results are described on page 8, the last paragraph, and page 9, the 1st paragraph, and are presented in Suppl. Fig. 6. The legend to Suppl. Fig. 6 is included in the file “Supplementary figure legend-R”. The related experimental methods are updated on page 23, the last two paragraphs, and page 26, the last paragraph. The results are discussed on page 14, the last paragraph, and page 15, the first two paragraphs.

      (3) Figure 2: The authors have not validated the whole-body Arg-II knock-out mice for arg-ii ablation.

      Thanks for pointing out this missing information! We have added the information regarding genotyping of the mice in the method section on page 20, first paragraph. Moreover, Fig. 5C also confirms the genotyping of the non-cardiomyocyte cells isolated from wt and arg-ii<sup>-/-</sup> animals.

      (4) It is unclear why the authors have chosen to focus on IL-1β specifically, among other pro-inflammatory cytokines that were also downregulated in Arg-II-/- mice as demonstrated in Fig. 2A-D.

      We appreciate the reviewer's question, which provides an opportunity to delve deeper into our findings. In our investigation, we observed that aging is accompanied by elevated levels of various proinflammatory markers. Intriguingly, our data revealed that tnf-α remained unaffected by the ablation of arg-ii during aging in the heart tissues, while Il-1β showed a significant reduction in arg-ii<sup>-/-</sup> animals compared to age-matched wild-type (wt) mice (Fig. 2). Mcp1, in contrast, is a chemoattractant for macrophages, and F4-80 serves as a pan-macrophage marker. Moreover, our previous studies demonstrated a relationship between Arg-II and IL-1β in vascular disease and obesity and in age-associated renal and pulmonary fibrosis. Finally, IL-1β has been shown to play a causal role in patients with coronary atherosclerotic heart disease, as shown by the CANTOS trial. Therefore, we have focused on IL-1β in this study. We have now explained and strengthened this aspect in the manuscript on page 7, the last two lines, and page 8, the 1st paragraph, as follows:

      “Taking into account that our previous studies demonstrated a relationship of Arg-II and IL-1β in vascular disease and obesity (Ming et al., 2012) and in age-associated organ fibrosis such as renal and pulmonary fibrosis (Huang et al., 2021; Zhu et al., 2023), and IL-1β has been shown to play a causal role in patients with coronary atherosclerotic heart disease as shown by CANTOS trials (Ridker et al., 2017), we therefore focused on the role of IL-1β in crosstalk between macrophages and cardiac cells such as cardiomyocytes, fibroblasts and endothelial cells”.

      (5) Although macrophages are shown to be involved in cardiac ageing in the arg-ii mouse model, the authors have not estimated macrophage infiltration and expression of inflammatory or senescence markers in the hearts of these mice.

      Thank you very much for raising this important point! Taking the comments of the reviewer into account, we have performed new experiments, i.e., multiple immunofluorescent staining, to analyze the infiltrated (CCR2<sup>+</sup>/F4-80<sup>+</sup>) and resident (LYVE1<sup>+</sup>/F4-80<sup>+</sup>) macrophage populations in the aging heart and to investigate to what extent these populations are affected by arg-ii ablation. The results show an age-associated increase in the number of F4/80<sup>+</sup> cells in the wt mouse heart, which is reduced in the age-matched arg-ii<sup>-/-</sup> animals (Fig. 2G). This result is in accordance with the f4/80 gene expression shown in Fig. 2A, demonstrating that arg-ii gene ablation reduces macrophage accumulation in the aging heart. Interestingly, resident macrophages, characterized as LYVE1<sup>+</sup>/F4-80<sup>+</sup> cells (Fig. 2E and 2H), are predominant in the aging heart as compared to the infiltrated CCR2<sup>+</sup>/F4-80<sup>+</sup> cells (Fig. 2F and 2I). The increase in both LYVE1<sup>+</sup>/F4-80<sup>+</sup> and CCR2<sup>+</sup>/F4-80<sup>+</sup> macrophages in the aging heart is reduced in arg-ii<sup>-/-</sup> mice (Fig. 2E, 2F, 2H, and 2I). These new results are described on page 6, the 1st paragraph, presented in Fig. 2E to 2I, and discussed on page 13, the 2nd paragraph. The legend to Fig. 2 is revised. The method for this additional experiment is included on page 22, the 1st paragraph.

      Moreover, the age-associated accumulation of senescent cells, as demonstrated by p16<sup>ink4</sup>-positive cells, is significantly reduced in arg-ii<sup>-/-</sup> animals. This new result is incorporated in Fig. 1 as Fig. 1G and 1H and is described and discussed on page 5, the 2nd paragraph, and page 14, the second-to-last sentence of the 1st paragraph. The method of p16<sup>ink4</sup> staining is included in the method section on page 22, the 1st paragraph, line 7. The legend to Fig. 1 is revised accordingly.

      (6) Previously, Arg-II has been reported to serve a crucial role in ageing associated with reduced contractile function in rat hearts by regulating Nitric Oxide Synthase (PMID: 22160208). Elevated NO and superoxide have been shown to play crucial roles in the etiology of cardiovascular diseases (PMID: 24180388). Therefore, it is important to assess whether Nitric Oxide (NO) is involved in the aging-related phenotype in this mouse model.

      Following the reviewer's suggestion, we conducted new experiments to investigate the role of nitric oxide (NO) in the context of the effect of Arg-II-induced IL-1b production in macrophages. We have addressed this question in the response to the comment point 2.

      (7) Based on the results demonstrated in the study, ablation of Arg-II can be expected to cause a reduction in inflammation-associated phenotypes throughout the body at the multi-organ level. The observed improved cardiac phenotype could be an outcome of whole-body Arg-II ablation. It would be fruitful to develop a cardiac-specific Arg-II knockout mouse model to establish the role of Arg-II in the heart, independent of other organ systems.

      We agree with the reviewer on this point. Unfortunately, as explained above (see point 1), it is currently not possible for us to perform the requested experiments, owing to the lack of a cardiac-specific arg-ii knockout mouse model. Moreover, such an approach is complicated by the absence of Arg-II in cardiomyocytes and the expression of Arg-II in multiple cell types, including endothelial cells, fibroblasts and macrophages of different origins (resident and monocyte-derived infiltrating cells). It is thus difficult to generate a cardiac-specific gene knockout mouse. Future work should investigate the roles of cell-specific Arg-II in cardiac aging by generating cell-specific arg-ii<sup>-/-</sup> mice. We very much appreciate this important aspect and have discussed the issue on page 19, lines 2 to 6.

      (8) Contrary to the findings in this paper, Arg-II has previously been reported to be essential for IL-10-mediated downregulation of pro-inflammatory cytokines, including IL-1β (PMID: 33674584).

      Thank you very much for mentioning this study! We have now thoroughly discussed the controversy on page 15, the last paragraph, and page 16, the 1st paragraph, as follows:

      “It is of note that a study reported that Arg-II is required for IL-10 mediated-inhibition of IL-1b in mouse BMDM upon LPS stimulation (Dowling et al., 2021), which suggests an anti-inflammatory function of Arg-II. The results of our present study, however, demonstrate that LPS enhances Arg-II and IL-1b levels in macrophages and knockout or silencing Arg-II reduces IL-1b production and release, demonstrating a pro-inflammatory effect of Arg-II. Our findings are supported by the study from another group, which shows decreased pro-inflammatory cytokine production including IL-6 and IL-1b in arg-ii<sup>-/-</sup> BMDM most likely through suppression of NFkB pathway, since arg-ii<sup>-/-</sup> BMDM reveals decreased activation of NFkB and IL-1b levels upon LPS stimulation (Uchida et al., 2023). Most importantly, our previous study also showed that re-introducing arg-ii gene back to the arg-ii<sup>-/-</sup> macrophages markedly enhances LPS-stimulated pro-inflammatory cytokine production (Ming et al., 2012), providing further evidence for a pro-inflammatory role of arg-ii under LPS stimulation. In support of this conclusion, chronic inflammatory diseases such as atherosclerosis and type 2 diabetes (Ming et al., 2012), inflammaging in lung (Zhu et al., 2023), kidney (Huang et al., 2021) and pancreas (Xiong, Yepuri, Necetin, et al., 2017) of aged animals or acute organ injury such as acute ischemic/reperfusion or cisplatin-induced renal injury are reduced in the arg-ii<sup>-/-</sup> mice (Uchida et al., 2023). The discrepant findings between these studies and that with IL-10 may implicate dichotomous functions of Arg-II in macrophages, depending on the experimental context or conditions. Nevertheless, our results strongly implicate a pro-inflammatory role of Arg-II in macrophages in the inflammaging in aging heart”.

      (9) The authors have only performed immunofluorescence-based experiments to show fibrotic and apoptotic phenotypes throughout this study. To verify these findings, we suggest that they additionally perform RT-PCR or western blotting analysis for fibrotic markers and apoptotic markers.

      The fibrotic aspect was analyzed not only by microscopy but also by a quantitative biochemical assay, namely assessment of hydroxyproline content. Hydroxyproline is a major component of collagen and is largely restricted to collagen. Therefore, hydroxyproline levels can be used as an indicator of collagen content, as previously investigated in the lung (Zhu et al., 2023). We have also measured collagen gene expression by RT-qPCR, as suggested by the reviewer, and found an age-related decline in collagen mRNA levels in both wt and arg-ii<sup>-/-</sup> mice, suggesting that the age-associated cardiac fibrosis and its prevention in arg-ii<sup>-/-</sup> mice are due to alterations in translational and/or post-translational regulation, including collagen synthesis and/or degradation. These results are in accordance with those reported in other published studies. We have pointed out this aspect on page 5, the 2nd paragraph:

      “The increased cardiac fibrosis in aging is however, associated with decreased mRNA levels of collagen-Ia (col-Ia) and collagen-IIIa (col-IIIa), the major isoforms of pre-collagen in the heart (Suppl. Fig. 2A and 2B), which is a well-known phenomenon in cardiac fibrotic remodelling (Besse et al., 1994; Horn et al., 2016). The results demonstrate that age-associated cardiac fibrosis and prevention in arg-ii<sup>-/-</sup> mice is due to alterations of translational and/or post-translational regulations including collagen synthesis and/or degradation”.

      The results are presented in Suppl. Fig. 2, legend to Suppl. Fig. 2 is included in the file “Suppl. figure legend_R”. Suppl. table 2 for primers is revised accordingly.

      We did not use additional markers to perform apoptotic assays with whole heart, since Fig. 3 provides good evidence that aging is associated with an increased number of apoptotic cells in the heart, which is significantly reduced in the arg-ii<sup>-/-</sup> mice. The reduction of TUNEL-positive (apoptotic) cells in aged arg-ii<sup>-/-</sup> mice is mainly due to a decrease in apoptotic cardiomyocytes. With histological analysis, the apoptotic cell types can be readily identified. Moreover, biochemical assays for apoptosis, such as caspase-3 cleavage in whole heart tissue, cannot distinguish apoptotic cell types and may not be sensitive enough for the aging heart, owing to the relatively low number of apoptotic cells in the aging heart as compared to the myocardial infarct model.

      (10) Figure 4: arg-ii has previously been reported to be expressed in rat cardiomyocytes (PMID: 16537391). We strongly suggest the authors verify the expression of Arg-II via immunostaining in isolated cardiomyocytes (using published protocols), and by using multiple different cardiomyocyte-specific markers for colocalization studies to prove the lack of arg-ii expression beyond a reasonable doubt.

      As pointed out by this reviewer, Arg-II was previously reported to be expressed in cardiomyocytes isolated from rats (PMID: 16537391). Unfortunately, negative controls, i.e., arg-ii<sup>-/-</sup> samples, were not included in that study to rule out possible background signals. We made a great effort to investigate whether Arg-II is present in cardiomyocytes from different species, including mice, rats and humans, and have included old arg-ii<sup>-/-</sup> mouse samples as a negative control. This allows us to validate antibody specificity and assess background noise beyond any reasonable doubt. The new experiments in Suppl. Fig. 4 confirm the specificity of the antibody against Arg-II in old mouse kidney, which is known to express Arg-II in the S3 proximal tubular cells (Huang J, et al. 2021). To exclude possible species-specific differences in Arg-II expression in cardiomyocytes, aged mouse and rat heart tissues were used for cellular localization of Arg-II by confocal immunofluorescence staining. As shown in Suppl. Fig. 4B and 4C, both species show Arg-II expression only in non-cardiomyocytes (cells between striated cardiomyocytes) (red arrows) but not in striated cardiomyocytes. Even in rat myocardial infarction tissues, Arg-II was not found in cardiomyocytes but in endocardial cells (Suppl. Fig. 4B). In isolated cardiomyocytes exposed to hypoxia, a well-known strong stimulus for Arg-II protein expression, no Arg-II signal could be detected, while fibroblasts from the same animals show elevated Arg-II levels under hypoxia (Fig. 5B). Furthermore, RT-qPCR detected arg-ii mRNA only in non-cardiomyocytes and not in cardiomyocytes (Fig. 5C). Altogether, these results demonstrate that Arg-II is not expressed, or is expressed only at negligible levels, in cardiomyocytes, but is expressed in non-cardiomyocytes. These new experiments with rat heart are included in the method section on page 20, the 1st paragraph. The results are described on page 7, the 1st paragraph, and discussed on page 12, the 2nd paragraph. The legend to Suppl. Fig. 4 is included in the file “Suppl. figure legend_R”.

      (11) Figure 6G: It may be worthwhile to supplement arg-ii<sup>-/-</sup> old cells with IL-1beta to see if there is an increase in TUNEL-positive cells.

      IL-1b is a well-known pro-inflammatory cytokine that causes apoptosis in various cell types, including cardiomyocytes (Shen Y., et al., Tex Heart Inst J. 2015;42:109–116. doi: 10.14503/THIJ-14-4254; Liu Z. et. al., Cardiovasc Diabetol 2015;14,125. doi: 10.1186/s12933-015-0288-y; Li. Z., et al., Sci Adv 2020;6:eaay0589. doi: 10.1126/sciadv.aay0589). We very much appreciate this reviewer's interesting idea of investigating the apoptotic responses of cardiomyocytes from arg-ii<sup>-/-</sup> mice to IL-1b. We agree that cardiomyocytes from wt and arg-ii<sup>-/-</sup> mice may react differently to IL-1b, although cardiomyocytes do not express Arg-II, as demonstrated in our present study. If this is true, it must be due to non-cell-autonomous effects of a different aging microenvironment in the heart or to epigenetic modulation of the myocytes. We find this a very interesting aspect that requires further extensive investigation. Since our current study focuses on the effect of wt and arg-ii<sup>-/-</sup> macrophages on cardiomyocytes and non-cardiomyocytes, we prefer not to include this aspect in our manuscript and would like to explore it in a follow-up study.

      (12) Figures 4-9: It would be interesting to see if the effect of ArgII in cardiac ageing is gender-specific. It is recommended to include experimental data with male mice in addition to the results demonstrated in female mice.

      As pointed out in the manuscript, we have focused on female mice because the age-associated increase in arg-ii expression is more pronounced in females than in males (Fig. 1A). As suggested by this reviewer, we performed additional experiments investigating the effects of arg-ii deficiency in male mice during aging, focusing on the pathophysiological outcomes of ischemia/reperfusion (I/R) injury in ex vivo experiments. The ex vivo functional experiments with the Langendorff system were performed in aged male mice (see Suppl. Fig. 9). Following I/R injury, wt male mice display reduced left ventricular developed pressure (LVDP), as well as reduced inotropic and lusitropic states (expressed as dP/dt max and dP/dt min, respectively). As previously reported (Murphy et al., 2007), we also found that old male mice are more prone to I/R injury than age-matched female animals. Specifically, 15 minutes of ischemia is enough to significantly affect left ventricular contractile function in male mice (Suppl. Fig. 9). In contrast, age-matched old female mice are relatively resistant to I/R injury, and at least 20 minutes of ischemia is necessary to induce a significant impairment of contractile function (Fig. 10). Similar to females, the post-I/R recovery of cardiac function is significantly improved in male arg-ii<sup>-/-</sup> mice as compared to age-matched wt animals. In addition to functional recovery, myocardial infarction upon I/R injury, as assessed by triphenyl tetrazolium chloride (TTC) staining, is significantly reduced in the age-matched male arg-ii<sup>-/-</sup> animals (Suppl. Fig. 9C and 9D). Altogether, these results reveal a role for Arg-II in the impairment of heart function during aging in both sexes, with a higher vulnerability to stress in males. These new results are presented in Suppl. Fig. 9 and described on page 10, the last paragraph, and page 11. The results are discussed on page 18, the 2nd paragraph, as follows:

      “The fact that aged females have higher Arg-II but are more resistant to I/R injury seems contradictory to the detrimental effect of Arg-II in I/R injury. It is presumable that cardiac vulnerability to injuries stressors depends on multiple factors/mechanisms in aging. Other factors/mechanisms associated with sex may prevail and determine the higher sensitivity of male heart to I/R injury, which requires further investigation. Nevertheless, the results of our study show that Arg-II plays a role in cardiac I/R injury also in males”.

      The information on the experimental methods in the male animals is included on page 20, the last paragraph and page 21, the 1st paragraph. Legend to Suppl. Fig. 9 is included in the file “Suppl. figure legend_R”.

      (13) Figure 6G: cardiomyocytes from wild-type mice, when treated with macrophages, show 0% TUNEL-positive cells. Since it is unlikely to obtain no TUNEL staining in a cell population, there may be an experimental or analytical error.

      This is now Fig. 7F and 7G. The result is due to our specific experimental procedure. After tissue digestion, cardiomyocytes were plated on laminin-coated dishes. Laminin promotes the adhesion of surviving cells. Following plating, we conducted a thorough washing process to remove damaged and partially adherent cells. This step ensures that only well-shaped, viable, and strongly adherent cells remain as bioassay cells. These “healthy” cells are then selected for the experiments. The apoptotic cells are removed by the washing, which explains the high viability of the bioassay cells. We have added this detailed information in the method section on page 24, the 2nd paragraph.

      (14) Figure 7J: Please assess whether arg-ii depletion also affects the mtROS phenotype.

      Following the suggestion of this reviewer, we performed new experiments which show that human cardiac fibroblasts (HCFs) exposed to hypoxia (1% O<sub>2</sub>, 48 hours), a known physiological trigger of Arg-II up-regulation, exhibit increased mtROS generation, which involves Arg-II (new Fig. 8M to 8P). We found that Arg-II protein level as well as mtROS (assessed by mitoSOX staining) were both enhanced, accompanied by increased levels of HIF1α (Fig. 8M). Moreover, mito-TEMPO pre-incubation reduces mtROS, confirming the mitochondrial origin of the ROS. Silencing of arg-ii with rAd-mediated shRNA significantly reduces mtROS levels, demonstrating a role of Arg-II in the production of mitochondrial ROS in cardiac fibroblasts (Fig. 8M to 8P). We have included these results on page 9, the last paragraph and discussed the results on page 17, the 1st paragraph. The related method is described on page 26, the 2nd paragraph. Legend to Fig. 8 is updated on page 32.

      (15) Figure 8A-E: The authors have treated human-origin endothelial cells with mice-origin macrophage-conditioned media. It would be more suitable to treat the endothelial cells with human-origin macrophage-conditioned media.

      We acknowledge the concern regarding the use of mouse-origin macrophage-conditioned media on human-origin endothelial cells. Notably, the biological cross-reactivity of cytokines from one species on cells from a different species has been reported in the literature. It was observed that there is quite a strict threshold of 60% amino acid identity, above which cytokines tend to cross-react, and that, statistically, cytokines tend to cross-react more often as their percentage of amino acid identity increases (Scheerlinck JPY. Functional and structural comparison of cytokines in different species. Vet Immunol Immunopathol. 1999; 72:39-44. https://doi.org/10.1016/S0165-2427(99)00115-4). Taking IL-1β as an example, the 17.5 kDa mature mouse and human IL-1β share 92% amino acid sequence identity, suggesting a high cross-reactivity. Indeed, human IL-1β has shown biological cross-reactivity in mouse cells (Ledesma E., et al. Interleukin-1 beta (IL-1β) induces tumor necrosis factor alpha (TNF-α) expression on mouse myeloid multipotent cell line 32D cl3 and inhibits their proliferation. Cytokine. 2004; 26:66-72. https://doi.org/10.1016/j.cyto.2003.12.009). Moreover, our results also support the reported cross-reactivity between human and mouse IL-1β. The conditioned medium (CM) from mouse macrophages indeed showed biological function in human endothelial cells. The observed effects of the conditioned media from aged wild-type macrophages on endothelial cells were specifically mediated through IL-1β. This conclusion is supported by our data showing that the upregulation induced by the conditioned media was significantly reduced by the addition of an IL-1β receptor blocker.

      (16) The co-culture system would be more interesting to test the non-cell autonomous role of Arg II.

      We appreciate the suggestion by this reviewer regarding the co-culture system to test the non-cell autonomous role of Arg-II. We believe that our current model, which involves treating cells with conditioned media, is a well-established and effective method for demonstrating the non-cell autonomous role of Arg-II. This approach allows us to observe the effects of Arg-II on surrounding cells through the factors present in the conditioned media. A co-culture system would be worth considering if the factor released into the conditioned medium were unstable. This is, however, not the case, so we are confident that our conditioned-medium model is sufficient to demonstrate a paracrine effect of cell-cell interaction.

      Reviewer #2 (Recommendations For The Authors):

      Some minor comments may be considered to improve the realm of the knowledge related to this study.

      We appreciate this comment and have added and revised our discussion on this aspect accordingly at the end of the discussion section on page 19, the last 6 lines.

      (1) The current study showed strong evidence demonstrating the key role of cardiac macrophages in pathologies of cardiac aging, particularly, the macrophages (MФ) from the circulating blood (hematogenous). It is known that the heart is among the minority of organs in which substantial numbers of yolk-sac MФ persist in adulthood and play a crucial role in maintaining cardiac function. Thus, the adult mammalian heart contains two separate and discrete cardiac MФ subgroups, i.e., the resident MФs originated from yolk sac-derived progenitors and the hematogenous MФs recruited from circulating blood monocytes. These two subtypes of MФs may play distinctive roles in the aging heart and the response to cardiac injury. The author could extend the discussion on the possibility of the resident MФs in aging hearts, which could be further investigated in the future.

      We appreciate the suggestion and agree that it provides valuable insight into the study. Taking the comments of Reviewer 1 into account, we have performed new experiments, i.e., co-immunostaining to analyze the infiltrated (CCR2<sup>+</sup>/F4-80<sup>+</sup>) and resident (LYVE1<sup>+</sup>/F4-80<sup>+</sup>) macrophage populations and to investigate to what extent Arg-II affects these populations in the aging heart. We found that, in line with the gene expression of f4/80, immunofluorescence staining reveals an age-associated increase in the numbers of F4/80<sup>+</sup> cells in the wt mouse heart, which is reduced in the age-matched arg-ii<sup>-/-</sup> animals (Fig. 2E, F, G), demonstrating that arg-ii gene ablation reduces macrophage accumulation in the aging heart. Interestingly, resident macrophages as characterized by LYVE1<sup>+</sup>/F4-80<sup>+</sup> cells (Fig. 2E and 2H) are predominant in the aging heart as compared to the infiltrated CCR2<sup>+</sup>/F4-80<sup>+</sup> cells (Fig. 2F and 2I). The increase in both LYVE1<sup>+</sup>/F4-80<sup>+</sup> and CCR2<sup>+</sup>/F4-80<sup>+</sup> macrophages in the aging heart is reduced in arg-ii<sup>-/-</sup> mice (Fig. 2E, 2F, 2H, and 2I). These new results are described on page 6, the 1st paragraph, presented in Fig. 2E to 2I, and discussed on page 13, the 2nd paragraph. The legend to Fig. 2 is revised. The method for this additional experiment is included on page 22, the 1st paragraph.

      (2) It would be beneficial to the readers if the author could provide some explanation about why ArgII could not be detected in VSMCs in the mouse heart and the species difference between humans and mice. In addition, the author may provide an assumption on the possibility that there may also be a cross-talk between macrophages and VSMCs in the aging heart. A little bit more explanation in the Discussion will be helpful.

      We acknowledge and appreciate the suggestion and have discussed these points on page 19 as follows:

      “In this context, another interesting aspect is the cross-talk between macrophages and vascular SMC in the aging heart. In our present study, we could not detect Arg-II in vascular SMC of mouse heart but in that of human heart. This could be due to the difference in species-specific Arg-II expression in the heart or related to the disease conditions in human heart which is harvested from patients with cardiovascular diseases. Indeed, in the apoe<sup>-/-</sup> mouse atherosclerosis model, aortic SMCs do express Arg-II (Xiong et al., 2013). It is interesting to note that rodents hardly develop atherosclerosis as compared to humans. Whether this could be partly contributed by the different expression of Arg-II in vascular SMC between rodents and humans requires further investigation. In our present study, the aspect of the cross-talk between macrophages and vascular SMC is not studied. Since the crosstalk between macrophages and vascular SMC has been implicated in the context of atherogenesis as reviewed (Gong et al., 2025), further work shall investigate whether Arg-II expressing macrophages could interact with vascular SMC in the coronary arteries in the heart and contribute to the development of coronary artery disease and/or vascular remodelling and the underlying mechanisms“.

      (3) Please clarify the arrows in Figure 9C that indicate the infarct area in each splicing section from one heart.

      The arrows in Figure 9C (now Fig. 10C) are indeed utilized to indicate the sections displaying the infarcted area within each splicing section from one heart. We have explained the arrow in the figure legend (now Fig. 10 and also new Suppl. Fig. 9).

    1. Author response:

      Our response aims to address the following:

      The lack of pleiotropy is an unconfirmable assumption of MR, and the addition of those models is therefore quite important, as this is a primary weakness of the MR approach. Given that concern, I read the sensitivity analyses using pleiotropy-robust models as the main result, and in that case, they can't test their hypotheses as these models do not show a BMI instrumental variable association. The other weakness, which might be remedied, is that the power of the tests here is not described. When a hypothesis is tested with an under-powered model, the apparent lack of association could be due to inadequate sample size rather than a true null. Typically, when a statistically significant association is reported, power concerns are discounted as long as the study is not so small as to create spurious findings. That is the case with their primary BMI instrumental variable model - they find an association so we can presume it was adequately powered. But the primary models they share are not the pleiotropy-robust methods MR-Egger, weighted median, and weighted mode. The tests for these models are null, and that could mean a couple of things: (1) the original primary significant association between the BMI genetic instrument was due to pleiotropy, and they therefore don't have a robust model to explore the effects of the tobacco genetic instrument. (2) The power for the sensitivity analysis models (the pleiotropy-robust methods) is inadequate, and the authors share no discussion about the relative power of the different MR approaches. If they do have adequate power, then again, there is no need to explore the tobacco instrument.

      We would like to highlight that post-hoc power calculations are often considered redundant since the statistical power estimated for an observed association is directly related to its p-value[1]. In other words, the uncertainty of the association is already reflected in its 95% confidence interval. However, we understand power calculations may still be of interest to the reader, so we will incorporate them in the revised manuscript.
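
      The dependence of post hoc power on the p-value can be made concrete with a short sketch. It assumes a two-sided z-test and treats the observed effect as if it were the true effect, which is the usual definition of "observed power"; the function name and the 5% significance level are illustrative assumptions and not part of our analysis.

```python
from statistics import NormalDist


def post_hoc_power(p_value: float, alpha: float = 0.05) -> float:
    """Observed power of a two-sided z-test, computed from the p-value alone.

    Assumption: the true effect equals the observed effect, so the only
    input needed is the |z| statistic implied by the reported p-value.
    """
    norm = NormalDist()
    z_obs = norm.inv_cdf(1 - p_value / 2)   # |z| implied by the two-sided p-value
    z_crit = norm.inv_cdf(1 - alpha / 2)    # critical value at level alpha
    # Probability that a new z statistic centred on z_obs lands in either rejection tail.
    return norm.cdf(z_obs - z_crit) + norm.cdf(-z_obs - z_crit)


# Observed power is a monotone function of the p-value and carries no extra information.
for p in (0.001, 0.05, 0.2):
    print(p, round(post_hoc_power(p), 3))
```

      Under these assumptions, a result with p = 0.05 has an observed power of about 50% regardless of sample size, which is why we regard the confidence interval as the more informative summary.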

      The reason we use inverse variance weighted (IVW) Mendelian randomization (MR) to obtain our main results rather than the pleiotropy-robust methods mentioned by the reviewer/editors (i.e., MR-Egger, weighted median and weighted mode) is that the former has greater statistical power than the latter[2]. Hence, instead of focussing on the statistical significance of the pleiotropy-robust analyses, we consider it is of more value to compare the consistency of the effect sizes and direction of the effect estimates across methods. Any evidence of such consistency increases our confidence in our main findings, since each method relies on different assumptions. As we cannot be sure about the presence and nature of horizontal pleiotropy, it is useful to compare results across methods even though they are not equally powered. It is true that our results for the genetically predicted effects of body mass index (BMI) on the risk of head and neck cancer (HNC) differ across methods. This is precisely what led us to question the validity of our main finding (suggesting a positive effect of BMI on HNC risk). We will clarify this in the discussion section of the revised manuscript as advised.
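
      To illustrate why we compare effect estimates across methods rather than only their significance, the sketch below computes a fixed-effect IVW estimate and an MR-Egger slope and intercept from per-variant summary statistics. It is a simplified illustration and not our analysis pipeline: the function names are hypothetical, first-order weights (1/se_y²) are assumed, and the four variants are invented toy values with a constant pleiotropic offset.

```python
def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect IVW: weighted regression of beta_y on beta_x through the origin."""
    weights = [1.0 / s ** 2 for s in se_y]
    num = sum(w * bx * by for w, bx, by in zip(weights, beta_x, beta_y))
    den = sum(w * bx * bx for w, bx in zip(weights, beta_x))
    return num / den, (1.0 / den) ** 0.5  # estimate, standard error


def egger_estimate(beta_x, beta_y, se_y):
    """MR-Egger: the same weighted regression, but with a free intercept."""
    # Orient every variant so that its exposure association is positive.
    xs = [abs(bx) for bx in beta_x]
    ys = [by if bx >= 0 else -by for bx, by in zip(beta_x, beta_y)]
    weights = [1.0 / s ** 2 for s in se_y]
    total = sum(weights)
    x_mean = sum(w * x for w, x in zip(weights, xs)) / total
    y_mean = sum(w * y for w, y in zip(weights, ys)) / total
    sxy = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(weights, xs))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope  # intercept (average pleiotropy), slope


# Toy data (invented): true causal effect 0.5 plus a directional pleiotropic offset of 0.02.
beta_x = [0.1, 0.2, 0.3, 0.4]
beta_y = [0.5 * bx + 0.02 for bx in beta_x]
se_y = [0.05] * 4

ivw, ivw_se = ivw_estimate(beta_x, beta_y, se_y)
intercept, slope = egger_estimate(beta_x, beta_y, se_y)
print(ivw, ivw_se, intercept, slope)
```

      In this toy setting the IVW estimate is pulled above 0.5 by the offset, whereas MR-Egger absorbs it into the intercept at the cost of estimating an additional parameter, which is the source of its lower power.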

      We understand that the reviewer/editors are concerned that we do not have a robust model to explore the role of tobacco consumption in the link between BMI and HNC. However, we have a different perspective on the matter. If indeed, the main IVW finding for BMI and HNC is due to pleiotropy (since some of the pleiotropy-robust methods suggest conflicting results), then the IVW multivariable MR method is a way to explore the potential source of this bias[3]. We were particularly interested in exploring the role of smoking in the observed association because smoking and adiposity are known to influence each other [4-9] and share a genetic basis[10, 11].

      References:

      (1) Heinsberg LW, Weeks DE: Post hoc power is not informative. Genet Epidemiol 2022, 46(7):390-394.

      (2) Burgess S, Butterworth A, Thompson SG: Mendelian randomization analysis with multiple genetic variants using summarized data. Genet Epidemiol 2013, 37(7):658-665.

      (3) Burgess S, Davey Smith G, Davies NM, Dudbridge F, Gill D, Glymour MM, Hartwig FP, Kutalik Z, Holmes MV, Minelli C et al: Guidelines for performing Mendelian randomization investigations: update for summer 2023. Wellcome Open Res 2019, 4:186.

      (4) Morris RW, Taylor AE, Fluharty ME, Bjorngaard JH, Asvold BO, Elvestad Gabrielsen M, Campbell A, Marioni R, Kumari M, Korhonen T et al: Heavier smoking may lead to a relative increase in waist circumference: evidence for a causal relationship from a Mendelian randomisation meta-analysis. The CARTA consortium. BMJ Open 2015, 5(8):e008808.

      (5) Taylor AE, Morris RW, Fluharty ME, Bjorngaard JH, Asvold BO, Gabrielsen ME, Campbell A, Marioni R, Kumari M, Hallfors J et al: Stratification by smoking status reveals an association of CHRNA5-A3-B4 genotype with body mass index in never smokers. PLoS Genet 2014, 10(12):e1004799.

      (6) Taylor AE, Richmond RC, Palviainen T, Loukola A, Wootton RE, Kaprio J, Relton CL, Davey Smith G, Munafo MR: The effect of body mass index on smoking behaviour and nicotine metabolism: a Mendelian randomization study. Hum Mol Genet 2019, 28(8):1322-1330.

      (7) Asvold BO, Bjorngaard JH, Carslake D, Gabrielsen ME, Skorpen F, Smith GD, Romundstad PR: Causal associations of tobacco smoking with cardiovascular risk factors: a Mendelian randomization analysis of the HUNT Study in Norway. Int J Epidemiol 2014, 43(5):1458-1470.

      (8) Carreras-Torres R, Johansson M, Haycock PC, Relton CL, Davey Smith G, Brennan P, Martin RM: Role of obesity in smoking behaviour: Mendelian randomisation study in UK Biobank. BMJ 2018, 361:k1767.

      (9) Freathy RM, Kazeem GR, Morris RW, Johnson PC, Paternoster L, Ebrahim S, Hattersley AT, Hill A, Hingorani AD, Holst C et al: Genetic variation at CHRNA5-CHRNA3-CHRNB4 interacts with smoking status to influence body mass index. Int J Epidemiol 2011, 40(6):1617-1628.

      (10) Thorgeirsson TE, Gudbjartsson DF, Sulem P, Besenbacher S, Styrkarsdottir U, Thorleifsson G, Walters GB, Consortium TAG, Oxford GSKC, consortium E et al: A common biological basis of obesity and nicotine addiction. Transl Psychiatry 2013, 3(10):e308.

      (11) Wills AG, Hopfer C: Phenotypic and genetic relationship between BMI and cigarette smoking in a sample of UK adults. Addict Behav 2019, 89:98-103.

    1. Author response:

      The following is the authors’ response to the previous reviews

      In response to Reviewer #1, we have replaced the original images in Figure 1A with new immunofluorescence data showing matched DAPI staining density between control and AD patient samples. We also have updated the PINK1 staining images of mouse brain sections in Figure 1C to eliminate potential non-specific signals. These revisions provide clearer evidence supporting our conclusions about PINK1/pUb’s role in neurodegeneration.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      In this beautiful paper the authors examined the role and function of NR2F2 in testis development and more specifically in fetal Leydig cell (FLC) development. It is well known by now that FLC develop from interstitial steroidogenic progenitors at around E12.5 and are crucial for testosterone and INSL3 production during embryonic development, which in turn shapes the internal and external genitalia of the male. Indeed, lack of testosterone or INSL3 is known to cause DSD as well as undescended testis, also termed cryptorchidism. The authors first characterized the expression pattern of the NR2F2 protein during testis development and then used two cKO systems of NR2F2, namely the Wt1-creERT2 and the Nr5a1-cre, to explore the phenotype of loss of NR2F2. They found in both cases that mice present with undescended testis and a major reduction in FLC numbers. They show that NR2F2 has no effect on the number of progenitor cells or their marker expression, but in its absence there are fewer FLC and they are immature.

      The effect of NR2F2 is cell autonomous and does not seem to affect other signalling pathways implicated in Leydig cell development, such as the DHH, PDGFRA and NOTCH pathways.

      Overall, this paper is excellent, very well written, fluent and clear. The data is well presented, and all the controls and statistics are in place. I think this paper will be of great interest to the field and paves the way for several interesting follow-up studies, as stated in the discussion.

      Reviewer #2 (Public review):

      The major conclusion of the manuscript is expressed in the title: "NR2F2 is required in the embryonic testis for Fetal Leydig Cell development" and also at the end of the introduction and all along the result part. All the authors' assertions are supported by very clear and statistically validated results from ISH, IHC, precise cell counting and gene expression levels by qPCR. The authors used two different conditional Nr2f2 gene ablation systems that demonstrate the same effects at the FLC level. They also showed that the haplo-insufficiency of Wt1 in the first system (knock-in Wt1-cre-ERT2) aggravated the situation in FLC differentiation by disturbing the differentiation of Sertoli cells and their secretion of pro-FLC factors, which had a confounding effect and encouraged them to use the second system. This demonstrates the great rigor with which the authors interpreted the results. In conclusion, all authors' claims and conclusions are justified by their high-quality results.

      Recommendations for the authors:

      We thank the reviewers for their comments which have improved and strengthened our manuscript. Please see our responses to specific comments below in blue.

      Reviewer #1 (Recommendations for the authors):

      I have several small comments:

      (1) There has recently been a preprint from the Yao lab about the role of NR2F2 in steroidogenic cells (https://www.biorxiv.org/content/10.1101/2024.09.16.613312v1). They performed cKO of NR2F2 using the Wt1creERT2 and found similar results. You should present and discuss this paper in light of your results.

      Estermann et al. report a very similar phenotype of FLC hypoplasia in an independent mouse model of Nr2f2 conditional mutation. We have now referred to this article in the discussion of our manuscript as suggested.

      (2) In the introduction I think it is important to mention that the steroidogenic progenitors are derived from Wnt5a positive cells (https://pubmed.ncbi.nlm.nih.gov/35705036/).

      We have mentioned this point in the introduction as suggested.

      (3) In both models you show a decrease in the number of FLC (60% or 40%) and yet they both present with undescended testis. It is important to discuss the fact that there is no need for a complete ablation of testosterone and INSL3 in order to get cryptorchidism.

      We have mentioned this point in the discussion as suggested.

      The fact that you get only partial reduction in FLC is likely due to redundancy with additional factors, possibly the ARX like you stated in the discussion and it will be interesting to explore that in the future but is beyond the scope of the current paper.

      We agree with the reviewer; this question could be addressed by analyzing Arx,Nr2f2 double mutants.

      (4) On page 8, line 11 you mention data not shown; I am not sure if this is allowed in the journal.

      The data is now shown in Figure S5A as suggested.

      (5) In Figure 2- it will be good if you add a schematic model of the mouse strains used as well as the experimental and control mice next to the Tam scheme. Similar scheme should be in figure 3 for Nr5a1-cre.

      We have modified Figures 2 and 3 as suggested.

      (6) There is a clear and pronounced effect on the testis cord number and size. It would be good if you could quantify testis cord numbers/diameter in the mutants even if you do not follow in detail the effect on Sertoli cells.

      We have quantified testis cord number and area in E14.5 Control and Wt1<sup>CreERT2/+</sup>; Nr2f2<sup>flox/flox</sup> testes. This data is now shown in Figure S2M.

      (7) It will be good to present the undescended testis in the Wt1-cre model in figure 2 and not in the supp figure

      The data is now shown in Figure 2H-I as suggested.

      (8) Please add labelling of the testis, kidney, bladder, vas deferens in figure 3 N+O and in the Wt1-cre model

      We have added the labels in Figures 2 and 3 as suggested.

      (9) In figure 5 which present both models- it will be good to use the scheme I suggested before to highlight which results refer to which ko model.

      We have modified Figure 5 as suggested.

      Reviewer #2 (Recommendations for the authors):  

      The work presented in this manuscript gave me food for thought. I have always been intrigued by the fact that of the large number of interstitial cells in the testis, a minority differentiate into mature androgen-producing Leydig cells. In other words, how is the number of functional steroidogenic cells defined from a large pool of progenitor cells (ARX and NR2F2 positive ones)? This may have a link with the levels of androgens produced (a kind of feedback control) or the effectiveness of these androgens on the target tissues (i.e.: as spermatogenesis efficiency in adults). In addition, there must be specific signals (probably linked to gonadotropins) that induce the recruitment of Leydig cells from the progenitor pool. Perhaps the genetic models generated in this study could help to address these questions. I leave it to the authors to judge.

      We agree with the reviewer. How NR2F2 (and other factors) integrate extrinsic cues to regulate the recruitment of a subset of interstitial steroidogenic progenitors along the Leydig cell differentiation pathway is a fascinating question beyond the scope of this work.

      In addition to this reflection, I propose a few minor modifications likely to improve the quality of the manuscript:

      (1) Page 3, line 3: I suggest replacing "growth" with "differentiation"

      We have modified the text as suggested.

      (2) Page 3, line 4: the "scrotum" is missing in the parentheses. Please add it before "and penis"

      We have modified the text as suggested.

      (3) Page 5, lines 21-24: kidney hypoplasia is also evident in Fig S2H (stated in the figure legend). It could also be mentioned in this sentence, and it implies "...that NR2F2 function is required for testicular and kidney development."

      We have modified the text as suggested.

      (4) Page 5, lines 28-30. In addition to the reduction in the number of HSD3B-positive cells, HSD3B staining seems clearly fainter in mutant FLC (Fig 2M) compared to adrenal cells on the same section or FLC in control gonads. This fits well with other results on the level of steroidogenic enzymes (Fig 2O) and those presented thereafter (Fig S4 I-J and Fig 5). Perhaps the authors could mention this fact.

      We have modified the text as suggested in the results section “NR2F2 is required for FLC maturation” (Page 8).

      (5) Page 5, lines 31-34: testicular descent is highly sensitive to INSL3 in the mouse (in contrast with other species where androgens seem to be more critical). I was wondering if you could check a better phenotypic marker for the absence (or reduction) of androgens, like the differentiation of epididymides by HE staining or the anogenital distance at birth.

      We have measured the anogenital distance at P0 and P1 as suggested and have included the corresponding graph in Fig. S3P.

      (6) Page 8, lines 21-22: "HSD3B positive FLC were smaller and more elongated". It is clear in Fig 5F but not evident in Fig 5D. Could the authors propose another image?

      We have modified Figure 5 as suggested and now provide another example of HSD3B positive FLCs in a Nr5a1Cre; Nr2f2<sup>flox/flox</sup> mutant gonad (Fig. 5D) and the corresponding control littermate (Fig. 5C).

      (7) Page 14, line 12: "(arrow in I)" should be "(arrow in H)"

      We have modified the text as suggested. Please note that ACTA2 expression is now shown in Figure S2 G-H.

      (8) Page 15, line 6: "Arrows indicate NR5A1 positive FLC". There is no arrow in Fig 4C, D, but a kind of scale bar on the enlargement shown in C.

      We have modified Figure 4 as suggested.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This paper provides a computational model of a synthetic task in which an agent needs to find a trajectory to a rewarding goal in a 2D-grid world, in which certain grid blocks incur a punishment. In a completely unrelated setup without explicit rewards, they then provide a model that explains data from an approach-avoidance experiment in which an agent needs to decide whether to approach or withdraw from a jellyfish in order to avoid a pain stimulus. Both models include components that are labelled as Pavlovian; hence the authors argue that their data show that the brain uses a Pavlovian fear system in complex navigational and approach-avoid decisions.

      Thanks to the reviewer’s comments, we have now added the following text to our Discussion section (Lines 290-302):

      “When it comes to our experiments, both the simulation and VR experiment models are related and derived from the same theoretical framework maintaining an algebraic mapping. They differ only in task-specific adaptations i.e. differ in action sets and differ in temporal difference learning rules - multi-step decisions in the grid world vs. Rescorla-Wagner rule for single-step decisions in the VR task. This is also true for Dayan et al. [2006] who bridge Pavlovian bias in a Go-No Go task (negative auto-maintenance pecking task) and a grid world task. A further minor difference between the simulation and VR experiment models is the use of a baseline bias in the human experiment's RL and the RLDDM model, where we also model reaction times with drift rates which is not a behaviour often simulated in the grid world simulations. As mentioned previously, we use the grid world tasks for didactic purposes, similar to Dayan et al. [2006] and common to test-beds for algorithms in reinforcement learning [Sutton et al., 1998]. The main focus of our work is on Pavlovian fear bias in safe exploration and learning, rather than on its role in complex navigational decisions. Future work can focus on capturing more sophisticated safe behaviours, such as escapes [Evans et al., 2019, Sporrer et. al., 2023] and model-based planning, which span different aspects of the threat-imminence continuum [Mobbs et al., 2020].”
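
      The relationship between the two learning rules mentioned above can be sketched in a few lines. This is a generic textbook illustration rather than our fitted models: the function names, the learning rate and the discount factor are assumptions chosen for the example.

```python
def rescorla_wagner_update(value, outcome, alpha):
    """Single-step rule: move the estimate towards the observed outcome."""
    return value + alpha * (outcome - value)


def td_update(value, next_value, outcome, alpha, gamma):
    """Multi-step temporal-difference rule: the target also includes the
    discounted value of the next state."""
    return value + alpha * (outcome + gamma * next_value - value)


# In a single-step task the trial ends after the outcome, so the next-state
# value is zero and the TD rule reduces to the Rescorla-Wagner rule.
v, outcome, alpha, gamma = 0.2, -1.0, 0.1, 0.9
rw = rescorla_wagner_update(v, outcome, alpha)
td = td_update(v, 0.0, outcome, alpha, gamma)
print(rw, td)
```

      Seen this way, the single-step VR task is the special case of the grid-world update in which every decision leads directly to a terminal outcome.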

      In the first setup, they simulate a model in which a component they label as Pavlovian learns about punishment in each grid block, whereas a Q-learner learns about the optimal path to the goal, using a scalar loss function for rewards and punishments. Pavlovian and Q-learning components are then weighed at each step to produce an action. Unsurprisingly, the authors find that including the Pavlovian component in the model reduces the cumulative punishment incurred, and this increases as the weight of the Pavlovian system increases. The paper does not explore to what extent increasing the punishment loss (while keeping reward loss constant) would lead to the same outcomes with a simpler model architecture, so any claim that the Pavlovian component is required for such a result is not justified by the modelling. 

      Thanks to the reviewer’s comments, we have now added the following text to our Discussion section (Lines 303-313):

      “In our simulation experiments, we assume the coexistence of the Pavlovian fear system and the instrumental system to demonstrate the emergent safety-efficiency trade-off from their interaction. It is possible that similar behaviours could be modelled using an instrumental system alone, with higher punishment sensitivity, therefore we do not argue for the necessity for the Pavlovian fear system here. Instead, the Pavlovian fear system itself could be a potential biologically plausible implementation of punishment sensitivity. Unlike punishment sensitivity (scaling of the punishments), which has not been robustly mapped to neural substrates in fMRI studies; the neural substrates for the Pavlovian fear system are well known (e.g., the limbic loop and amygdala, further see Supplementary Fig. 16). Additionally, Pavlovian fear system provides a separate punishment memory that cannot be erased by greater rewards like [Elfwing and Seymour, 2017, Wang et al., 2018]. This fundamental point can be observed in our simple T-maze simulations, where the Pavlovian fear system encourages avoidance behaviour and the agent chooses the smaller reward instead of the greater reward.”
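
      The T-maze point can be illustrated with a minimal sketch of how a separate punishment memory can shift choice. The convex weighting by a parameter omega, the softmax choice rule and all numerical values are assumptions made for this illustration; they are not taken from our fitted models.

```python
import math


def action_probabilities(q_values, pavlovian_values, omega, temperature=1.0):
    """Softmax over propensities that mix instrumental and Pavlovian values.

    Assumed form: propensity = (1 - omega) * Q + omega * Pavlovian value,
    where the Pavlovian value is a (negative) punishment expectation.
    """
    propensities = [(1 - omega) * q + omega * p
                    for q, p in zip(q_values, pavlovian_values)]
    top = max(propensities)  # subtract the maximum for numerical stability
    exps = [math.exp((x - top) / temperature) for x in propensities]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical T-maze: the left arm has the larger net instrumental value but a
# punishment history; the right arm has a smaller reward and no punishment.
q_values = [0.6, 0.4]
pavlovian_values = [-1.0, 0.0]

instrumental_only = action_probabilities(q_values, pavlovian_values, omega=0.0)
with_fear_bias = action_probabilities(q_values, pavlovian_values, omega=0.5)
print(instrumental_only, with_fear_bias)
```

      With omega set to zero the agent prefers the larger reward, whereas with a non-zero weight the punishment memory tips the preference towards the smaller, safer reward even though the instrumental values are unchanged.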

      In the second setup, an agent learns about punishments alone. "Pavlovian biases" have previously been demonstrated in this task (i.e. an overavoidance when the correct decision is to approach). The authors explore several models (all of which are dissimilar to the ones used in the first setup) to account for the Pavlovian biases. 

      Thanks to the reviewer’s comments, we have now added a paragraph to our Discussion section (Lines 290-302) explaining the similarity of our models and their integrated interpretation. We hope this addresses the reviewer’s concerns.

      Strengths: 

      Overall, the modelling exercises are interesting and relevant and incrementally expand the space of existing models. 

      Weaknesses: 

      I find the conclusions misleading, as they are not supported by the data. 

      First, the similarity between the models used in the two setups appears to be more semantic than computational or biological. So it is unclear to me how the results can be integrated. 

      Thanks to the reviewer’s comments, we have now added a paragraph to our Discussion section (Lines 290-302) explaining the similarity of our models and their integrated interpretation. We hope this addresses the reviewer’s concerns.

      Secondly, the authors do not show "a computational advantage to maintaining a specific fear memory during exploratory decision-making" (as they claim in the abstract). Making such a claim would require showing an advantage in the first place. For the first setup, the simulation results will likely be replicated by a simple Q-learning model when scaling up the loss incurred for punishments, in which case the more complex model architecture would not confer an advantage. The second setup, in contrast, is so excessively artificial that even if a particular model conferred an advantage here, this is highly unlikely to translate into any real-world advantage for a biological agent. The experimental setup was developed to demonstrate the existence of Pavlovian biases, but it is not designed to conclusively investigate how they come about. In a nutshell, who in their right mind would touch a stinging jellyfish 88 times in a short period of time, as the subjects do on average in this task? Furthermore, in which real-life environment does withdrawal from a jellyfish lead to a sting, as in this task? 

      Crucially, simplistic models such as the present ones can easily solve specifically designed lab tasks with low dimensionality but they will fail in higher-dimensional settings. Biological behaviour in the face of threat is utterly complex and goes far beyond simplistic fight-flight-freeze distinctions (Evans et al., 2019). It would take a leap of faith to assume that human decision-making can be broken down into oversimplified sub-tasks of this sort (and if that were the case, this would require a meta-controller arbitrating the systems for all the sub-tasks, and this meta-controller would then struggle with the dimensionality). 

      Thanks to the reviewer’s comments, we have now mentioned this point in Lines 299-302.

      On the face of it, the VR task provides higher "ecological validity" than previous screen-based tasks. However, in fact, it is only the visual stimulation that differs from a standard screen-based task, whereas the action space is exactly the same. As such, the benefit of VR does not become apparent, and its full potential is foregone. 

      If the authors are convinced that their model can - then data from naturalistic approach-avoidance VR tasks is publicly available, e.g. (Sporrer et al., 2023), so this should be rather easy to prove or disprove. In summary, I am doubtful that the models have any relevance for real-life human decision-making. 

      Finally, the authors seem to make much broader claims that their models can solve safety-efficiency dilemmas. However, a combination of a Pavlovian bias and an instrumental learner (study 1) via a fixed linear weighting does not seem to be "safe" in any strict sense. This will lead to the agent making decisions leading to death when the promised reward is large enough (outside perhaps a very specific region of the parameter space). Would it not be more helpful to prune the decision tree according to a fixed threshold (Huys et al., 2012)? So, in a way, the model is useful for avoiding cumulatively excessive pain but not instantaneous destruction. As such, it is not clear what real-life situation is modelled here. 

      We hope our additions to the Discussion section (Lines 290-313) address the reviewer’s concerns.

      A final caveat regarding Study 1 is the use of a PH associability term as a surrogate for uncertainty. The authors argue that this term provides a good fit to fear-conditioned SCR but that is only true in comparison to simpler RW-type models. Literature using a broader model space suggests that a formal account of uncertainty could fit this conditioned response even better (Tzovara et al., 2018). 

      We have now added a line discussing this (Lines 356-358):

      “Future work could also use a formal account of uncertainty which could fit the fear-conditioned skin-conductance response better than Pearce-Hall associability [Tzovara et al., 2018].”

      Reviewer #2 (Public review): 

      Summary: 

      The authors tested the efficiency of a model combining Pavlovian fear valuation and instrumental valuation. This model is amenable to many behavioral decision and learning setups - some of which have been or will be designed to test differences in patients with mental disorders (e.g., anxiety disorder, OCD, etc.). 

      Strengths: 

      (1) Simplicity of the model which can at the same time model rather complex environments. 

      (2) Introduction of a flexible omega parameter. 

      (3) Direct application to a rather advanced VR task. 

      (4) The paper is extremely well written. It was a joy to read. 

      Weaknesses: 

      Almost none! In very few cases, the explanations could be a bit better. 

      Thank you, we have added further explanations in the Discussion section. We have also improved the writing in the Abstract, Introduction, and Methods sections, taking into account the recommendations from Reviewers #2 and #3.

      Reviewer #2 (Recommendations for the authors): 

      (1) Why is there no flexible omega in Figures 3B and 3C? Did I miss this? 

      Thank you. We have now added additional text to explain our motivation in Experiment 2, which only varies the fixed omega and omits the flexible omega (Lines 136-140).

      “In this set of results, we wish to qualitatively tease apart the role of a Pavlovian bias in shaping and sculpting the instrumental value and also provide more insight into the resulting safety-efficiency trade-off. Having shown the benefits of a flexible ω in the previous section, here we only vary the fixed ω to illustrate the effect of a constant bias and are not concerned with the flexible bias in this experiment.”

      We encourage the reader to consider this akin to an additional study that explains how a Pavlovian bias to withdraw can play a role in avoiding punishments similar to that of punishment sensitivity. This is particularly important as we do have neural correlates for Pavlovian biases but so far lack a clear neural correlate for punishment sensitivity, as mentioned in our new additions to the Discussion section (Lines 303-313).

      (2) The introduction of the flexible omega and the PAL agent in the results is a bit sudden. Some more details are needed to understand this during the first read of this passage. 

      We thank Reviewer #2 for bringing this to our notice. We have attempted to refine the passage by including sentences such as:

      “The standard (rational) reinforcement learning system is modelled as the instrumental learning system. The additional Pavlovian fear system biases the withdrawal actions to aid in safe exploration, in line with our hypothesis.”

      “Both systems learn using a basic temporal difference updating rule (or in instances, its special case, the Rescorla-Wagner rule).”

      “We implement the flexible ω using Pearce-Hall associability (see equation 15 in Methods). The Pearce-Hall associability maintains a running average of absolute temporal difference errors (δ) as per equation 14. This acts as a crude but easy-to-compute metric for outcome uncertainty which gates the influence of the Pavlovian fear system, in line with our hypothesis. This implies that the higher the outcome uncertainty, as is the case in early exploration, the more cautious our agent will be, resulting in safer exploration.”
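
      The gating described in the quoted passage can be sketched as follows. This is a minimal illustration and not the manuscript's equations 14-15, which are not reproduced in this response: the exact update form, the function name `flexible_omega`, and the parameters `eta` (averaging rate) and `kappa` (scaling of associability into a weight) are assumptions made for the example.

```python
def flexible_omega(td_errors, eta=0.1, kappa=1.0, alpha0=0.0):
    """Trajectory of a flexible Pavlovian weight omega.

    Assumed (hypothetical) forms, chosen only for illustration:
      alpha_t = (1 - eta) * alpha_{t-1} + eta * |delta_t|   # running average of |TD error|
      omega_t = clip(kappa * alpha_t, 0, 1)                  # weight given to the Pavlovian system
    """
    alpha = alpha0
    omegas = []
    for delta in td_errors:
        # Pearce-Hall style associability: exponentially weighted mean of |delta|
        alpha = (1.0 - eta) * alpha + eta * abs(delta)
        # Keep the weight in [0, 1] so it can act as a mixing coefficient
        omegas.append(min(1.0, max(0.0, kappa * alpha)))
    return omegas


# Toy error sequence: surprising outcomes early, fully predicted outcomes later
errors = [1.0] * 20 + [0.0] * 20
omegas = flexible_omega(errors)
```

      Under these assumptions, ω rises while prediction errors are large (early exploration) and decays once outcomes become predictable, so the Pavlovian influence fades as uncertainty resolves.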

      (3) In my view, the possibility of modeling moving predators is extremely interesting. I would include Figure 8D and the corresponding explanation in the main text. 

      Response with revision: We thank the reviewer for finding our simulation on moving predators extremely interesting. Unfortunately, since our instrumental system is not model-based, and in particular does not explicitly model the predator dynamics, our simulation might not be a very accurate representation of real moving-predator environments. As pointed out by Reviewer #1, several systems other than Pavlovian fear responses are perhaps necessary for safe behaviour in such environments, and we hope to address these in future studies. Thanks again for taking an interest in our simulations.

      (4) The VR experiment should be mentioned more clearly in the abstract and the introduction. It should be mentioned a bit more clearly why VR was helpful and why the authors did not use a simple bird's eye grid world task. 

      I cannot assess the RLDDM and I did not check the code. 

      Thank you, we have now mentioned the VR experiment more clearly in the abstract and the introduction. We also now further mention that the VR experiment “builds upon previous Go-No Go studies studying Pavlovian-Instrumental transfer (Guitart-Masip et al, 2012; Cavanagh et al, 2013). The virtual-reality approach confers a greater ecological validity and the immersive nature may contribute to better fear conditioning, making it easier to distinguish the aversive components.”

      A bird’s eye grid world may not invoke a strong withdrawal response, as seen in these immersive approach-withdrawal tasks where we can clearly distinguish a Pavlovian fear-based withdrawal response. We did include immersive VR maze results in the supplementary materials, but future work is needed to isolate the different systems at play in such a complex behaviour.

      Reviewer #3 (Public review): 

      Summary: 

      This paper aims to address the problem of exploring potentially rewarding environments that contain danger, based on the assumption that an independent Pavlovian fear learning system can help guide an agent during exploratory behaviour such that it avoids severe danger. This is important given that otherwise later gains seem to outweigh early threats, and agents may end up putting themselves in danger when it is advisable not to do so. 

      The authors develop a computational model of exploratory behaviour that accounts for both instrumental and Pavlovian influences, combining the two according to uncertainty in the rewards. The result is that Pavlovian avoidance has a greater influence when the agent is uncertain about rewards. 

      Strengths: 

      The study does a thorough job of testing this model using both simulations and data from human participants performing an avoidance task. Simulations demonstrate that the model can produce "safe" behaviour, where the agent may not necessarily achieve the highest possible reward but ensures that losses are limited. Interestingly, the model appears to describe human avoidance behaviour in a task that tests for Pavlovian avoidance influences better than a model that doesn't adapt the balance between Pavlovian and instrumental based on uncertainty. The methods are robust, and generally, there is little to criticise about the study. 

      Weaknesses: 

      The extent of the testing in human participants is fairly limited but goes far enough to demonstrate that the model can account for human behaviour in an exemplar task. There are, however, some elements of the model that are unrealistic (for example, the fact that pre-training is required to select actions with a Pavlovian bias would require the agent to explore the environment initially and encounter a vast amount of danger in order to learn how to avoid the danger later). The description of the models is also a little difficult to parse. 

      Thank you, we have now attempted to clarify these points in the Discussion section by adding the following text (Lines 313-321):

      “We next discuss the plausibility of pre-training to select the hardwired actions. In the human experiment, the withdrawal action is straightforwardly biased, as noted, while in the grid world, we assume a hardwired encoding of withdrawal actions for each state/grid. This innate encoding of withdrawal actions could be represented in the dPAG [Kim et al., 2013]. We implement this bias using pre-training, which we assume would be a product of evolution. Alternatively, this could be interpreted as deriving from an appropriate value initialization where the gradient over initialized values determines the action bias. Such aversive value initialization, driving avoidance of novel and threatening stimuli, has been observed in the tail of the striatum in mice, which is hypothesised to function as a Pavlovian fear/threat learning system [Menegas et al., 2018].”

      Reviewer #3 (Recommendations for the authors): 

      I have relatively little to suggest, as in my view the paper is robust, thorough, and creative, and does enough to support the primary argument being made at the most fundamental level. My suggestions for improvement are as follows: 

      (1) Some aspects of the model are potentially unrealistic (as described in the public review), and the paper may benefit from some discussion of these issues or attempts to make the model more realistic - i.e., to what extent is this plausible in explaining more complex avoidance behaviour? Primarily, the fact that pre-training is required to identify actions subject to Pavlovian bias seems unlikely to be effective in real-world situations - is there a better way to achieve this in cases where there isn't necessarily an instinctual Pavlovian response? 

      Thank you, we agree that the advantage of Pavlovian bias is restricted to the bias/instinctual Pavlovian response conferred by evolution. Future work is needed to model more complex avoidance behaviour such as escapes. We hope to have made this more clear with our edits to the Discussion (Lines 299-302) in our response to Reviewer #1’s comments, specifically:

      “The main focus of our work is on Pavlovian fear bias in safe exploration and learning, rather than on its role in complex navigational decisions. Future work can focus on capturing more sophisticated safe behaviours, such as escapes [Evans et al., 2019, Sporrer et al., 2023] and model-based planning, which span different aspects of the threat-imminence continuum [Mobbs et al., 2020].”

      (2) The description of the model in the method can be a little hard to follow and would benefit from further explanation of certain parameters. In general, it would be good to ensure that all terms mentioned in equations are described clearly in the text (for example, in Equation1 it isn't clear what k refers to). 

      Thank you, we have now added further information on all of the parameters in Equation 1 and overall improved the writing of the Methods section, for instance by using time subscripts to reduce confusion when introducing the parameters. We use the standard notation of the Sutton and Barto textbook. Here k refers to the number of timesteps into the future, and this is now explained better in the Methods section.
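
      For orientation, the role of k in Sutton and Barto's notation can be seen in the standard definition of the state value under a policy π. The manuscript's Equation 1 is not reproduced in this response and may differ in detail; the expression below is the textbook form, with γ the discount factor and R the reward.

```latex
v_\pi(s) \;=\; \mathbb{E}_\pi\!\left[\, \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \;\middle|\; S_t = s \right]
```

      Here k indexes how many steps beyond time t a reward arrives, so a reward k steps ahead is discounted by γ^k.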

      (3) Another point of clarification in Equation 1 - does the policy account for the Pavlovian influence or is this purely instrumental? 

      Thank you, Equation 1 is purely instrumental. We have now specifically mentioned this. The Pavlovian influence follows later. They are combined into propensities for action as per equations 11-13.
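
      One way such a combination can be sketched is given below. It is an illustration under stated assumptions rather than the manuscript's equations 11-13, which are not reproduced here: the linear weighting follows the "fixed linear weighting" wording in Reviewer #1's comments, while the function name `action_probabilities`, the withdrawal mask, and the softmax with inverse temperature `beta` are hypothetical choices for the example.

```python
import math


def action_probabilities(q_values, pavlovian_value, is_withdrawal, omega, beta=1.0):
    """Mix instrumental values with a Pavlovian withdrawal bias, then apply a softmax.

    Assumed (hypothetical) propensity for action a:
      (1 - omega) * Q(a) + omega * |V_p| * 1[a is a withdrawal action]
    where V_p <= 0 is the Pavlovian punishment value of the current state.
    """
    propensities = [
        (1.0 - omega) * q + omega * abs(pavlovian_value) * float(w)
        for q, w in zip(q_values, is_withdrawal)
    ]
    # Numerically stable softmax over the propensities
    top = max(propensities)
    weights = [math.exp(beta * (p - top)) for p in propensities]
    total = sum(weights)
    return [w / total for w in weights]


# Two actions: approach (higher instrumental value) and withdraw
q = [1.0, 0.5]
mask = [False, True]
p_instrumental = action_probabilities(q, -2.0, mask, omega=0.0)  # purely instrumental
p_biased = action_probabilities(q, -2.0, mask, omega=0.5)        # Pavlovian bias mixed in
```

      In this toy case the purely instrumental agent prefers to approach, whereas with ω = 0.5 the learned punishment value tips the choice toward withdrawal even though its instrumental value is lower.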

      (4) I was curious whether similar outcomes could be achieved by more complex instrumental models without the need for Pavlovian influences. For example, could different risk-sensitive decision rules (e.g., conditional value at risk) that rely only on the instrumental system afford safe behaviour without the need for an additional Pavlovian system? 

      Thank you for your comment. Yes, CVaR can achieve safe exploration/cautious behaviour in choices similar to Pavlovian avoidance learning. However, we think the two differ in the following ways:

      (1) CVaR provides the correct solution to the wrong problem (an objective that only maximises the lower tail of the distribution of outcomes).

      (2) Pavlovian bias provides the wrong solution to the right problem (a normative objective, but with a Pavlovian bias which may be a vestige of evolution).

      Here we use the “wrong problem, wrong solution, wrong environment” categorisation terminology from Huys et al. 2015.

      Huys, Q. J., Guitart-Masip, M., Dolan, R. J., & Dayan, P. (2015). Decision-theoretic psychiatry. Clinical Psychological Science, 3(3), 400-421.

      Secondly, we find an effect of Pavlovian bias on reaction times: a slowing of approach responses and faster withdrawal responses. We do not think this is best explained by a CVaR-type model, and it is a direction for future work. We think such model-based methods are slower to compute, whereas the Pavlovian withdrawal bias is a quicker response.

      We have now included this in brief in Lines 280-288.
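
      To make the contrast with CVaR concrete, the lower-tail objective can be sketched on sampled outcomes. This is a generic empirical estimator written for illustration, not code from the manuscript; the function name `cvar_lower_tail` and the toy outcome lists are assumptions.

```python
import math


def cvar_lower_tail(outcomes, alpha=0.1):
    """Empirical CVaR: mean of the worst alpha-fraction of outcomes (higher outcome = better)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(outcomes)
    # Number of samples in the lower tail, at least one
    n_tail = max(1, math.ceil(alpha * len(ordered)))
    tail = ordered[:n_tail]
    return sum(tail) / len(tail)


# Toy options: a safe option and a risky option with a rare large loss
safe = [1.0] * 10
risky = [3.0] * 9 + [-10.0]

mean_safe = sum(safe) / len(safe)
mean_risky = sum(risky) / len(risky)
cvar_safe = cvar_lower_tail(safe, alpha=0.1)
cvar_risky = cvar_lower_tail(risky, alpha=0.1)
```

      In this toy example a risk-neutral agent comparing means prefers the risky option, while an agent maximising the lower-tail CVaR prefers the safe one, which is the sense in which CVaR changes the objective rather than adding a bias.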

      (5) Figure 5 would benefit from a clearer caption as it is not necessarily clear from the current one that the left panels refer to choices and the right panels to reaction times. 

      Thank you, we have improved the caption for Fig. 5.

      (6) It would be good to include some indication of the quality of the model fits for the human behavioural study (i.e., diagnostics such as R-hat) to ensure that differences in model fit between models are not due to convergence issues with different models. This would be especially helpful for the RLDDM models as these can be difficult to fit successfully.

      Thank you, we observed that all R-hat values were strictly less than 1.05 (most parameters were less than 1.01 and generally close to 1), indicating that the models converged. We have now added this line to the Results (Lines 246-248).

      Thanks to the reviewer’s comments, we have now added the following text to our Discussion section (Lines 290-302): “When it comes to our experiments, both the simulation and VR experiment models are related and derived from the same theoretical framework maintaining an algebraic mapping. They differ only in task-specific adaptations i.e. differ in action sets and differ in temporal difference learning rules - multi-step decisions in the grid world vs. Rescorla-Wagner rule for single-step decisions in the VR task. This is also true for Dayan et al. [2006] who bridge Pavlovian bias in a Go-No Go task (negative auto-maintenance pecking task) and a grid world task. A further minor difference between the simulation and VR experiment models is the use of a baseline bias in the human experiment's RL and the RLDDM model, where we also model reaction times with drift rates which is not a behaviour often simulated in the grid world simulations. As mentioned previously, we use the grid world tasks for didactic purposes, similar to Dayan et al. [2006] and common to test-beds for algorithms in reinforcement learning [Sutton et al., 1998]. The main focus of our work is on Pavlovian fear bias in safe exploration and learning, rather than on its role in complex navigational decisions. Future work can focus on capturing more sophisticated safe behaviours, such as escapes [Evans et al., 2019, Sporrer et al., 2023] and model-based planning, which span different aspects of the threat-imminence continuum [Mobbs et al., 2020].”
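
      Returning to the convergence diagnostic reported in the response to point (6), the R-hat statistic compares between-chain and within-chain variance for a parameter. The sketch below implements the classic (non-split) Gelman-Rubin form for illustration only; it is not the authors' fitting code, current samplers typically report a rank-normalised split variant, and the function name and toy chains are assumptions.

```python
import math
import random
import statistics


def gelman_rubin_rhat(chains):
    """Classic R-hat for one parameter from m chains of n post-warmup draws each."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 equal-length chains with at least 2 draws")
    # W: average within-chain sample variance
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B: n times the sample variance of the chain means
    between = n * statistics.variance([statistics.fmean(c) for c in chains])
    # Pooled estimate of the posterior variance
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


rng = random.Random(0)
# Four chains sampling the same distribution (well mixed)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# The same draws with two chains shifted, mimicking chains stuck in different modes
stuck = [[x + shift for x in chain] for chain, shift in zip(mixed, (0.0, 0.0, 5.0, 5.0))]

rhat_mixed = gelman_rubin_rhat(mixed)
rhat_stuck = gelman_rubin_rhat(stuck)
```

      Chains that agree give a value close to 1, below the 1.05 threshold quoted in the response, whereas chains that disagree about the parameter's location give a value well above it.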

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Azlan et al. identified a novel maternal factor called Sakura that is required for proper oogenesis in Drosophila. They showed that Sakura is specifically expressed in the female germline cells. Consistent with its expression pattern, Sakura functioned autonomously in germline cells to ensure proper oogenesis. In Sakura KO flies, germline cells were lost during early oogenesis and often became tumorous before degenerating by apoptosis. In these tumorous germ cells, piRNA production was defective and many transposons were derepressed. Interestingly, Smad signaling, a critical signaling pathway for GSC maintenance, was abolished in sakura KO germline stem cells, resulting in ectopic expression of Bam in whole germline cells in the tumorous germline. A recent study reported that Bam acts together with the deubiquitinase Otu to stabilize Cyc A. In the absence of sakura, Cyc A was upregulated in tumorous germline cells in the germarium. Furthermore, the authors showed that Sakura co-immunoprecipitated Otu in ovarian extracts. A series of in vitro assays suggested that the Otu (1-339 aa) and Sakura (1-49 aa) are sufficient for their direct interaction. Finally, the authors demonstrated that the loss of otu phenocopies the loss of sakura, supporting their idea that Sakura plays a role in germ cell maintenance and differentiation through interaction with Otu during oogenesis.

      Strengths:

      To my knowledge, this is the first characterization of the role of the CG14545 gene. Each experiment seems to be well-designed and adequately controlled.

      Weaknesses:

      However, the conclusions from each experiment are somewhat separate, and the functional relationships between Sakura's functions are not well established. In other words, although the loss of Sakura in the germline causes pleiotropic effects, the cause-and-effect relationships between the individual defects remain unclear.

      Reviewer #2 (Public review):

      In this study, the authors identified CG14545 (and named it Sakura), as a key gene essential for Drosophila oogenesis. Genetic analyses revealed that Sakura is vital for both oogenesis progression and ultimate female fertility, playing a central role in the renewal and differentiation of germ stem cells (GSC).

      The absence of Sakura disrupts the Dpp/BMP signaling pathway, resulting in abnormal bam gene expression, which impairs GSC differentiation and leads to GSC loss. Additionally, Sakura is critical for maintaining normal levels of piRNAs. Also, the authors convincingly demonstrate that Sakura physically interacts with Otu, identifying the specific domains necessary for this interaction, suggesting a cooperative role in germline regulation. Importantly, the loss of otu produces similar defects to those observed in Sakura mutants, highlighting their functional collaboration.

      The authors provide compelling evidence that Sakura is a critical regulator of germ cell fate, maintenance, and differentiation in Drosophila. This regulatory role is mediated through the modulation of pMad and Bam expression. However, the phenotypes observed in the germarium appear to stem from reduced pMad levels, which subsequently trigger premature and ectopic expression of Bam. This aberrant Bam expression could lead to increased CycA levels and altered transcriptional regulation, impacting piRNA expression. Given Sakura's role in pMad expression, it would be insightful to investigate whether overexpression of Mad or pMad could mitigate these phenotypic defects (UAS-Mad line is available at Bloomington Drosophila Stock Center).

      As suggested by the reviewer, we tested whether overexpression of Mad could rescue or mitigate the phenotypic defects caused by loss of sakura, by using nos-Gal4-VP16 > UASp-Mad-GFP in the background of sakura<sup>null</sup>. As shown in Fig S11, we did not observe any mitigation of defects.

      We then also tested whether expressing a constitutively active form of Tkv could mitigate the defects, by using UAS-Dcr2, NGT-Gal4 > UASp-tkv.Q235D in the background of sakura<sup>RNAi</sup>. As shown in Fig S12, we did not observe any mitigation of defects by this approach either.

      A major concern is the overstated role of Sakura in regulating Orb. The data does not reveal mislocalized Orb; rather, a mislocalized oocyte and cytoskeletal breakdown, which may be secondary consequences of defects in oocyte polarity and structure rather than direct misregulation of Orb. The conclusion that Sakura is necessary for Orb localization is not supported by the data. Orb still localizes to the oocyte until about stage 6. In the later stage, it looks like the cytoskeleton is broken down and the oocyte is not positioned properly, however, there is still Orb localization in the ~8-stage egg chamber in the oocyte. This phenotype points towards a defect in the transport of Orb and possibly all other factors that need to localize to the oocyte due to cytoskeletal breakdown, not Orb regulation directly. While this result is very interesting it needs further evaluation on the underlying mechanism. For example, the decrease in E-cadherin levels leads to a similar phenotype and Bam is known to regulate E-cadherin expression. Is Bam expressed in these later knockdowns?

      We examined Bam and DE-Cadherin expression in later RNAi knockdowns driven by ToskGal4. As shown in Fig S9, Bam was not expressed in these later knockdowns compared with controls. DE-Cadherin staining suggested a disorganized structure in late-stage egg chambers.

      We agree that we overstated the role of Sakura in regulating Orb in the initial manuscript. We have changed the text to avoid overstating it.

      The manuscript would benefit from a more balanced interpretation of the data concerning Sakura's role in Orb regulation. Furthermore, a more expanded discussion on Sakura's potential role in pMad regulation is needed. For example, since Otu and Bam are involved in translational regulation, do the authors think that Mad is not translated and therefore it is the reason for less pMad? Currently the discussion presents just a summary of the results and not an extension of possible interpretation discussed in context of present literature.

      We have changed the text to avoid overstating the role of Sakura in regulating Orb localization.

      Based on our newly added results showing that transgenic overexpression of Mad could not rescue or mitigate the phenotypic defects of the sakura<sup>null</sup> mutant (Fig S11), we do not think that the reduced pMad level is due to reduced translation of Mad.

      Reviewer #3 (Public review):

      In this very thorough study, the authors characterize the function of a novel Drosophila gene, which they name Sakura. They start with the observation that sakura expression is predicted to be highly enriched in the ovary and they generate an anti-sakura antibody, a line with a GFP-tagged sakura transgene, and a sakura null allele to investigate sakura localization and function directly. They confirm the prediction that it is primarily expressed in the ovary and, specifically, that it is expressed in germ cells, and find that about 2/3 of the mutants lack germ cells completely and the remaining have tumorous ovaries. Further investigation reveals that Sakura is required for piRNA-mediated repression of transposons in germ cells. They also find evidence that sakura is important for germ cell specification during development and germline stem cell maintenance during adulthood. However, despite the role of sakura in maintaining germline stem cells, they find that sakura mutant germ cells also fail to differentiate properly such that mutant germline stem cell clones have an increased number of "GSC-like" cells. They attribute this phenotype to a failure in the repression of Bam by dpp signaling. Lastly, they demonstrate that sakura physically interacts with otu and that sakura and otu mutants have similar germ cell phenotypes. Overall, this study helps to advance the field by providing a characterization of a novel gene that is required for oogenesis. The data are generally high-quality and the new lines and reagents they generated will be useful for the field. However, there are some weaknesses and I would recommend that they address the comments in the Recommendations for the authors section below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      General Comments:

      (1) The gene nomenclature: As mentioned in the text, Sakura means cherry blossom and is one of the national flowers of Japan. I am not sure whether the phenotype of the CG14545 mutant is related to Sakura or not. I would like to suggest the authors reconsider the naming.

      The striking phenotype of the sakura mutant is tumorous and germless ovarioles. The tumorous phenotype, exhibiting many round fusomes in the germarium visualized by anti-Hts staining, looks to us like cherry blossoms blooming. Also, the germless phenotype reminds us of falling cherry blossoms, especially considering that the ratio of the tumorous phenotype decreases and that of the germless phenotype increases with fly age. Furthermore, “Sakura” symbolizes birth and renewal in Japanese culture (the last author of this manuscript is Japanese). Our findings indicate that the gene sakura is involved in the regulation of renewal and differentiation of GSCs (which leads to birth). These are the reasons for the naming, which we would like to keep.

      (2) In many of the microscopic photographs in the figures, especially for the merged confocal images, the resolution looks low, and the images appear blurred, making it difficult to judge the authors' claims. Also, the Alpha Fold structure in Figure 10A requires higher contrast images. The magnification of the images is often inadequate (e.g. Figures 3A, 3B, 5E, 7A, etc). The authors should take high-magnification images separately for the germarium and several different stages of the egg chambers and lay out the figures.

      We are very sorry for the low-resolution images. This was caused when the original PDF file with high-resolution images was compressed in order to meet the small file size limit in the eLife submission portal. In the revised submission, we used high-resolution images.

      Specific Comments

      (1) How Sakura can cooperate with Otu remains unanswered. Sakura does not regulate deubiquitinase activity in vitro. Both sakura and otu appear to be involved in the Dpp-Smad signaling pathway and in the spatial control of Bam expression in the germarium, whereas Otu has been reported to act in concert with Bam to deubiquitinate and stabilize Cyc A for proper cystoblast differentiation. Therefore, it is plausible that the stabilization of Cyc A in the Sakura mutant is an indirect consequence of Bam misexpression and independent of the Sakura-Otu interaction. The authors may need to provide much deeper insight into the mechanism by which Sakura plays roles in these seemingly separable steps to orchestrate germ cell maintenance and differentiation during early oogenesis.

      Yes, it is possible that the stabilization of CycA in the sakura mutant is an indirect consequence of Bam misexpression and independent of the Sakura-Otu interaction. To test the significance and role of the Sakura-Otu interaction, we have attempted to identify Sakura point mutants that lose interaction with Otu. Had such point mutants been obtained, we planned to test whether their transgene expression could rescue the phenotypes of the sakura mutant as the wild-type transgene did. However, after designing over 30 point mutants and testing their interaction with Otu, we have not yet obtained such a mutant version of Sakura. We will continue these efforts, but this is beyond the scope of the current study. We hope to address this important point in future studies.

      (2) Figure 3A and Figure 4: The authors show that piRNA production is abolished in Sakura KO ovaries. It is known that piRNA amplification (the ping-pong cycle) occurs in the Vasa-positive perinuclear nuage in nurse cells. Is the nuage normally formed in the absence of Sakura? The authors provide high-magnification images in the germarium expressing Vas-GFP. How does Sakura, and possibly Otu, contribute to piRNA production? Are the defects a direct or indirect consequence of the loss of Sakura?

      We provided higher magnification images of germaria expressing Vasa-EGFP in the sakura mutant background (Fig 3A and 3B). Nuage formation does not seem to be dysregulated in the sakura mutant. Currently, we do not know whether the piRNA defects are a direct or indirect consequence of the loss of Sakura. This question cannot be answered easily. We hope to address this in future studies.

      (3) Figure 7 and Figure 12: The authors showed that Dpp-Smad signaling was abolished in Sakura KO germline cells. The same defects were also observed in otu mutant ovaries (Figure 12B). How does the Sakura-Otu axis contribute to the Dpp-Smad pathway in the germline?

      As we mentioned in the response to comment (1), we attempted to test the significance and role of the Sakura-Otu interaction, including in the Dpp-Smad pathway in the germline, but we have not yet been able to obtain loss-of-interaction mutant(s) of Sakura. We hope to address this in future studies.

      (4) Figure 9 and Fig 10: The authors raised antibodies against both Sakura and Otu, but their specificities were not provided. For Western blot data, the authors should provide whole gel images as source data files. Also, the authors argue that the Otu band they observed corresponds to the 98-kDa isoform (lines 302-304). The molecular weight on the Western blot alone would be insufficient to support this argument.

      When we submitted the initial manuscript, we also submitted original, uncropped, and unmodified whole Western blot images for all gel images to the eLife journal, as requested. We did the same for this revised submission. I believe eLife makes all those files available for downloading to readers.

      In the newly added Fig S13B, we used very young (2-5 hours) ovaries and 3-7 days ovaries. The 2-5 hours ovaries contain mostly pre-differentiated germ cells. Older ovaries (3-7 days in our case here) contain all 14 stages of oogenesis, and later stages predominate in whole ovary lysates.

      As reported in previous literature (Sass et al. 1995), we detected a higher abundance of the 104 kDa Otu isoform than the 98 kDa isoform in 2-5 hours ovaries and predominantly the 98 kDa isoform in 3-7 days ovaries (Fig S13B). These results confirmed that the major Otu isoform we detected in our Western blots, all of which use older ovaries except for the 2-5 hours ovaries in Fig S13B, is the 98 kDa isoform.

      (5) Otu has been reported to regulate ovo and Sxl in the female germline. Is Sakura involved in their regulation?

      We examined sxl alternative splicing pattern in sakura mutant ovaries. As shown in Fig S6, we detected the male-specific isoform of sxl RNA and a reduced level of the female-specific sxl isoform in sakura mutant ovaries. Thus Sakura seems to be involved in sxl splicing in the female germline, while further studies will be needed to understand whether Sakura has a direct or indirect role here.

      (6) Lines 443-447: The GSC loss phenotype in piwi mutant ovaries is thought to occur in a somatic cell-autonomous manner: both piwi-mutant germline clones and germline-specific piwi knockdown do not show the GSC-loss phenotype. In contrast, the authors provide compelling evidence that Sakura functions in the germline. Therefore, the Piwi-mediated GSC maintenance pathway is likely to be independent of the Sakura-Otu axis.

      We changed the text accordingly.

      Reviewer #2 (Recommendations for the authors):

      Overall, this is a cleanly written manuscript, with some sentences/sections that are confusing the way they are constructed (i.e. Line 37-38, 334, section on Flp/FRT experiments).

      We rewrote those sections to avoid confusion.

      Comment for all merged image data: the quality of the merged images is very poor - the individual channels are better but should also be reprocessed for more resolved image data sets. Also, it would be helpful to have boundaries drawn in an individual panel to identify the regions of the germarium, as cartooned in Figure S1A (which should be brought into Figure 1). F-actin or Vsg staining would have helped throughout the manuscript to enhance the visualization of described phenotypes.

      We are very sorry for the low-resolution images. This was caused when the original PDF file with high-resolution images was compressed in order to meet the small file size limit in the eLife submission portal. In the revised submission, we used high-resolution images.

      We outlined the germarium in Fig 1E.

      We brought the former FigS1 into Fig 1A.

      We provided Phalloidin (F-Actin) staining images in Fig S7.

      All p-values seem off. I recommend running the data through the student t-test again.

      We used the student t-test to calculate p-values and confirmed that they are correct. We don’t understand why the reviewer thinks all p-values seem off.

      In the original manuscript, as we mentioned in each figure legend, we used an asterisk (*) to indicate a p-value <0.05, without distinguishing whether it was <0.001, <0.01, or <0.05.

      Perhaps reviewer 2 is suggesting that we use ***, **, and * to indicate p-values of <0.001, <0.01, and <0.05, respectively? If so, we have now followed reviewer 2’s suggestion.
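      The tiered notation described in this response can be written down compactly. The following is a minimal sketch of that convention, not the authors' analysis code; the function name `significance_stars` and the "ns" label for non-significant values are assumptions introduced here for illustration.

```python
def significance_stars(p_value: float) -> str:
    """Map a p-value to the asterisk notation described in the response."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "ns"  # assumed label; the response does not name one
```

      Under the original single-asterisk scheme, all three significant tiers would have been reported as `*`, which may be why very large effects appeared under-annotated to the reviewer.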

      Figure 1

      (1) Within the text, C is mentioned before A.

      We updated the text and now we mentioned Fig 1A before Fig 1C.

      (2) B should be the supplemental figure.

      We moved the former Fig 1B to Supplemental Figure 1.

      (3) C - How were the different egg chamber stages selected in the WB? Naming them 'oocytes' is deceiving. Recommend labeling them as 'egg chambers', since an oocyte is claimed to be just the one-cell of that cyst.

      We changed the labeling to egg chambers.

      (4) Is the antibody not detecting Sakura in IF? There is no mention of this anywhere in the manuscript.

      While our Sakura antibody detects Sakura in IF, it seems to detect some other proteins as well. Since we have Sakura-EGFP fly strain (which fully rescues sakura<sup>null</sup> phenotypes) to examine Sakura expression and localization without such non-specific signal issues, we relied on Sakura-EGFP rather than anti-Sakura antibodies for IF.

      (5) Expand on the reliance of the sakura-EGFP fly line. Does this overexpression cause any phenotypes?

      sakura-EGFP does not cause any phenotypes in the background of sakura[+/+] and sakura[+/-].

      (6) Line 95 "as shown below" is not clear that it's referencing panel D.

      We now referenced Fig 1D.

      (7) Re: Figures 1 E and F. There is no mention of Hts or Vasa proteins in the text. "Sakura-EGFP was not expressed in somatic cells such as terminal filament, cap cells, escort cells, or follicle cells (Figure 1E). In the egg chamber, Sakura-EGFP was detected in the cytoplasm of nurse cells and was enriched in developing oocytes (Figure 1F)". Outline these areas or label these structures/sites in the images. The color of Merge labels is confusing as the blue is not easily seen.

      We mentioned Hts and Vasa in the text. We labeled the structures/sites in the images and updated the color labeling.

      Figure 2

      (1) Entire figure is not essential to be a main figure, but rather supplemental.

      We don’t agree with the reviewer. We think that the female fertility assay data, in which the sakura null mutant exhibits a strikingly strong phenotype that was completely rescued by our Sakura-EGFP transgene, are very important, and we would like to present them in a main figure.

      (2) 2A- one star (*) significance does not seem correct for the presented values between 0 and 100+.

      In the original manuscript, as we mentioned in each figure legend, we used an asterisk (*) to indicate a p-value <0.05, without distinguishing whether it was <0.001, <0.01, or <0.05.

      Perhaps reviewer 2 is suggesting that we use ***, **, and * to indicate p-values of <0.001, <0.01, and <0.05, respectively? If so, we have now followed reviewer 2’s suggestion.

      (3) 2C images are extremely low quality. Should be presented as bigger panels.

      We are very sorry for the low-resolution images. This was caused when the original PDF file with high-resolution images was compressed in order to meet the small file size limit in the eLife submission portal. In the revised submission, we used high-resolution images. We also presented them as bigger panels.

      Figure 3

      (1) "We observed that some sakura<sup>null</sup> /null ovarioles were devoid of germ cells ("germless"), while others retained germ cells (Fig 3A)" What is described is, that it is hard to see. Must have a zoomed-in panel.

      We provided zoomed-in panels in Fig 3B.

      (2) C - The control doesn't seem to match. Must zoom in.

      We provided matched control and also zoomed in.

      (3) For clarity, separate the tumorous and germless images.

      In the new image, only one tumorous and one germless ovariole are shown, with clear labeling and outlines, for clarity.

      (4) Use arrows to help clearly indicate the changes that occur. As they are presented, they are difficult to see.

      We updated all the panels to enhance clarity.

      (5) Line 158 seems like a strong statement since it could be indirect.

      We softened the statement.

      Figure 4

      (1) Line 188-189 - Conclusion is an overstatement.

      We softened the statement.

      (2) Is the piRNA reduction due to a change in transcription? Or a direct effect by Sakura?

      We do not know the answers to these questions. We hope to address these in future studies.

      Figure 5

      (1) D - It might make more sense if this graph showed % instead of the numbers.

      We did not understand the reviewer’s point. We think using numbers, not %, makes more sense.

      (2) Line 213 - explain why RNAi 2 was chosen when RNAi 1 looks stronger.

      For some reason, the fly stock of RNAi line 2 is much healthier than that of RNAi line 1 (without being driven by Gal4). We were concerned that RNAi line 1 might carry an unwanted genetic background, so we chose to use RNAi line 2 to avoid such an issue.

      (3) In Line 218 there's an extra parenthesis after the PGC acronym.

      We corrected the error.

      (4) TOsk-Gal4 fly is not in the Methods section.

      We mentioned TOsk-Gal4 in the Methods.

      Figure 6:

      (1) The FLP-FRT section must be rewritten.

      We rewrote the FLP-FRT section.

      (2) A - include statistics.

      We included statistics using the chi-square test.
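      For readers unfamiliar with the test mentioned here, the following is a minimal sketch of a Pearson chi-square test, assuming the simplest case of a 2×2 contingency table without continuity correction. The layout of the table, the function name `chi_square_2x2`, and the counts in the usage line are hypothetical; they are not taken from the manuscript, whose actual categories may differ.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom and
    no continuity correction.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a nonzero total")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts for illustration only:
# rows = genotype (control, mutant), columns = (phenotype present, absent).
stat, p = chi_square_2x2(40, 10, 15, 35)
```

      Tables with more than two categories per axis require the general form of the test with more degrees of freedom, which this sketch does not cover.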

      (3) B - is not recalled in the Results text.

      We referred to Fig 6B in the text.

      (4) Line 232 references Figure 3, but not a specific panel.

      We referred to Fig 3A, 3C, 3D, and 3E in the text.

      Figure 7/8 - can go to Supplemental.

      We moved Fig 8 to supplemental. However, we think Fig 7 data is important and therefore we would like to present them as a main figure.

      (1) There should be CycA expression in the control during the first 4 divisions.

      Yes, there is CycA expression observed in the control during the first 4 divisions, while it’s much weaker than in sakura<sup>null</sup> clone.

      (2) Helpful to add the dotted lines to delineate (A) as well.

      We added a dotted outline for germarium in Fig 7A.

      (3) Line 263 CycA is miswritten as CyA.

      We corrected the typo.

      Figure 9

      (1) Otu antibody control?

      We validated Otu antibody in newly added Fig 10C and Fig S13A.

      (2) Which Sakura-EGFP line was used? sakura het. or null background? This isn't mentioned in the text, nor legend.

      We used Sakura-EGFP in the background of sakura[+/+]. We added this information in the methods and figure legend.

      (3) C - Why the switch to S2 cells? Not able to use the Otu antibody in the IP of ovaries?

      We can use the Otu antibody in the IP of ovaries. However, in anti-Sakura Western after anti-Otu IP, antibody light chain bands of the Otu antibodies overlap with the Sakura band. Therefore, we switched to S2 cells to avoid this issue by using an epitope tag.

      Figure 10

      (1) A- The resolution of images of the ribbon protein structure is poor.

      We are very sorry for the low-resolution images. This was caused when the original PDF file with high-resolution images was compressed in order to meet the small file size limit in the eLife submission portal. In the revised submission, we used high-resolution images.

      (2) A table summarizing the interactions between domains would help bring clarity to the data presented.

      We added a table summarizing the fragment interaction results.

      (3) Some images would be nice here to show that the truncations no longer colocalize.

      We did not understand the reviewer’s point. In our study, even for the full-length proteins, we have not shown any colocalization of Sakura and Otu in S2 cells or in ovaries, except that they both are enriched in developing oocytes in egg chambers.

      Figure 12

      (1) A - control and RNAi lines do not match.

      We provided matched images.

      (2) In general, since for Sakura, only its binding to Otu was identified and since they phenocopy each other, doesn't most of the characterization of Sakura just look at Otu phenotypes? Does Sakura knockdown affect Otu localization or expression level (and vice versa)?

      We tested this by Western (Fig S15) and IF (Fig 12). Sakura knockdown did not decrease Otu protein level, and Otu knockdown did not decrease Sakura protein level (Fig S15). In sakura<sup>null</sup> clone, Otu level was not notably affected (Fig 12). In sakura<sup>null</sup> clone, Otu lost its localization to the posterior position within egg chambers.

      Figure S6

      (1) It is Luciferase, not Lucifarase.

      We corrected the typo.

      Reviewer #3 (Recommendations for the authors):

      (1) It is interesting that germless and tumorous phenotypes coexist in the same population of flies. Additional consideration of these essentially opposite phenotypes would significantly strengthen the study. For example, do they co-exist within the same fly and are the tumorous ovarioles present in newly eclosed flies or do they develop with age? The data in Figure 8 show that bam knockdown partially suppresses the germless phenotype. What effect does it have on the tumorous phenotype? Is transposon expression involved in either phenotype? Do Sakura mutant germline stem cell clones overgrow relative to wild-type cells in the same ovariole? Does sakura RNAi driven by NGT-Gal4 only cause germless ovaries or does it also cause tumorous phenotypes? What happens if the knockdown of Sakura is restricted to adulthood with a Gal80ts? It may not be necessary to answer all of these questions, but more insight into how these two phenotypes can be caused by loss of sakura would be helpful.

      We performed new experiments to answer these questions.

      do they co-exist within the same fly and are the tumorous ovarioles present in newly eclosed flies or do they develop with age?

      Tumorous and germless ovarioles coexist in the same fly (in the same ovary). Tumorous ovarioles are present in very young (0-1 day old) flies, including newly eclosed (Fig S5). The ratio of germless ovarioles increases and that of tumorous ovarioles decreases with age (Fig S5).

      The data in Figure 8 show that bam knockdown partially suppresses the germless phenotype. What effect does it have on the tumorous phenotype?

      bam knockdown effect on tumorous phenotype is shown in Fig S10. bam knockdown increased the ratio of tumorous ovarioles and the number of GSC-like cells.

      Is transposon expression involved in either phenotype?

      Since our transposon-piRNA reporter uses the germline-specific nos promoter, it is expressed only in germline cells, so we cannot examine transposon expression in germless ovarioles.

      Do Sakura mutant germline stem cell clones overgrow relative to wild-type cells in the same ovariole?

      Yes, Sakura mutant GSC clones overgrow. Please compare Fig 6C and Fig S8.

      Does sakura RNAi driven by NGT-Gal4 only cause germless ovaries or does it also cause tumorous phenotypes?

      Fig S10 and Fig S12 show the ovariole phenotypes of sakura RNAi driven by NGT-Gal4. It causes both germless and tumorous phenotypes.

      What happens if the knockdown of Sakura is restricted to adulthood with a Gal80ts?

      Our mosaic clone was induced at the adult stage, so we already have data of adulthood-specific loss of function. Gal80ts does not work well with nos-Gal4.

      (2) The idea that the excessive bam expression in tumorous ovaries is due to a failure of bam repression by dpp signaling is not well-supported by the data. Dpp signaling is activated in a very narrow region immediately adjacent to the niche but the images in Figure 7A show bam expression in cells that are very far away from the niche. Thus, it seems more likely to be due to a failure to turn bam expression off at the 16-cell stage than to a failure to keep it off in the niche region. To determine whether bam repression in the niche region is impaired, it would be important to examine cells adjacent to the niche directly at a higher magnification than is shown in Figure 7A.

      We provided higher magnification images of cells adjacent to the niche in new Fig 7A.

      We found that cells adjacent to the niche also express Bam-GFP.

      That said, we agree with the reviewer. A failure to turn bam expression off at the 16-cell stage may be an additional or even a main cause of bam misexpression in sakura mutant. We added this in the Discussion.

      (3) In addition, several minor comments should be addressed:

      a. Does anti-Sakura work for immunofluorescence?

      While our Sakura antibody detects Sakura in IF, it seems to detect some other proteins as well. Since we have Sakura-EGFP fly strain to examine Sakura expression and localization without such non-specific signal issues, we relied on Sakura-EGFP rather than anti-Sakura antibodies.

      b. Please provide insets to show the phenotypes indicated by the different color stars in Figure 3C more clearly.

      We provided new, higher-magnification images to show the phenotypes more clearly.

      c. Please indicate the frequency of the expression patterns shown in Figure 4D (do all ovarioles in each genotype show those patterns or is there variable penetrance?).

      We indicated the frequency.

      d. An image showing TOskGal4 driving a fluorophore should be provided so that readers can see which cells express Gal4 with this driver combination.

      It has been already done in the paper ElMaghraby et al, GENETICS, 2022, 220(1), iyab179, so we did not repeat the same experiment.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Mallimadugula et al. combined Molecular Dynamics (MD) simulations, thiol-labeling experiments, and RNA-binding assays to study and compare the RNA-binding behavior of the Interferon Inhibitory Domain (IID) from Viral Protein 35 (VP35) of Zaire ebolavirus, Reston ebolavirus, and Marburg marburgvirus. Although the structures and sequences of these viruses are similar, the authors suggest that differences in RNA binding stem from variations in their intrinsic dynamics, particularly the opening of a cryptic pocket. More precisely, the dynamics of this pocket may influence whether the IID binds to RNA blunt ends or the RNA backbone.

      Overall, the authors present important findings to reveal how the intrinsic dynamics of proteins can influence their binding to molecules and, hence, their functions. They have used extensive biased simulations to characterize the opening of a pocket which was not clearly seen in experimental results - at least when the proteins were in their unbound forms. Biochemical assays further validated theoretical results and linked them to RNA binding modes. Thus, with the combination of biochemical assays and state-of-the-art Molecular Dynamics simulations, these results are clearly compelling.

      Strengths:

      The use of extensive Adaptive Sampling combined with biochemical assays clearly points to the opening of the Interferon Inhibitory Domain (IID) as a factor for RNA binding. This type of approach is especially useful to assess how protein dynamics can affect its function.

      Weaknesses:

      Although a connection between the cryptic pocket dynamics and RNA binding mode is proposed, the precise molecular mechanism linking pocket opening to RNA binding still remains unclear.

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to determine whether a cryptic pocket in the VP35 protein of Zaire ebolavirus has a functional role in RNA binding and, by extension, in immune evasion. They sought to address whether this pocket could be an effective therapeutic target resistant to evolutionary evasion by studying its role in dsRNA binding among different filovirus VP35 homologs. Through simulations and experiments, they demonstrated that cryptic pocket dynamics modulate the RNA binding modes, directly influencing how VP35 variants block RIG-I and MDA5-mediated immune responses.

      The authors successfully achieved their aim, showing that the cryptic pocket is not a random structural feature but rather an allosteric regulator of dsRNA binding. Their results not only explain functional differences in VP35 homologs despite their structural similarity but also suggest that targeting this cryptic pocket may offer a viable strategy for drug development with reduced risk of resistance.

      This work represents a significant advance in the field of viral immunoevasion and therapeutic targeting of traditionally "undruggable" protein features. By demonstrating the functional relevance of cryptic pockets, the study challenges long-standing assumptions and provides a compelling basis for exploring new drug discovery strategies targeting these previously overlooked regions.

      Strengths:

      The combination of molecular simulations and experimental approaches is a major strength, enabling the authors to connect structural dynamics with functional outcomes. The use of homologous VP35 proteins from different filoviruses strengthens the study's generality, and the incorporation of point mutations adds mechanistic depth. Furthermore, the ability to reconcile functional differences that could not be explained by crystal structures alone highlights the utility of dynamic studies in uncovering hidden allosteric features.

      Weaknesses:

      While the methodology is robust, certain limitations should be acknowledged. For example, the study would benefit from a more detailed quantitative analysis of how specific mutations impact RNA binding and cryptic pocket dynamics, as this could provide greater mechanistic insight. This study would also benefit from providing a clear rationale for the selection of the amber03 force field and considering the inclusion of volume-based approaches for pocket analysis. Such revisions will strengthen the robustness and impact of the study.

      Reviewer #3 (Public review):

      Summary:

      The authors suggest a mechanism that explains the preference of viral protein 35 (VP35) homologs to bind the backbone of double-stranded RNA versus blunt ends. These preferences have a biological impact in terms of the ability of different viruses to escape the immune response of the host.

      The proposed mechanism involves the existence of a cryptic pocket, where VP35 binds the blunt ends of dsRNA when the cryptic pocket is closed and preferentially binds the RNA double-stranded backbone when the pocket is open.

      The authors performed MD simulations, thiol labelling experiments, fluorescence polarization assays, as well as point mutations to support their hypothesis.

      Strengths:

      This is a genuinely interesting scientific question, which is approached through multiple complementary experiments as well as extensive MD simulations. Moreover, structural biology studies focused on RNA-protein interactions are particularly rare, highlighting the importance of further research in this area.

      Weaknesses:

      - Sequence similarity between Ebola-Zaire and Ebola-Reston (94% similarity) explains their similar behaviour in simulations and experimental assays. Marburg instead is a more distant homolog (~80% similarity relative to Ebola/Zaire). This difference in sequence and structure can explain the propensities, without the need to invoke the existence of a cryptic pocket.

      - No real evidence for the presence of a cryptic pocket is presented, but rather a distance probability distribution between two residues obtained from extensive MD simulations. It would be interesting to characterise the modelled RNA-protein interface in more detail.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Before assessing the overall quality and significance of this work, this reviewer needs to specify the context of this review. This reviewer's expertise lies in biased and unbiased molecular dynamics simulations and structural biology. Hence, while this reviewer can overall understand the results for thiol-labeling and RNA-binding assays, this review will not assess the quality of these biochemical assays and will mainly focus on the modelling results.

      Overall, the authors present important findings to reveal how the intrinsic dynamics of proteins can influence their binding to molecules and, hence, their functions. They have used extensive biased simulations to characterize the opening of a pocket which was not clearly seen in experimental results - at least when the proteins were in their unbound forms. Biochemical assays further validated theoretical results and linked them to RNA binding modes. Thus, with the combination of biochemical assays and state-of-the-art Molecular Dynamics simulations, these results are clearly compelling.

      Beyond the clear qualities of this work, I would like to mention a few points that may help to better contextualize and rationalize the results presented here.

      - First, both the introduction and discussion sections seem relatively condensed. Extending them to, for example, better describe the methodological context and discuss the methodological limitations and potential future developments related to biased simulations may help the reader get a better idea of the significance of this work.

      - The authors presented 3 homologs in this study: IIDs of Reston, Zaire, and Marburg viruses. While Zaire and Reston are relatively similar in terms of sequence (Figure S1), the sequences clearly differ between Marburg and the two other viruses. Can the authors indicate a similarity/identity score for each sequence alignment and extend Figure S1 to really compare the Marburg sequence with Reston and Zaire? Can they also discuss how these differences may impact the comparison of the three IIDs? This may also help the reader to understand why the authors sometimes compare the three viruses and sometimes focus only on comparing Zaire and Reston.

      We would like to thank the reviewer for raising this point and we agree that additional details about the sequence comparison provide more context for the choices of substitutions we made. Therefore, we have updated Fig S1 to include a detailed pairwise comparison of all the IID sequences including the percentage sequence similarity and identity. We have also added the following sentences to the results section where we first introduced the substitutions between Zaire and Reston IIDs:

      “While the sequence of Marburg IID differs significantly from Reston and Zaire IIDs with a sequence identity of 42% and 45% respectively (Fig S1), the sequences of Reston and Zaire IID are 88% identical and 94% similar. Particularly, substitutions between these homologs are all distal to the RNA-binding interfaces and all the residues known to make contacts with dsRNA from structural studies are identical. Therefore, we reasoned that comparing these two homologs would help us identify minimal substitutions that control pocket opening probability and allow us to study its effect on dsRNA binding with minimal perturbation of other factors.”
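      The distinction between percent identity and percent similarity quoted above can be illustrated with a short sketch. It assumes two sequences that have already been aligned to equal length with `-` marking gaps, skips gapped columns, and uses a small hand-written set of conservative-substitution groups. The groups, the gap convention, and the function name are all assumptions made here for illustration; the response does not say which alignment tool or similarity definition the authors used, so this will not necessarily reproduce their percentages.

```python
from typing import Tuple

# Assumed conservative-substitution groups, for illustration only. Alignment
# tools usually derive similarity from a scoring matrix such as BLOSUM62.
SIMILAR_GROUPS = ["AVLIM", "FYW", "KRH", "DE", "ST", "NQ"]


def identity_and_similarity(seq_a: str, seq_b: str) -> Tuple[float, float]:
    """Return (percent identity, percent similarity) for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = similar = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns (one of several possible conventions)
        compared += 1
        if x == y:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif any(x in group and y in group for group in SIMILAR_GROUPS):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared
```

      Because every identical position also counts as similar, similarity is always at least as high as identity, which is consistent with the 88% identical and 94% similar figures quoted for Reston and Zaire IIDs.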

      - In this work, the authors mentioned the cryptic pocket but only illustrated the opening of this pocket by using a simple distance between residues (Figure 2) and the SASA of one cysteine (Figure 3). In previous work by the authors (Cruz et al., Nature Communications, 2022), they better characterized the residues involved in RNA binding and in forming the cryptic pocket. Thus, would it be possible to better describe this cryptic pocket (residues involved, volume, etc.) and better explain how, structurally speaking, it can affect the RNA binding mode (blunt ends vs backbone)?

      We thank the reviewer for pointing out the need for clarification on the residues involved in RNA binding and pocket opening and the mechanism linking them. We have performed the CARDS analysis on Reston and Marburg IID simulations as we had done on Zaire IID simulations in Cruz et al, 2022. The results are shown in Fig S3 and discussed in the main text in the first results section.

      - As a counter-example, the authors used C315 for SASA calculation and thiol labeling (Figure 3). This cysteine is mainly buried, as seen by SASA for Reston and Marburg and by thiol labelling (Figure 3 E,G,H). Would it be possible to also get thiol labeling rates for Cysteine 264 in Reston and its equivalent to see a case where the residue is solvent exposed?

      We have shown the SASA for C264 from the simulations in Fig S4 and the thiol labeling rates for all 4 cysteines in Reston IID in Fig S6. Comparing these rates to the rates of all 4 cysteines obtained for Zaire IID (Fig 4 in Cruz et al., 2022), we observe that the rates for C264, which is expected to be exposed, are significantly faster than those of C315, which is largely buried in all variants.

      - I strongly support here the will of the authors to share their data by depositing them in an OSF repository. These data help this reviewer to assess some of the results produced by the authors and help to better understand the dynamics of their respective systems. I have just a few comments that need to be addressed regarding these data:

        - While there are data for WT Reston and Marburg, there is no data for Zaire. Is this because these data correspond to the previous work (Cruz et al. 2022) (in this case, it would be good to make this clear in the main text) or is it an omission?

        - There is no center.xtc file in the Marburg-MSM directory.

        - There is no protmasses.pdb in the Reston-MSM directory.

      - In general, if possible, it would be good to use the same name for each type of file presented in each directory to help a potential user understand a bit more how to use these data.

      - If possible, adding a bit more metadata and explanation on the OSF webpage would be very beneficial to help others find these data. To help in this direction, the authors may have a look at the guidelines presented at the end of this article: https://elifesciences.org/articles/90061

      We thank the reviewer for pointing out the omissions from the OSF repository. We have added the missing files and followed a uniform naming convention. We have also added documentation in the metadata section of the OSF repository to help others use the data.  

      Indeed, the simulation data used for Zaire IID is available on the OSF repository corresponding to Cruz et al. 2022 at https://osf.io/5pg2a. We have also clarified this in the data availability section of the main text.  

      Minor point:

      In Figure 2, there is a slight bump for the 225-295 distance around 1 nm for Reston. Can the authors comment on it? As these results are based on long AS, do the authors think this population is significant, even if it is very small?

      Comparing the probability distributions obtained from bootstrapping the frames used to calculate the MSM equilibrium probabilities (Revised Fig 1), we observe that the bump in the Reston IID distribution persists in all bootstraps, indicating that it might indeed be significant. This is also consistent with our observation that cysteine 296 does get fully labeled in our thiol labeling experiments, albeit significantly more slowly than in the other homologs.

      Reviewer #2 (Recommendations for the authors):

      I recommend that the authors implement moderate revisions prior to the publication of this research article, addressing the identified weaknesses (see below).

      The authors should provide a rationale for their selection of the amber03 force field (Duan et al., JCTC 24, 1999-2012, 2003) for molecular dynamics simulations, particularly given the availability of more recent and optimized versions of the AMBER force fields. These newer force fields may offer improved parameterization for biomolecular systems, potentially enhancing the accuracy and reliability of the simulation results.

      We chose the Amber03 force field because it has performed well in much of our past work, including the original prediction of the cryptic pocket that we study in this manuscript. The results presented in this manuscript also demonstrate the predictive power of Amber03.

      Additionally, while the authors utilized solvent-accessible surface area (SASA) for cryptic pocket analysis, volume-based approaches may be more suitable for this purpose. Several studies (e.g., Sztain et al. J. Chem. Inf. Model. 2021, 61, 7, 3495-3501) have demonstrated the utility of volume analysis in identifying and characterizing cryptic pockets. The authors could consider incorporating such methodologies to provide a more comprehensive assessment of pocket dynamics.

      The authors propose that the cryptic pocket is not merely a random structural feature but functions as an allosteric regulator of dsRNA binding. To further substantiate this claim, an in-depth analysis of this allosteric effect using for instance network analysis could significantly enhance the study. Such an approach could identify key residues and interaction networks within the protein that mediate the allosteric regulation. This type of mechanistic insight would not only provide a stronger theoretical framework but also offer valuable information for the rational design of therapeutic interventions targeting the cryptic pocket.  

      We thank the reviewer for pointing out the need for clarification on the molecular mechanism linking the opening of the cryptic pocket to RNA binding. We have performed the CARDS analysis on Reston and Marburg IID simulations as was done on Zaire IID simulations in Cruz et al, 2022. The results are shown in Fig S3 and discussed in the main text in the first results section. Briefly, we do find a community (blue) comprising the pocket residues in Reston and Marburg IIDs as we did in Zaire. Similarly, we find that many of the RNA binding residues fall into the orange and green communities as in Zaire. However, there are differences in exactly which residues are clustered into which of these two communities. There are also differences in how strongly connected these communities are in the three homologs. Therefore, while we can conclude that pocket residues likely have varying influence on the RNA binding residues in the homologs, it is hard to say exactly what that variation is from this analysis alone.  

      Reviewer #3 (Recommendations for the authors):

      - MD simulations: All simulations were initialised from the 3 crystal structures, is that correct? In all cases, dsRNA was not included in the simulations, right? Were crystallographic Mg ions in the vicinity of the binding site included? These are known to influence structural dynamics to a large extent.

      All simulations were indeed initialized using only protein atoms from the crystal structures 3FKE, 4GHL, and 3L2A. Therefore, crystallographic Mg ions were not included in the simulations. However, we do agree with the reviewer and think that the effect of parameters such as salt concentration, specifically Mg ions which are known to be important for the stability of dsRNA, on the pocket opening equilibrium merits detailed study in future work.

      - Figure 2: Would it be possible to perform e.g. a block error analysis and show the statistical errors of the distributions?

      We agree that showing the statistical variation in the MSM equilibrium probabilities is important for comparing the different distributions. Therefore, we have updated Figs 2 and 5 to show the distributions obtained from MSMs constructed using 100 and 10 random samples of the data respectively to indicate the extent of the statistical variability in the MSM construction.  
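      The resampling idea behind such error estimates can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the response describes rebuilding MSMs from random samples of the data, whereas this sketch only resamples frames with fixed per-frame weights and re-histograms them. The function name, the synthetic distances, and the uniform weights are all hypothetical.

```python
import numpy as np

def bootstrap_histograms(values, weights, bins, n_boot=100, seed=0):
    """Resample frames with replacement and re-histogram each resample.

    values  : per-frame observable (e.g. a residue-residue distance in nm)
    weights : per-frame equilibrium weights (assumed given, e.g. from an MSM)
    Returns the mean and standard deviation of the density in each bin.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n = len(values)
    hists = np.empty((n_boot, len(bins) - 1))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw n frames with replacement
        hists[b], _ = np.histogram(
            values[idx], bins=bins, weights=weights[idx], density=True
        )
    return hists.mean(axis=0), hists.std(axis=0)

# Synthetic example: a dominant closed state plus a small open population.
rng = np.random.default_rng(1)
distances = np.concatenate([rng.normal(0.6, 0.05, 950), rng.normal(1.0, 0.05, 50)])
weights = np.ones_like(distances)          # uniform weights as a placeholder
bins = np.linspace(0.3, 1.3, 41)
mean_density, std_density = bootstrap_histograms(distances, weights, bins)
```

      A minor population can be regarded as robust in this sense when its bins stay above zero by more than the bootstrap spread across resamples.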

      - More detailed structural biology experiments (such as NMR or HDX-MS) could potentially shed more light on the differential behaviour of the three different homologs, providing more evidence for the presence of the cryptic pocket.

      We agree that NMR and HDX-MS are powerful means to study dynamics and are actively exploring these approaches for our future work.

    1. Author response:

      Reviewer #1:

      We appreciate the Reviewer's positive feedback on the strengths of our study.

      The timescales of the peptide recognition and unbinding process are much longer than what can be sampled from unbiased simulations. Therefore, the proposed mechanism of recognition should only be considered a hypothesis based on the results presented here. For example, peptides that do not dissociate within one one-microsecond MD simulation are considered to be stable binders. However, they may not have a viable way to bind to the narrow protein cleft in the first place.

      We thank the Reviewer for this valuable feedback. We agree with the Reviewer. Our work on the IRE1 cLD activation mechanism is focused on generating hypotheses of the binding mechanism driven by MD simulations. We recognize the limitations in defining a stable binder due to the time scales sampled. However, our primary focus was to sample and characterize a possible binding pose in the center of the cLD dimer. We will contextualize our statements about stable binders and limit our claims to stating that the protein-peptide complex is stable within 1 μs-long simulations. However, we believe that our finding that the cLD dimer groove is not able to accommodate peptides is solid, as the steric impediment described is present in all our replicas, both with and without peptides, in a cumulative sampling time of 72 μs. Additionally, we will include a plot showing the distribution of groove width across all replicas.

      Oftentimes, representative structures sampled from MD simulation are used to draw conclusions (e.g., Figure 4 about the role of R161 mutation in binding affinity). This is not appropriate, as one unbinding event being observed or not observed in a microsecond-long trajectory does not provide sufficient information about the binding strength or the free energy difference.

      We thank the Reviewer for the insightful comment. As explained in the previous point, we believe that our simulations provide useful hypotheses, and we agree that we do not currently have data to comment on binding affinity. We will, therefore, remove all references to this term. We are aware of the limitations due to the timescale and agree that these limitations cannot be overcome with standard equilibrium simulations. To address these limitations, we plan to use orthogonal methods, namely MM/PB(GB)SA calculations for calculating binding free energies from existing trajectories (as performed by https://doi.org/10.1021/acs.jcim.4c00975). We will add predictions of all the peptides using AlphaFold 3, to confirm the binding region.

      Reviewer #2:

      We thank the Reviewer for their positive feedback.

      Improving presentation to include more computational details.

      We thank the Reviewer for raising this critical point. We agree that the manuscript is tailored for a biology audience, as the data are particularly relevant for that community. Nevertheless, we also understand the importance of providing sufficient methodological detail for computational readers. We will add appropriate computational information in the main text.

      More quantitative analysis in addition to visual structures.

      We will add an uncertainty estimate for the HDX calculations using bootstrapping and include additional information on bond distances for Y161. We will also incorporate time-series data showing the distance of the peptide from the groove across all replicas.

      Reviewer #3:

      We appreciate the Reviewer's positive feedback on our work.

      A potential weakness of the study is the usage of equilibrium (unbiased) molecular dynamics simulations so that processes and conformational changes on the microsecond time scale can be probed. Furthermore, there can be inaccuracies and biases in the description of unfolded peptides and protein segments due to the protein force fields. Here, it should be noted that the authors do acknowledge these possible limitations of their study in the conclusions.

      We appreciate the Reviewer's thoughtful comment. As noted in our response to Reviewer 1, we plan to address the concern about sampling by applying orthogonal methods. We agree with the Reviewer that some form of enhanced sampling is necessary if we want to assess binding in a more quantitative way, e.g., via free energy calculations. However, we also realize that applying any enhanced sampling scheme to our system is very challenging, given its large size and the complex peptide-protein interactions, which are not easily captured in a few collective variables. After a careful assessment and some preliminary tests, we decided that estimating free energies using enhanced sampling would necessitate a separate paper due to both the conceptual complexity of the project and the size of the necessary sampling campaign.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We wanted to clarify Reviewer #1’s latest comment in the last round of review, “Furthermore, the referee appreciates that the authors have echoed the concern regarding the limited statistical robustness of the observed scrambling events.” We appreciate the follow-up information provided by Reviewer #1 that their comment is specifically about the low-count alternative pathway events that we observe at the dimer interface, and not about the statistics of the manuscript overall, as they believe that “the study presents a statistically rigorous analysis of lipid scrambling events across multiple structures and conformations (Reviewer #1)”. We agree with the Reviewer and acknowledge that overall our coarse-grained study represents the most comprehensive single manuscript of the entire TMEM16 family to date.


      The following is the authors’ response to the original reviews.

      Public Review:

      Reviewer #1 (Public review):

      Summary:

      The manuscript investigates lipid scrambling mechanisms across TMEM16 family members using coarse-grained molecular dynamics (MD) simulations. While the study presents a statistically rigorous analysis of lipid scrambling events across multiple structures and conformations, several critical issues undermine its novelty, impact, and alignment with experimental observations.

      Critical issues:

      (1) Lack of Novelty:

      The phenomenon of lipid scrambling via an open hydrophilic groove is already well-established in the literature, including through atomistic MD simulations. The authors themselves acknowledge this fact in their introduction and discussion. By employing coarse-grained simulations, the study essentially reiterates previously known findings with limited additional mechanistic insight. The repeated observation of scrambling occurring predominantly via the groove does not offer significant advancement beyond prior work.

      We agree with the reviewer’s statement regarding the lack of novelty when it comes to our observations of scrambling in the groove of open Ca2+-bound TMEM16 structures. However, we feel that the inclusion of closed structures in this study, which attempts to address the as-yet-unanswered question of how scrambling by TMEM16s occurs in the absence of Ca2+, offers new observations for the field. In our study we specifically address to what extent the induced membrane deformation, which has been theorized to help lipids cross the bilayer especially in the absence of Ca2+, contributes to the rate of scrambling (see references 36, 59, and 66). There are also several TMEM16F structures solved under activating conditions (bound to Ca2+ and in the presence of PIP2) which feature structural rearrangements to TM6 that may be indicative of an open state (PDB 6P48) and had not been tested in simulations. We show that these structures do not scramble and thereby present evidence against an out-of-the-groove scrambling mechanism for these states. Although we find a handful of examples of lipids being scrambled by Ca2+-free structures of TMEM16 scramblases, none of our simulations suggest that these events are related to the degree of deformation.

      (2) Redundancy Across Systems:

      The manuscript explores multiple TMEM16 family members in activating and non-activating conformations, but the conclusions remain largely confirmatory. The extensive dataset generated through coarse-grained MD simulations primarily reinforces established mechanistic models rather than uncovering fundamentally new insights. The effort, while statistically robust, feels excessive given the incremental nature of the findings.

      Again, we agree with the reviewer’s statement that our results largely confirm those published by other groups and our own. We think, however, that there is value in comparing the scrambling competence of these TMEM16 structures in a consistent manner within a single study, to reduce inconsistencies that may be introduced by the different simulation methods, parameters, and environmental variables (such as lipid composition) used in other published works on single family members. The consistency across our simulations and the high number of observed scrambling events have allowed us to confirm that the mechanism of scrambling is shared by multiple family members and relies most obviously on groove dilation.

      (3) Discrepancy with Experimental Observations:

      The use of coarse-grained simulations introduces inherent limitations in accurately representing lipid scrambling dynamics at the atomistic level. Experimental studies have highlighted nuances in lipid permeation that are not fully captured by coarse-grained models. This discrepancy raises questions about the biological relevance of the reported scrambling events, especially those occurring outside the canonical groove.

      We thank the reviewer for bringing up the possible inaccuracies introduced by coarse graining our simulations. This is also a concern for us, and we address this issue extensively in our discussion. As the reviewer pointed out above, our CG simulations have largely confirmed existing evidence in the field which we think speaks well to the transferability of observations from atomistic simulations to the coarse-grained level of detail. We have made both qualitative and quantitative comparisons between atomistic and coarse-grained simulations of nhTMEM16 and TMEM16F (Figure 1, Figure 4-figure supplement 1, Figure 4-figure supplement 5) showing the two methods give similar answers for where lipids interact with the protein, including outside of the canonical groove. We do not dispute the possible discrepancy between our simulations and experiment, but our goal is to share new nuanced ideas for the predicted TMEM16 scrambling mechanism that we hope will be tested by future experimental studies.

      (4) Alternative Scrambling Sites:

      The manuscript reports scrambling events at the dimer-dimer interface as a novel mechanism. While this observation is intriguing, it is not explored in sufficient detail to establish its functional significance. Furthermore, the low frequency of these events (relative to groove-mediated scrambling) suggests they may be artifacts of the simulation model rather than biologically meaningful pathways.

      We agree with the reviewer that our observed number of scrambling events in the dimer interface is too low to present it as strong evidence for it being the alternative mechanism for Ca2+-independent scrambling. This will require additional experiments and computational studies which we plan to do in future research. However, we are less certain that these are artifacts of the coarse-grained simulation system as we observed a similar event in an atomistic simulation of TMEM16F.

      Conclusion:

      Overall, while the study is technically sound and presents a large dataset of lipid scrambling events across multiple TMEM16 structures, it falls short in terms of novelty and mechanistic advancement. The findings are largely confirmatory and do not bridge the gap between coarse-grained simulations and experimental observations. Future efforts should focus on resolving these limitations, possibly through atomistic simulations or experimental validation of the alternative scrambling pathways.

      Reviewer #2 (Public review):

      Summary:

      Stephens et al. present a comprehensive study of TMEM16-members via coarse-grained MD simulations (CGMD). They particularly focus on the scramblase ability of these proteins and aim to characterize the "energetics of scrambling". Through their simulations, the authors interestingly relate protein conformational states to the membrane's thickness and link those to the scrambling ability of TMEM members, measured as the trespassing tendency of lipids across leaflets. They validate their simulation with a direct qualitative comparison with Cryo-EM maps.

      Strengths:

      The study demonstrates an efficient use of CGMD simulations to explore lipid scrambling across various TMEM16 family members. By leveraging this approach, the authors are able to bypass some of the sampling limitations inherent in all-atom simulations, providing a more comprehensive and high-throughput analysis of lipid scrambling. Their comparison of different protein conformations, including open and closed groove states, presents a detailed exploration of how structural features influence scrambling activity, adding significant value to the field. A key contribution of this study is the finding that groove dilation plays a central role in lipid scrambling. The authors observe that for scrambling-competent TMEM16 structures, there is substantial membrane thinning and groove widening. The open Ca2+-bound nhTMEM16 structure (PDB ID 4WIS) was identified as the fastest scrambler in their simulations, with scrambling rates as high as 24.4 ± 5.2 events per μs. This structure also shows significant membrane thinning (up to 18 Å), which supports the hypothesis that groove dilation lowers the energetic barrier for lipid translocation, facilitating scrambling.

      The study also establishes a correlation between structural features and scrambling competence, though analyses often lack statistical robustness and quantitative comparisons. The simulations differentiate between open and closed conformations of TMEM16 structures, with open-groove structures exhibiting increased scrambling activity, while closed-groove structures do not. This finding aligns with previous research suggesting that the structural dynamics of the groove are critical for scrambling. Furthermore, the authors explore how the physical dimensions of the groove qualitatively correlate with observed scrambling rates. For example, TMEM16K induces increased membrane thinning in its open form, suggesting that membrane properties, along with structural features, play a role in modulating scrambling activity.

      Another significant finding is the concept of "out-of-the-groove" scrambling, where lipid translocation occurs outside the protein's groove. This observation introduces the possibility of alternate scrambling mechanisms that do not follow the traditional "credit-card model" of groove-mediated lipid scrambling. In their simulations, the authors note that these out-of-the-groove events predominantly occur at the dimer interface between TM3 and TM10, especially in mammalian TMEM16 structures. While these events were not observed in fungal TMEM16s, they may provide insight into Ca2+-independent scrambling mechanisms, as they do not require groove opening.

      Weaknesses:

      A significant challenge of the study is the discrepancy between the scrambling rates observed in CGMD simulations and those reported experimentally. Despite the authors' claim that the rates are in line with experiment, the observed differences can mean large energetic discrepancies in describing scrambling (larger than a 1 kT barrier in reality). For instance, the authors report scrambling rates of 10.7 events per μs for TMEM16F and 24.4 events per μs for nhTMEM16, which are several orders of magnitude faster than experimental rates. While the authors suggest that this discrepancy could be due to the Martini 3 force field's faster diffusion dynamics, this explanation does not fully account for the large difference in rates. A more thorough discussion of how the choice of force field and simulation parameters influences the results, and of how these discrepancies can be reconciled with experimental data, would strengthen the conclusions. Likewise, rate calculations in the study are based on 10 μs simulations, while experimental scrambling occurs over seconds. This timescale discrepancy limits the study's accuracy, as the simulations may not capture rare or slow scrambling events that are observed experimentally and therefore might underestimate the kinetics of scrambling. It is, however, important to recognize that it is hard (borderline unachievable) to pinpoint reasonable kinetics for systems like this using the currently available computational power and force field accuracy. The faster diffusion in simulations may lead to overestimated scrambling rates, making the simulation results less comparable to real-world observations. I would therefore read the findings qualitatively rather than quantitatively. An interesting observation is the asymmetry observed in the scrambling rates of the two monomers. Since MARTINI is known to be limited in correctly sampling protein dynamics, the authors - in order to preserve the fold - have applied a strong (500 kJ mol-1 nm-2) elastic network. However, I am wondering how the ENM applies across the dimer and if any asymmetry can be noticed in the application of restraints for each monomer and at the dimer interface. How can this have potentially biased the asymmetry in the scrambling rates observed between the monomers? Is this artificially obtained from restraining the initial structure, or is the asymmetry somehow gatekeeping the scrambling mechanism to occur majorly across a single monomer? Answering this question would have far-reaching implications to better describe the mechanism of scrambling.

      The main aim of our computational survey was to directly compare all relevant published TMEM16 structures in both open and closed states using the Martini 3 CGMD force field. Our standardized simulation and analysis protocol allowed us to quantitatively compare scrambling rates across the TMEM16 family, something that has never been done before. We do acknowledge that direct comparison between simulated and experimental scrambling rates is complicated and that the rates are best interpreted qualitatively. In line with other reports (e.g., Li et al, PNAS 2024), lipid scrambling in CGMD is 2-3 orders of magnitude faster than typical experimental findings. In the CG simulation field, these increased dynamics due to the smoother energy landscape are a well-known phenomenon. In our view, this is a valuable trade-off for being able to capture statistically robust scrambling dynamics and gain mechanistic understanding in the first place, since these are currently challenging to obtain otherwise. For example, with all-atom MD it would have been near-impossible to conclude that groove openness and high scrambling rates are closely related, simply because one would only measure a handful of scrambling events in (at most) a handful of structures.
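      To put the quoted speed-up in energetic terms, a rate ratio can be converted into an apparent barrier difference. The sketch below assumes an Arrhenius-type relation, k = A exp(-ΔG‡/kT), with the same prefactor A in simulation and experiment. That assumption is ours, not the authors': the smoother coarse-grained landscape also changes the effective friction, so the numbers are only an upper-bound illustration. The 300 K temperature is likewise an assumed value.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1

def barrier_shift_kT(rate_ratio):
    """Apparent barrier difference, in units of kT, implied by a rate ratio.

    Assumes k = A * exp(-dG/kT) with an identical prefactor A for both rates.
    """
    return math.log(rate_ratio)

def barrier_shift_kj_per_mol(rate_ratio, temperature_k=300.0):
    """Same quantity in kJ/mol at an assumed temperature (default 300 K)."""
    return R_KJ_PER_MOL_K * temperature_k * math.log(rate_ratio)

# Speed-ups of 2 and 3 orders of magnitude, as quoted in the response.
for ratio in (1e2, 1e3):
    print(ratio, barrier_shift_kT(ratio), barrier_shift_kj_per_mol(ratio))
```

      Under these assumptions, a speed-up of 2 to 3 orders of magnitude corresponds to an apparent barrier difference of roughly 4.6 to 6.9 kT, which is why the comparison with experiment is best kept qualitative.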

      Considering the elastic network: the reviewer is correct in that the elastic network restrains the overall structure to the experimental conformation. This is necessary because the Martini 3 force field does not accurately model changes in secondary (and tertiary) structure. In fact, by retaining the structural information from the experimental structures, we argue that the elastic network helped us arrive at the conclusion that groove openness is the major contributing factor in determining a protein’s scrambling rate. This is best exemplified by the asymmetric X-ray structure of TMEM16K (5OC9), in which the groove of one subunit is more dilated than the other. In our simulation, this information was stored in the elastic network, yielding a 4x higher rate in the open groove than in the closed groove, within the same trajectory.
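      How tightly such a network holds the experimental conformation can be estimated from the force constant quoted by the reviewer (500 kJ mol-1 nm-2). The sketch below assumes each elastic bond is a harmonic spring with energy 0.5 k (d - d0)^2, which is the common convention for harmonic bonds in MD engines. The source gives only the force constant, so the functional form, the 300 K temperature, and the function names are assumptions.

```python
import math

K_ENM = 500.0                    # kJ mol^-1 nm^-2, value quoted in the review
R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1

def harmonic_energy(d, d0, k=K_ENM):
    """Energy (kJ/mol) of one elastic bond stretched from d0 to d (both in nm)."""
    return 0.5 * k * (d - d0) ** 2

def displacement_for_energy(energy, k=K_ENM):
    """Stretch (nm) of a single elastic bond that costs the given energy (kJ/mol)."""
    return math.sqrt(2.0 * energy / k)

kT = R_KJ_PER_MOL_K * 300.0               # thermal energy at an assumed 300 K
print(harmonic_energy(1.1, 1.0))          # cost of a 0.1 nm stretch
print(displacement_for_energy(kT))        # stretch that costs about 1 kT
```

      With these assumptions, stretching a single bond by about 0.1 nm already costs on the order of 1 kT. Because each bead is typically connected to many neighbours, larger rearrangements such as opening a closed groove are strongly suppressed, which is consistent with the asymmetry of the 5OC9 starting structure persisting throughout the trajectory.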

      Notably, the manuscript does not explore the impact of membrane composition on scrambling rates. While the authors use a specific lipid composition (DOPC) in their simulations, they acknowledge that membrane composition can influence scrambling activity. However, the study does not explore how different lipids or membrane environments or varying membrane curvature and tension, could alter scrambling behaviour. I appreciate that this might have been beyond the scope of this particular paper and the authors plan to further chase these questions, as this work sets a strong protocol for this study. Contextualizing scrambling in the context of membrane composition is particularly relevant since the authors note that TMEM16K's scrambling rate increases tenfold in thinner membranes, suggesting that lipid-specific or membrane-thickness-dependent effects could play a role.

      Considering different membrane compositions: for this study, we chose to keep the membranes as simple as possible. We opted for pure DOPC membranes because DOPC (1) has negligible intrinsic curvature, (2) forms fluid membranes, and (3) was used previously by others (Li et al, PNAS 2024). As mentioned by the reviewer, we believe our current study defines a good, standardized protocol and a solid baseline for future efforts looking into the additional effects of membrane composition, tension, and curvature, all of which could affect TMEM16-mediated lipid scrambling.

      Reviewer #3 (Public review):

      Strengths:

      The strength of this study emerges from a comparative analysis of multiple structural starting points and understanding global/local motions of the protein with respect to lipid movement. Although the protein is well-studied, both experimentally and computationally, the understanding of conformational events in different family members, especially membrane thickness less compared to fungal scramblases offers good insights.

      We appreciate the reviewer recognizing the value of the comparative study. In addition to valuable insights from previous experimental and computational work, we hope to put forward a unifying framework that highlights various TMEM16 structural features and membrane properties that underlie scrambling function.

      Weaknesses:

      The weakness of the work is to fully reconcile with experimental evidence of Ca²⁺-independent scrambling rates observed in prior studies, but this part is also challenging using coarse-grain molecular simulations. Previous reports have identified lipid crossing, packing defects, and other associated events, so it is difficult to place this paper in that context. However, the absence of validation leaves certain claims, like alternative scrambling pathways, speculative.

      Answer: It is generally difficult to quantitatively compare bulk measurements of scrambling phenomena with simulation results. The advantage of simulations is to directly observe the transient scrambling events at a spatial and temporal resolution that is currently unattainable for experiments. The current experimental evidence for the precise mechanism of Ca2+-independent scrambling is still under debate. We therefore hope to leverage the strength of MD and statistical rigor of coarse-grained simulations to generate testable hypotheses for further structural, biochemical, and computational studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The findings are largely confirmatory and do not bridge the gap between coarse-grained simulations and experimental observations. Future efforts should focus on resolving these limitations, possibly through atomistic simulations or experimental validation of the alternative scrambling pathways.

      While we agree with what the reviewer may be hinting at regarding limitations of coarse-grained MD simulations, we believe that our study holds much more merit than this comment suggests. We have provided something that has yet to be done in the field: a comprehensive study that directly compares the scrambling rates of multiple TMEM16 family members in different conformations using identical simulation conditions. Our work clearly shows that a sufficiently dilated groove is the major structural feature that enables robust scrambling for all TMEM16 scramblase members with solved structures. While all TMEM16s cause significant distortion and thinning of the membrane, we assert that the extreme thinning observed around open grooves is significantly enhanced by the lipid scrambling itself as the two leaflets merge through lipid exchange. We saw no evidence that membrane thinning/distortion alone, in the absence of an open groove, could support scrambling at the rates observed under activating conditions or even the low rates observed in Ca2+-independent scrambling. Moreover, our handful of observations of scrambling events outside of the groove, which has not yet been reported in any study, opens an exciting new direction for studying alternative scrambling mechanisms. That said, we are currently following up on many of the observations reported here, such as scrambling events outside the groove, the kinetics of scrambling, and the possibility that lipids line the groove of non-scramblers like TMEM16A. This is being done experimentally with our collaborators through site-directed mutagenesis and with all-atom MD in our lab. Unfortunately, including all of this is well beyond the scope of the current paper.

      Reviewer #2 (Recommendations for the authors):

      Major comments and questions:

      (1) Line 214 and Figure 1- Figure Supplement 1: why have you only compared the final frame of the trajectory to the cryo-EM structure? Even if these comparisons are qualitative, they should be representative of the entire trajectory, not a single frame.

      We thank the reviewer for this suggestion and have replaced the single-frame snapshots in Figure 1-figure supplement 1 with ensemble-averaged headgroup densities. The overall agreement between membrane shapes in CGMD and cryo-EM was not affected by this change.

      (2) Lines 228-231: You comment 'Residues in this site on nhTMEM16 and TMEMF also seem to play a role in scrambling but the mechanism by which they do so is unclear.' This is something you could attempt to quantify in the simulations by calculating the correlation between scrambling and protein-membrane interactions/contacts in this site. Can you speculate on a mechanism that might be a contributing factor?

      We probed the correlation between these residues and scrambling lipids, as suggested by the reviewer, and interestingly not all scrambling lipids interact with these residues. Yet there is strong lipid density in this vicinity (see insets in Figure 1 and Figure 4-figure supplement 2). These observations lead us to suspect these residues impact scrambling indirectly through influencing the conformation of the protein or flexibility and shape of the membrane. This interpretation fits with mutagenesis studies highlighting a role for these residues in scrambling (see refs 59, 62, and 67). Specifically, Falzone et al. 2022 (ref 59) suggested that they may thin the membrane near the groove, but this has not been tested via structure determination and a detailed model of how they impact scrambling is missing. We could address this question with in silico mutations; however, CG simulation is not an appropriate method to study large scale protein dynamics, and AA simulations are likely best, but beyond the scope of this paper.

      (3) Lines 240-245 and Figure 1B: This section discusses the coupling between membrane distortions and the sinusoidal curve around the protein, however, Figure 1B only shows snapshots of the membrane distortions. Is it possible to understand how these two collective variables are correlated quantitatively (as opposed to the current qualitative analysis)?

      We believe that it may be possible to quantitatively capture these two key features of the membrane, as we did previously with nhTMEM16 using our continuum elasticity-based model of the membrane (Bethel and Grabe 2016). Our model agreed with all atom MD surfaces to within ~1 Å, hence showing good quantitative agreement throughout the entire membrane. However, we doubt that we could distill the essence of our model down to a simple functional relationship between the sinusoidal wave and pinching, which we think the reviewer is asking. Rather, we believe that the large-scale sinusoidal distortion (collective variable 1) and pinching/distortion (collective variable 2) near the groove arise from the interplay of the specific protein surface chemistry for each protein (patterning of polar and non-polar residues) and the membrane. This is why we chose to simply report the distinct patterns that the family members impose on the surrounding membrane, which we think is fascinating. Specifically, Fig. 1B shows that different TMEM16 family members distort the membrane in different ways. Most notably, fungal TMEM16s feature a more pronounced sinusoidal deformation, whereas the mammalian members primarily produce local pinching. Then, in Fig. 3A we show that the thinning at the groove happens in all structures and is more pronounced in open, scrambling-competent conformations. In other words, proteins can show very strong thinning (e.g. TMEM16K, 5OC9) even though the membrane generally remains flat.

      (4) Lines 257-258: Authors comment that TMEM16A lacks scramblase activity yet can achieve a fully lipid-lined groove (note the typo - should be lipid-lined, not lipid-line). Is a fully lipid-lined groove a prerequisite for scramblase activity? Are lipid-lined grooves the only requirement for scramblase activity? Could the authors clarify exactly what the prerequisite for scramblase activity is to avoid any confusion; this will be useful for later descriptions (i.e. line 295) where scrambling competence is again referred to. Additionally, the associated figure panel (Figure 1D) shows a snapshot of this finding but lacks any statistical quantifications - is a fully lipid-lined groove a single event? Perhaps the additional analyses, such as the groove-lipid contacts, may be useful here.

      The definition of lipid scrambling is that a lipid fully transitions from one membrane leaflet to the other. While a single lipid could transition through the groove on its own, it is well documented in both atomistic and CG MD simulations, that lipid scrambling typically happens through a lipid-lined groove, as shown in Fig. 1A-B. The lipids tend to form strong choline-to-phosphate interactions with nearest neighbors that make this energetically favorable. That said, lipid-lined grooves are not sufficient for robust scrambling, which is what we show in Fig. 1D where the non-scrambler TMEM16A did in fact feature a lipid-lined groove. As suggested, we performed contact analysis and found that residue K645 on TM6 in the middle of the groove contacts lipids in 9.2% of the simulation frames.

      To get a better understanding of how populated the TM4-TM6 pathway is with lipids across all simulated structures, we determined for every simulation frame how many headgroup beads resided in the groove. This indicates that the ion-conductive state of TMEM16A (5OYB*, Fig. 1D) only had 1 lipid in the pathway, on average, meaning that the configuration shown in Fig. 1D is indeed exceptional. As a reference, our strongest scrambler, nhTMEM16 4WIS, had an average of 2.8 lipids in the groove. We added a table containing the means and standard deviations that resulted from this analysis as Figure 1-Table supplement 1.
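      The per-frame counting described above can be illustrated with a minimal sketch. This is not the analysis code used for the paper: the box-shaped groove region, the helper names `groove_occupancy` and `in_box`, and the toy coordinates are all assumptions made for illustration, and a real analysis would define the groove from the TM4 and TM6 helices in each frame.

```python
from statistics import mean, pstdev


def groove_occupancy(frames, in_groove):
    """Count, for each frame, the headgroup beads for which in_groove(bead) is True."""
    return [sum(1 for bead in frame if in_groove(bead)) for frame in frames]


def in_box(bead, lo=(-5.0, -5.0, -15.0), hi=(5.0, 5.0, 15.0)):
    """Hypothetical groove region: an axis-aligned box (coordinates in angstroms)."""
    return all(l <= c <= h for c, l, h in zip(bead, lo, hi))


# Toy trajectory: each frame is a list of (x, y, z) headgroup bead positions.
frames = [
    [(0.0, 0.0, 0.0), (20.0, 0.0, 0.0)],
    [(1.0, -2.0, 10.0), (0.0, 0.0, -3.0), (0.0, 9.0, 0.0)],
    [(30.0, 30.0, 0.0)],
]

counts = groove_occupancy(frames, in_box)
occupancy_mean = mean(counts)     # average number of beads in the groove
occupancy_std = pstdev(counts)    # population standard deviation over frames
```

      Reporting both the mean and the standard deviation, as in the supplementary table, helps distinguish a groove that is persistently lipid-lined from one that is only occasionally filled.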

      (5) Lines 295-298: Because the scrambling rates of the Ca²⁺-bound and Ca²⁺-free structures fall within overlapping error margins, it becomes difficult to definitively state that Ca²⁺ binding significantly enhances scrambling activity. This undermines the claim that the Ca²⁺-bound structure is the strongest scrambler. The authors should conduct statistical analyses to determine if the difference between the two conditions is statistically significant.

      In contrast to the reviewer’s comment, we do not claim that Ca2+-binding itself enhances lipid scrambling. Instead, what we show is that WT structures that are solved in an open conformation (all of which are Ca2+-bound, except 6QM6) are robust scramblers. For nhTMEM16, we did not observe any scrambling events for the closed-groove proteins, making further statistical analysis redundant.

      (6) The authors claim that the scrambling rates derived from their MD simulations are in "excellent agreement" with experimental findings (lines 294-295), despite significant discrepancy between simulated and experimentally measured rates. For example, the simulated rate of 24.4 ± 5.2 events/µs for the open, Ca²⁺-bound fungal nhTMEM16 (PDB ID 4WIS) corresponds to approximately 24 million events per second, which is vastly higher than experimental rates. Experimental studies have reported scrambling rate constants of ~0.003 s⁻¹ for TMEM16 family members in the absence of Ca²⁺, measured under physiological conditions (https://doi.org/10.1038/s41467-019-11753-1 ). Even with Ca²⁺ activation, scrambling rates remain several orders of magnitude lower than the rates observed in simulations. Moreover, this highlights a larger problem: lipid scrambling rates occur over timescales that are not captured by these simulations. While the authors allude to these discrepancies (lines 605-606), they should be emphasised in the text, as opposed to the table caption. These should also be traced back to differences between the membrane compositions of different studies.

      We agree with the spirit of the reviewer’s comment, and because of that, we were very careful not to claim that we reproduce experimental scrambling rates, just that the trends (scrambling-competent, or not) are correct. On lines 294-295, we actually said that the scrambling rates in our simulations excellently agree with “the presumed scrambling competence of each experimental structure”, which is true. 

      As explained extensively in the discussion section of our paper (and by many others), direct comparison between MD (e.g., Martini 3, but also atomistic force fields) dynamics and experimental measurements is challenging. The primary goal of our paper is to quantify and compare the scrambling capacity of different TMEM16 family members and different states, within a CGMD context.

      That said, we agree with the reviewer that we may have missed rare or long-timescale events (as is the case in any MD experiment) and added this point to the discussion.
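      For readers who want to see the size of the gap raised in comment 6, the unit conversion can be written out as a short sketch. The two input values are the ones quoted in the reviewer's comment; the variable names are illustrative. Note that the quoted experimental rate constant refers to Ca²⁺-free conditions while the simulated rate is for a Ca²⁺-bound structure, so the ratio is only a rough indication and not a like-for-like comparison.

```python
import math

MICROSECONDS_PER_SECOND = 1_000_000

simulated_rate_per_us = 24.4     # events/µs, value quoted in comment 6
experimental_rate_per_s = 0.003  # s^-1, value quoted in comment 6 (Ca2+-free)

# Convert the simulated rate to events per second.
simulated_rate_per_s = simulated_rate_per_us * MICROSECONDS_PER_SECOND

# Ratio between the two rates and its order of magnitude.
ratio = simulated_rate_per_s / experimental_rate_per_s
orders_of_magnitude = math.log10(ratio)
```

      Under these inputs the simulated rate is roughly 2.4 × 10⁷ events per second, between nine and ten orders of magnitude above the quoted experimental rate constant, which is why the comparison with experiment is treated as qualitative.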

      (7) To address these discrepancies, the authors should: i) emphasize that simulated rates serve as qualitative indicators of scrambling competence rather than absolute values comparable to experimental findings and ii) discuss potential reasons for the divergence, such as simulation timescale limitations or lipid bilayer compositions that may favor scrambling and force field inaccuracies.

      Please see our answer to question 6. Within the context of our CGMD survey, we confidently call our results quantitative. However, we agree with the reviewer that comparison with experimental scrambling rates is qualitative and should be interpreted with caution. To reflect this, we rewrote the first sentence of the relevant paragraph in the discussion section.

      (8) Line 310: Can the authors provide a rationale as to why one monomer has a wider groove than the other? Perhaps a contact analysis could be useful. See the comment above about ENM.

      The simulation of Ca2+-bound TMEM16K was initiated from an asymmetric X-ray structure in which chain B features a more dilated groove than chain A (PDB 5OC9). The backbones of TM4 and TM6 in the closed groove (A) are close enough together to be directly interconnected by the elastic network. In contrast, TM4 and TM6 in the more dilated subunit (B) are not restricted by the elastic network and, as a consequence, display some “breathing” behavior (Fig. 3B and Fig. 3-Suppl. 6A), giving rise to a ~4x higher scrambling rate. We explicitly added the word “cryo-EM” and the PDB ID to the sentence to emphasize that the asymmetry stems from the original experimental structure.

      When answering this question, we also corrected a mislabeled chain identifier in Fig. 2-Suppl. 3A, which read ‘chain A’ in the original manuscript when it should have read ‘chain B’.

      (9) Line 312: Authors speculate that increased groove width likely accounts for increased scrambling rates. For statistical significance, authors should attempt to correlate scrambling rates and groove width over the simulation period.

      The Reviewer is referring to our description of the scrambling rates we measured for TMEM16K, where we noted that the groove with the highest scrambling rate is, on average, wider than the groove of the opposite subunit, which stays below 6 Å. We do not suggest that the correlation between scrambling and groove width is continuous, as the Reviewer may have interpreted from our original submission. Rather, we think it is a binary outcome: lipids cannot easily enter narrow grooves (< 6 Å), and hence scrambling can only occur once this threshold is reached, at which point it occurs at a near-constant rate. We showed this for 4 different family members in the original Fig. 3B, where scrambling events (black dots) were much more likely during, or right after, groove dilation to distances > 6 Å.

      (10) Line 359: Authors have plotted the minimum distance between residues TM4 and TM6 in Fig. 3A/B, claiming that a wide groove is required for scrambling. Upon closer examination, it is clear that several of these distributions overlap, reducing the statistical significance of these claims. Statistical tests (i.e. KS-tests) should be performed to determine whether the differences in distributions are significant.

      The Reviewer appears to be asking for a statistical test between the six distance distributions represented by the data in Fig. 3A for the scrambling competent structures (6QP6*, 8B8J, 6QM6, 7RXG, 4WIS, 5OC9), and we think this is being asked because it is believed that we are making a claim that the greater the distance, the greater the scrambling rate. If we have interpreted this comment correctly, we are not making this claim. Rather, we are simply stating that we only observe robust scrambling when the groove width regularly separates beyond 6 Å. The full distance distributions can now be found in Figure 3-figure supplement 6B, and we agree there is significant overlap between some of these distributions. However, the distinguishing characteristic of the 6 distributions from scrambling competent proteins is that they all access large distances, while the others do not. Notably, TMEM16F proteins (6QP6*, 8B8J) are below the 6 Å threshold on average, but they have wide standard deviations and spend well over ¼ of their time in the permissive regime (the upper error bar in the whisker plots in Fig. 3A is the 75% boundary).
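      The threshold argument above, in which a groove can be narrower than 6 Å on average yet still spend a substantial fraction of time in the permissive regime, can be sketched as follows. The distance series is invented toy data and `fraction_above` is a hypothetical helper, not part of the published analysis; only the 6 Å threshold comes from the text.

```python
def fraction_above(distances, threshold=6.0):
    """Fraction of frames in which the TM4/TM6 distance exceeds the threshold."""
    if not distances:
        raise ValueError("distances must not be empty")
    return sum(d > threshold for d in distances) / len(distances)


# Toy per-frame minimal TM4/TM6 distances in angstroms (illustrative only).
widths = [4.8, 5.2, 5.9, 6.4, 7.1, 5.5, 6.8, 5.0]

mean_width = sum(widths) / len(widths)   # below the 6 Å threshold on average
open_fraction = fraction_above(widths)   # yet open in a sizeable share of frames
```

      In this toy series the mean width is below 6 Å while more than a quarter of the frames exceed the threshold, which mirrors the situation described for the TMEM16F structures.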

      (11) Line 363-364: The authors state that all TMEM16 structures thin the membrane. Could the authors include a description of how membrane thinning is calculated, for instance, is the entire membrane considered, or is thinning calculated on a membrane patch close to the protein? Do membrane patches closer to the transmembrane protein increase or decrease thickness due to hydrophobic packing interactions? The latter question is of particular concern since Martini3 has been shown to induce local thinning of the membrane close to transmembrane helices, yielding thicknesses 2-3 Å thinner than those reported experimentally (https://doi.org/10.1016/j.cplett.2023.140436). This could be an important consideration in the authors' comparison to the bulk membrane thickness (line 364). Finally, how is the 'bulk membrane thickness' measured (i.e., from the CG simulations, from AA simulations, or from experiments)?

      Regarding the calculation of thinning and bulk membrane thickness, as described in Method “Quantification of membrane deformations”, the minimal membrane thickness, or thinning, is defined as the shortest distance between any two points from the interpolated upper and lower leaflet surfaces constructed using the glycerol beads (GL1 and GL2). Bulk membrane thickness is calculated by taking the vertical distance between the averaged glycerol surfaces at the membrane edge.
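      A minimal sketch of the two quantities defined above, assuming each leaflet surface is already available as a list of interpolated (x, y, z) points. The function names, the edge criterion, and the toy coordinates are assumptions for illustration; the actual surfaces in the paper are built from the GL1 and GL2 glycerol beads as described in the Methods.

```python
import math


def minimal_thickness(upper, lower):
    """Shortest distance between any point on the upper and any point on the lower surface."""
    return min(math.dist(u, l) for u in upper for l in lower)


def bulk_thickness(upper, lower, is_edge):
    """Vertical distance between the mean z of each surface, restricted to edge points."""
    top = [p[2] for p in upper if is_edge(p)]
    bottom = [p[2] for p in lower if is_edge(p)]
    return sum(top) / len(top) - sum(bottom) / len(bottom)


# Toy surfaces in angstroms: flat at the edges, pinched towards each other near x = 0.
upper = [(-50.0, 0.0, 20.0), (0.0, 0.0, 8.0), (50.0, 0.0, 20.0)]
lower = [(-50.0, 0.0, -20.0), (0.0, 0.0, -4.0), (50.0, 0.0, -20.0)]

thinnest = minimal_thickness(upper, lower)
# Hypothetical edge criterion: points at least 40 angstroms from the center along x.
bulk = bulk_thickness(upper, lower, is_edge=lambda p: abs(p[0]) >= 40.0)
```

      Because the minimal thickness is taken over all pairs of points, it is not restricted to the vertical direction, so it can pick up tilted pinch points that a purely vertical measurement would miss.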

      The concern of localized membrane deformation due to force field artifacts is well-founded. However, the sinusoidal deformations shown here are much greater than 2-3 Å Martini3 imperfections, and they extend for up to 10 Å radially away from the protein into the bulk membrane (see Figure 3-figure supplement 1-5 for more of a description). Most importantly, the sinusoidal wave patterns set up by the proteins are very similar to those described in the previous continuum calculation and all-atom MD for nhTMEM16 (https://www.pnas.org/doi/full/10.1073/pnas.1607574113).

      (12) Line 374: The authors state a 'positive correlation' between membrane thinning/groove opening and scrambling rates. To support this claim, the authors should report the correlation coefficients.

      We have removed any discussion concerning correlations between the magnitude of the scrambling rate and the degree of membrane thinning/groove opening. Rather we simply state that opening beyond a threshold distance is required for robust scrambling, as shown in our analysis in Fig. 3A.

      Concerning the relation between thinning and scrambling: Instantaneous membrane thinning is poorly defined (because it is governed by fluctuations of single lipids), and therefore difficult to correlate with the timing of individual scrambling events in a meaningful way.  Moreover, as we state later in that same section, “we argue that the extremely thin membranes are likely correlated with groove opening, rather than being an independent contributing factor to lipid scrambling”.

      (13) Line 396: It is stated that TMEM16A is not a scramblase but the simulated scrambling activity is not zero. How can you be sure that you are monitoring the correct collective variable if you are getting a false positive with respect to experiments?

      We only observe 2 scrambling events in 10 ms, which is a very small rate compared to the scrambling competent states. A previous large Martini CG simulation survey that inspired our protocol (Li et al, PNAS 2024) employed a 1 event/ms cut-off to distinguish scramblers from non-scramblers. Hence, they would have called TMEM16A a non-scrambler as well. We expect that false positives in this context might be an artifact of the CG forcefield, or it could be that TMEM16A can scramble but too slowly to be experimentally detected. Regarding the collective variable for lipid flipping, it is correct, and we know that this lipid actually flipped.

      (14) Line 402: Distance distributions for the electrostatic interactions between E633 and K645 should be included in the manuscript. This is also the case for the interactions between E843-K850 (lines 491-492).

      Our description of interactions between lipid headgroups and E633 and K645 in TMEM16A (5OYB*) is based on qualitative observations of the MD trajectory, and we highlight an example of this interaction in Figure 3-video 4. The video clearly shows that the lipid headgroups in the center of the groove orient themselves such that the phosphate bead (red) rests just above K645 (blue) and at other times the choline bead (blue) rests just below E633 (red). We do not think an additional plot with the distance distributions between lipids and these residues will add to our understanding of how lipids interact with residues in the TMEM16A pore.

      We made a similar qualitative observation for the interaction between the POPC choline and E843 and between the POPC phosphate and K850 while watching the AAMD simulation trajectory of TMEM16F (PDB ID 6QP6). Given that this was a single observation, and the same interaction does not appear in the CG simulation of the same structure (see simulation snapshots in Figure 4-figure supplement 5), we do not think additional analysis would add significantly to our understanding of which residues may stabilize lipids in the dimer interface.

      (15) Lines 450-451: 'As the groove opens, water is exposed to the membrane core and lipid headgroups insert themselves into the water-filled groove to bridge the leaflets.' Is this a qualitative observation? Could the authors report the correlation between groove dilation and the number of water permeation events?

      Yes, this is qualitative, and it sketches the order of events during scrambling, and we revised the main text starting at line 450 to indicate this. As illustrated by the density isosurfaces in Appendix 1-Figure 2A, the amount of water found in the closed versus open grooves is striking – there is a significant flood of water that connects the upper and lower solutions upon groove opening. Moreover, Appendix 1-Figure 2B shows much greater water permeation for open structures (4WIS, 7RXG, 5OC9, 8B8J, …) compared to closed structures (6QMB, 6QMA, 8B8Q, and many of the non-labeled data in the figure that all have closed grooves and near 0 water permeation). A notable exception is TMEM16A (7ZK3*8), which has water permeation but a closed groove and little-to-no lipid scrambling.

      Minor Comments:

      (1) Inconsistent use of '10' and 'ten' throughout.

      We would like to kindly point out that we do not find examples of inconsistent use.

      (2) Line 32: 'TM6 along with 3, 4 and 5...' should be 'TM6 along with TM3, TM4 and TM5...'. Same in line 142. Naming should stay consistent.

      Changes are reflected in the updated manuscript.

      (3) Line 141: do you mean traverse (i.e. to travel across)? Or transverse (i.e. to extend across the membrane)?

      This is a typo. We meant “traverse”. Thanks for pointing it out.

      (4) Line 142: 'greasy' should be 'strongly hydrophobic'.

      Changes are reflected in the updated manuscript.

      (5) Line 143-144: "credit card mechanism" requires quotation marks.

      Changes are reflected in the updated manuscript.

      (6) Line 144: state if Nectria haematococca is mammalian or fungal, this is not obvious for all readers.

      Changes are reflected in the updated manuscript.

      (7) Line 147-148: Is TMEM16A/TMEM16K fungal or mammalian? What was the residue before the mutation and which residue is mutated? Perhaps the nomenclature should read as TMEM16X10Y where X=the residue prior to the mutation, 10 is a placeholder for the residue number that is mutated and Y=the new residue following mutation.

      “TMEM16” is the protein family. “A” denotes the specific homolog rather than residue.  

      (8) Lines 157-158: same as 10, it is unclear if these are fungal or mammalian.

      Clarifications added.

      (9) Line 184: "...CGMD simulation" should be "...CGMD simulations".

      Changes made.

      (10) Line 191-192: It would help to create a table of all of the mutants (including if they are mammalian or fungal) summarizing the salt concentrations, lipid and detergent environments, the presence of modulators/activators, etc.

      We added this information to Appendix 1-Table 1 in the supplemental information. We did not specify NaCl concentrations, because all experimental procedures used standard physiological values for this (100-150 mM).

      (11) Line 210: inconsistencies with 'CG' and 'coarse-grain'.

      Changes made.

      (12) Figure 1 caption: '...totaling ~2μs (B)...' is missing the fullstop after 2μs.

      Changes made.

      (13) Figure 1B: it may be useful to label where the Ca2+ ion binds or include a schematic.

      We updated Fig. 1A to illustrate where Ca2+ binds.

      (14) Line 311: Are these mean distances? The authors should add standard deviations.

      Yes, they are. We added the standard deviations to the text.

      (15) Line 321-322: Perhaps a schematic in Figure 2 would be useful to visualize the structural features described here.

      We would kindly refer interested readers to reference [60].

      (16) Line 377: '...are likely a correlate of groove opening...' should read as: '...are likely correlated to groove opening...'.

      Thank you for pointing it out. Changes made.

      (17) Line 398: the '...empirically determined 6Å threshold for scrambling.' Was this determined from the simulations or from experiments? What does "empirically" mean here? Please state this.

      This value was determined from the simulations. Based on our analysis of the correlation between scrambling rate and groove dilation, we found that the minimal TM4/6 distance of 6 Å can distinguish between the high and low activity scramblers. The exact numerical value is somewhat arbitrary as there is a range of values around 6 Å that serve to distinguish scramblers from non-scramblers.

      (18) Figure 4: This figure should be labelled as A, B, C and D, with the figure caption updated accordingly.

      We updated Figure 4 and its caption.

      Reviewer #3 (Recommendations for Authors):

      The authors must do additional simulations to further validate their claim with different lipids and further substantiate dimer interface independent of Ca2+ ions.

      Thank you for the suggestion. We completely agree that studying scrambling in the context of a diverse lipid environment is an exciting area to explore. We are indeed actively working on a project that shares a similar idea. We decided not to include that study because we think the additional discussion involved would be excessive for the current manuscript. We do, however, look forward to publishing our findings in a separate manuscript in the near future. In terms of Ca2+-independent scrambling, we are planning mutagenesis studies with our experimental collaborator that target the residues we identified along the dimer interface.

      Since calcium ions are critical for the stability of these structures, authors should show that they were placed throughout the simulations consistently.

      As stated in the method section “Coarse-grained system preparation and simulation detail”, all Ca2+ ions are manually placed into the coarse-grained structure from the beginning of the simulation at their identical corresponding position in the experimental structure and harmonically bonded to adjacent acidic residues throughout the duration of simulation. We have also added a label to Fig 1A to indicate where the two Ca2+ ions are located.

      The comparison with experimental structures should be consistent with complete simulation, and not the last structure of the trajectory. Depending on the conformational variability, this might be misleading.

      We agree and updated Fig. 1-supplement figure 1 accordingly. The overall agreement between membrane shapes in CGMD and cryo-EM was not affected by this change.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Review:

      Reviewer #1 (Public review):

      Summary:

      Meteorin proteins were initially described as secreted neurotrophic factors. In this manuscript, Eggeler et al. demonstrate a novel role for Meteorins in establishing left-right axis formation in the zebrafish embryo. The authors generated null mutations in each of the three zebrafish meteorin genes - metrn, metrnla, and metrnlab. Triple mutant embryos displayed phenotypes strongly associated with left-right defects such as heart looping and visceral organ placement, and disrupted expression of Nodal-responsive genes, as did single mutants for metrn and metrnla. The authors then go on to demonstrate that these defects in left-right asymmetry are likely due to defects in Kupffer's Vesicle and the progenitor dorsal forerunner cells, including impaired lumen formation and reduced fluid flow, reduced clustering among DFCs, impaired DFC migration, mislocalization of apical proteins ZO-1 and aPKC, and detachment of DFCs from the EVL. Notably, the authors found that expression of marker genes sox32 and sox17 was not affected, suggesting Meteorins are required for DFC/KV morphogenesis but not necessarily fate specification. Finally, the authors show genetic interaction between Meteorins and integrin receptors, which were previously implicated in left-right patterning. In a supplemental figure, the manuscript also presents data showing expression of meteorin genes around the chick Hensen's node, suggesting that the left-right patterning functions may be conserved among vertebrates.

      Strengths:

      Strengths of this study include the generation of a triple mutant line that targets all known zebrafish meteorin family members. The experiments presented in this study were rigorous, especially with respect to quantification and statistical analysis.

      Weaknesses:

      Although the authors convincingly demonstrate a role for Meteorins in zebrafish left-right patterning, data supporting a conserved role in other vertebrates is compelling but limited to one supplemental figure.

      We thank the reviewer for their thoughtful summary of our study and for highlighting the strengths of our work, including the generation of the triple mutant line and the rigor of our experimental design and quantitative analyses. We also appreciate the constructive feedback regarding the limited functional data supporting the conservation of Meteorin function in other vertebrates. We agree that this is an important aspect that could be further explored. While functional studies in additional species are beyond the current scope, we will consider such experiments in future work.

      We would like to highlight the phylogenetic analysis of Meteorin proteins we have already performed and included in the manuscript (Fig. S7D), which illustrates the evolutionary conservation of this protein family and supports the possibility of a conserved role in left-right patterning.

      Additionally, we have expanded the methods and discussion to include: (1) details on zebrafish viability in contrast to reported embryonic lethality in metrn mutant mice, (2) the background strains used in our study, (3) observed variability in DFC number and potential batch effects and (4) clarification of our 'convergence ratio' quantification approach.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript the authors describe their study on the role of meteorins in establishing the left-right organizer. The left-right organizer is a transient organ in vertebrate embryos in which rotating cilia cause a fluid flow that breaks the left-right symmetry and coordinates lateralization of internal organs such as gut and heart. In zebrafish, the left-right organizer (also named Kupffer's vesicle) is formed by dorsal forerunner cells, but very little is known about how dorsal forerunner cells coalesce and form this ciliated vesicle in the embryo. The authors mutated the three meteorin-coding genes in zebrafish and observed that mutations in each one of these cause laterality defects, with the strongest defects observed in the triple mutant. Loss of meteorins affects the expression of nodal genes, which play essential roles in establishing organ laterality. Meteorins are widely expressed in developing embryos and expression in lateral plate mesoderm and dorsal forerunner cells was observed. The meteorin triple mutant embryos display defects in the migration and clustering of the dorsal forerunner cells, impairing Kupffer's vesicle formation and cilia rotation. Finally, the authors show that meteorins genetically interact with integrins.

      Strengths:

      - These authors went through the lengthy process of generating triple mutants affecting all three meteorin genes. This provides robust genetic evidence on the role of meteorins in establishing organ laterality and avoids the difficulty of interpreting results that could be confounded by redundant functions of meteorins.

      - The use of live imaging on triple mutants is appreciated.

      - High-quality imaging of dorsal forerunner to quantify cell migrations and its relation to Kupffer's vesicle formation.

      Weaknesses:

      - Lack of a model how meteorins regulate dorsal forerunner cell migration.

      - Only genetic data to suggest a link between meteorins and integrins

      - Besides its role in DFC migration, meteorins may also play a more direct role in regulating Nodal signaling, which is not addressed here.

      We appreciate the recognition of the strengths of our study, particularly the generation of the triple meteorin mutants and the use of high-resolution imaging to quantify DFC behavior and Kupffer’s vesicle formation—both of which were central to providing robust evidence for Meteorins' role in left-right patterning.

      We also value the reviewer’s comments on areas that need further exploration, including the need for a mechanistic model explaining how Meteorins regulate DFC migration, the genetic interaction with integrins, and the potential direct involvement of Meteorins in Nodal signaling.

      We agree that deeper mechanistic insights would strengthen the study. While our findings suggest that Meteorins influence DFC migration and clustering through integrin pathways, a detailed mechanistic dissection, particularly regarding the yet unidentified Meteorin receptor, lies beyond the current scope. However, we consider this a key aspect for future research and have discussed it further in the revised discussion section.

      In response to the reviewer’s suggestions, we have expanded the discussion to address the limitations of the current data linking Meteorins and integrins, including relevant citations to studies that implicate integrins in similar contexts. Additionally, we have added a more detailed discussion of the potential for Meteorins to directly influence Nodal signaling, and we cite a relevant study to support this possibility.

      Once again, we thank the reviewer for their insightful and constructive comments. These points raise important directions for future investigation that will further advance our understanding of Meteorin function in left-right axis formation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In the Results section (p. 9), the authors state, "...a reduced ZO-1 enrichment at the apical junctions of triplMUT GFP-positive DFCs could be detected." However, in Fig. 4F-G, the areas of ZO-1 enrichment indicated by arrowheads appear quite far from the DFCs themselves, making it unclear if these ZO-1-enriched areas are apical DFC junctions (as stated in the text) or instead are part of the EVL. Is it possible to include an additional cell membrane marker or other landmarks? In addition, the differences in ZO-1 accumulation between mutants and WT appear relatively modest. Is it possible to provide quantification of this effect?

      We appreciate the reviewer’s request for additional stainings and further clarification, and we would like to highlight that the requested quantification of ZO-1 accumulation, including statistical analysis, is already provided in Fig. S5E.

      In mouse, loss of Meteorin is embryonic lethal yet the zebrafish triple mutants are viable. Could the authors discuss this discrepancy?

      We have expanded the discussion to address this point, suggesting that species-specific differences in compensatory mechanisms may explain the observed differences in viability. We would like to reiterate that while one study has reported embryonic lethality in metrn mutant mice, this specific mouse line has not been further investigated in any recent publications. Additionally, in collaboration with the lab of Alain Chédotal, we generated independent metrn and metrnl mutant mouse lines, which did not exhibit the phenotype described in the previously mentioned study.

      It has been reported that TL and AB strains exhibit variable numbers of DFCs and thus laterality defects (Moreno-Ayala et al., 2021, Cell Reports 34(2):108606). Would it be possible for the authors to report the background strains used in this study and those used to generate the meteorin knock-outs?

      We appreciate the comment highlighting the importance of specifying the background strains used in our study. We have now included this information in the methods section, detailing the zebrafish strains utilized throughout our experiments.

      For statistical analysis, would it be possible for the authors to report the number of clutches examined to control for batch effects (especially given the wide variability in DFC numbers as noted above)?

      For further clarification, we have now included an additional explanation of the number of clutches examined in the methods section.

      In the Methods section (p. 19), the description of how the convergence ratio was computed was somewhat unclear. Could the authors provide a citation or include a diagram/schematic?

      We have revised the Methods section to provide a clearer definition of the convergence ratio and have included a schematic (Fig. 4D) to illustrate how it was calculated.

      Reviewer #2 (Recommendations for the authors):

      - Meteorins are widely expressed in the embryo. Can the authors comment on whether meteorin expression is required in the dorsal forerunner cells (DFCs) or in other cells? This could be addressed by knockdown experiments in DFCs as described by others (PMID: 15716348)

      We thank the reviewer for this important comment. In our study, we have shown that Meteorins are not required for the identity of DFCs, as several DFC-specific markers remain expressed in the respective cells within the meteorin mutant background (see Fig. S4).

      - In fig1d and 1e the authors use heterotaxy to describe visceral organ placement. The embryo shown in 1d seems to display situs inversus instead of heterotaxy, which is defined as discordance in organ position. The authors should clarify this.

      We agree with the reviewer and have revised the figures and figure legends to clarify the distinction between situs inversus and heterotaxy.

      - In Fig. 2 the authors show that nodal pathway genes are reduced, suggesting reduced Nodal signaling. How do they explain this, given that loss of cilia rotation generally leads to randomization of Nodal signaling rather than a reduction in signaling?

      Following this suggestion, we have now added a further discussion on the possibility that Meteorins could directly regulate Nodal signaling in addition to their role in DFC migration and have cited a relevant study.

      - Reduced Nodal signaling in the LPM leads to organ laterality defects. Most anterior tissues like the heart are more sensitive to perturbation in Nodal signaling in the LPM compared to more posterior organs like gut (see also PMID: 25684355). Since in triple mutants the position of the heart is more affected than the position of the visceral organs this suggests that meteorins play an additional role in Nodal signaling in the LPM. As others have shown that meteorins regulate nodal activity (PMID: 24558432), the authors should address this further.

      As described above, we have now added a further discussion on the possibility that Meteorins could directly regulate Nodal signaling in addition to their role in DFC migration and have cited a relevant study. Further investigation into a possible direct role of Meteorins in Nodal signaling will be pursued in future work.

      - The term 'convergence ratio' is not clearly described and confusing as convergence is also used for the movement of LPM cells towards the midline.

      As noted in response to Reviewer #1, we have revised the Methods section and included a schematic in Fig. 4D to better explain this parameter.

      We are grateful for the thoughtful critiques from both reviewers, which have been very constructive and improved the clarity of our study. We believe that the revisions we have made address the concerns raised, and we look forward to your evaluation of our revised manuscript.

    1. Author response:

      Reviewer #1 (Public Review):

      In this manuscript, Tran et al. investigate the interaction between BICC1 and ADPKD genes in renal cystogenesis. Using biochemical approaches, they reveal a physical association between Bicc1 and PC1 or PC2 and identify the motifs in each protein required for binding. Through genetic analyses, they demonstrate that Bicc1 inactivation synergizes with Pkd1 or Pkd2 inactivation to exacerbate PKD-associated phenotypes in Xenopus embryos and potentially in mouse models. Furthermore, by analyzing a large cohort of PKD patients, the authors identify compound BICC1 variants alongside PKD1 or PKD2 variants in trans, as well as homozygous BICC1 variants in patients with early-onset and severe disease presentation. They also show that these BICC1 variants repress PC2 expression in cultured cells.

      Overall, the concept that BICC1 variants modify PKD severity is plausible, the data are robust, and the conclusions are largely supported. However, several aspects of the study require clarification and discussion:

      (1) The authors devote significant effort to characterizing the physical interaction between Bicc1 and Pkd2. However, the study does not examine or discuss how this interaction relates to Bicc1's well-established role in posttranscriptional regulation of Pkd2 mRNA stability and translation efficiency.

      The reviewer is correct that the present study has not addressed the downstream consequences of this interaction, considering that Bicc1 is a posttranscriptional regulator of Pkd2 (and potentially Pkd1). We think that the complex of Bicc1/Pkd1/Pkd2 retains Bicc1 in the cytoplasm and thus restricts its activity in posttranscriptional regulation. As we do not yet have experimental data to support this model, we have not included it in the manuscript. Yet, we will update the discussion of the manuscript to further elaborate on the potential mechanism of the Bicc1/Pkd1/Pkd2 complex.

      (2) Bicc1 inactivation appears to downregulate Pkd1 expression, yet it remains unclear whether Bicc1 regulates Pkd1 through direct interaction or by antagonizing miR-17, as observed in Pkd2 regulation. This should be further examined or discussed.

      This is a very interesting comment. The group of Vishal Patel published that PKD1 is regulated by a mir-17 binding site in its 3’UTR (PMID: 35965273). We, however, have not evaluated whether BICC1 participates in this regulation. A definitive answer would require us to utilize some of the mice described in the above reference, which is beyond the scope of this manuscript. We, however, will revise the discussion to elaborate on this potential mechanism.

      (3) The evidence supporting Bicc1 and ADPKD gene cooperativity, particularly with Pkd1, in mouse models is not entirely convincing, likely due to substantial variability and the aggressive nature of Bpk/Bpk mice. Increasing the number of animals or using a milder Bicc1 strain, such as jcpk heterozygotes, could help substantiate the genetic interaction.

      We initially performed the analysis using the Bicc1 complete knockout that we previously reported (PMID: 20215348), focusing on compound heterozygotes. Yet, as with the Pkd1/Pkd2 compound heterozygotes (PMID: 12140187), no cyst development was observed up to the point at which we sacrificed the mice at P21. Our strain is similar to the above-mentioned jcpk, which is characterized by a short, abnormal transcript thought to result in a null allele (PMID: 12682776). We thank the reviewer for pointing us to the reference showing that heterozygous mice develop glomerular cysts as adults (PMID: 7723240). This suggestion is an interesting idea that we will investigate. In general, we agree with the reviewer that a better understanding of the contribution of Bicc1 to the adult PKD phenotype will be critical. To this end, we are currently generating a floxed allele of Bicc1 that will allow us to address the cooperativity in the adult kidney when, e.g., crossed to the Pkd1<sup>RC/RC</sup> mice. Yet, these experiments are unfortunately beyond the scope of this manuscript.

      Reviewer #2 (Public Review):

      Tran and colleagues report evidence supporting the expected yet undemonstrated interaction between the Pkd1 and Pkd2 gene products Pc1 and Pc2 and the Bicc1 protein in vitro, in mice, and collaterally, in Xenopus and HEK293T cells. The authors go on to convincingly identify two large and non-overlapping regions of the Bicc1 protein important for each interaction and to perform gene dosage experiments in mice that suggest that Bicc1 loss of function may compound with Pkd1 and Pkd2 decreased function, resulting in PKD-like renal phenotypes of different severity. These results led to examining a cohort of very early onset PKD patients to find three instances of co-existing mutations in PKD1 (or PKD2) and BICC1. Finally, preliminary transcriptomics of edited lines gave variable and subtle differences that align with the theme that Bicc1 may contribute to the PKD defects, yet are mechanistically inconclusive.

      These results are potentially interesting, despite the limitation, also recognized by the authors, that BICC1 mutations seem exceedingly rare in PKD patients and may not "significantly contribute to the mutational load in ADPKD or ARPKD". The manuscript has several intrinsic limitations that must be addressed.

      As mentioned above, the study was designed to explore whether there is an interaction between BICC1 and PKD1/PKD2 and whether this interaction is functionally important. How this translates into clinical relevance will require additional studies (and we have addressed this in the discussion of the manuscript).

      The manuscript contains factual errors, imprecisions, and language ambiguities. This has the effect of making this reviewer wonder how thorough the research reported and analyses have been.

      We respectfully disagree with the reviewer on the latter interpretation. The study was performed with rigor. We have carefully assessed the critiques raised by the reviewer. Most of the criticisms raised by the reviewer will be easily addressed in the revised version of the manuscript. Yet, none of the critiques raised by the reviewer seems to directly impact the overall interpretation of the data.

      Reviewer #3 (Public Review):

      Summary:

      This study investigates the role of BICC1 in the regulation of PKD1 and PKD2 and its impact on cystogenesis in ADPKD. By utilizing co-IP and functional assays, the authors demonstrate physical, functional, and regulatory interactions between these three proteins.

      Strengths:

      (1) The scientific principles and methodology adopted in this study are excellent, logical, and reveal important insights into the molecular basis of cystogenesis.

      (2) The functional studies in animal models provide tantalizing data that may lead to a further understanding and may consequently lead to the ultimate goal of finding a molecular therapy for this incurable condition.

      (3) In describing the patients from the Arab cohort, the authors have provided excellent human data for further investigation in large ADPKD cohorts. Even though there was no patient material available, such as HUREC, the authors have studied the effects of BICC1 mutations and demonstrated its functional importance in a Xenopus model.

      Weaknesses:

      This is a well-conducted study and could have been even more impactful if primary patient material was available to the authors. A further study in HUREC cells investigating the critical regulatory role of BICC1 and potential interaction with mir-17 may yet lead to a modifiable therapeutic target.

      This is an excellent suggestion. We agree with the reviewer that it would have been interesting to analyze HUREC material from the affected patients. Unfortunately, besides DNA and the phenotypic analysis described in the manuscript, neither human tissue nor primary patient-derived cells were collected before the two patients with the BICC1 p.Ser240Pro mutation passed away. To address this missing link, we have, as a first pass, generated HEK293T cells carrying the BICC1 p.Ser240Pro variant. While these admittedly are not kidney epithelial cells, they indeed show a reduced level of PC2 expression. These data are shown in the manuscript. We have not yet addressed how this relates to its crosstalk with miR-17.

      Conclusion:

      The authors achieve their aims. The results reliably demonstrate the physical and functional interaction between BICC1 and PKD1/PKD2 genes and their products.

      The impact is hopefully going to be manifold:

      (1) Progressing the understanding of the regulation of the expression of PKD1/PKD2 genes.

      (2) The role of BICC1 in the mir/PKD1/2 complex should be the next step in the quest for a modifiable therapeutic target.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Filamentous fungi are established workhorses in biotechnology, with Aspergillus oryzae as a prominent example with a thousand-year history. Still, the cell biology and biochemical properties of the production strains are not well understood. The paper of the Takeshita group describes the change in nuclear numbers and correlates it to different production capacities. They used microfluidic devices to directly correlate the production with nuclear numbers. In addition, they used microdissection to understand expression profile changes and found an increase in ribosomes. The analysis of two genes involved in cell volume control in S. pombe did not reveal conclusive answers to explain the phenomenon. It appears that it is a multi-trait phenotype. Finally, they identified SNPs in many industrial strains and tried to correlate them to the capability of increasing their nuclear numbers.

      The methods used in the paper range from high-quality cell biology, Raman spectroscopy, to atomic force and electron microscopy, and from laser microdissection to the use of microfluidic devices to study individual hyphae.

      This is a very interesting, biotechnologically relevant paper with the application of excellent cell biology. I have only minor suggestions for improvement.

      We sincerely appreciate your fair and positive evaluation of our work. Thank you for your suggestions for improvement. We respond to each of them appropriately.

      Reviewer #2 (Public review):

      Summary:

      In the study presented by Itani and colleagues, it is shown that some strains of Aspergillus oryzae - especially those used industrially for the production of sake and soy sauce - develop hyphae with a significantly increased number of nuclei and cell volume over time. These thick hyphae are formed by branching from normal hyphae and grow faster and therefore dominate the colonies. The number of nuclei positively correlates with the thicker hyphae and also the amount of secreted enzymes. The addition of nutrients such as yeast extract or certain amino acids enhanced this effect. Genome and transcriptome analyses identified genes, including rseA, that are associated with the increased number of nuclei and enzyme production. The authors conclude from their data that glycosyltransferases, calcium channels, and the TOR regulatory cascade are involved in the regulation of cell volume and number of nuclei. Thicker hyphae and an increased number of nuclei were also observed in high-production strains of other industrially used fungi such as Trichoderma reesei and Penicillium chrysogenum, leading to the hypothesis that the mentioned phenotypes are characteristic of production strains, which is of significant interest for fungal biotechnology.

      Strengths:

      The study is very comprehensive and involves the application of diverse state-of-the-art cell biological, biochemical, and genetic methods. Overall, the data are properly controlled and analyzed, and the figures and movies are of excellent quality.

      The results are particularly interesting with regard to the elucidation of molecular mechanisms that regulate the size of fungal hyphae and their number of nuclei. For this, the authors have discovered a very good model: (regular) strains with a low number of nuclei and strains with a high number of nuclei. Also, the results can be expected to be of interest for the further optimization of industrially relevant filamentous fungi.

      Weaknesses:

      There are only a few open questions concerning the activity of the many nuclei in production strains (active versus inactive), their number of chromosomes (haploid/diploid), and whether hyper-branching always leads to propagation of nuclei.

      We are very grateful for your recognition of our findings, the proposed model, and their significance for future applications. We are grateful for the questions, which contribute to a more accurate understanding.

      Our responses to each are provided below. Necessary experiments are in progress.

      Reviewer #3 (Public review):

      Summary:

      The authors seek to determine the underlying traits that support the exceptional capacity of Aspergillus oryzae to secrete enzymes and heterologous proteins. To do so, they leverage the availability of multiple domesticated isolates of A. oryzae along with other Aspergillus species to perform comparative imaging and genomic analysis.

      Strengths:

      The strength of this study lies in the use of multifaceted approaches to identify significant differences in hyphal morphology that correlate with enzyme secretion, which is then followed by the use of genomics to identify candidate functions that underlie these differences.

      Weaknesses:

      There are aspects of the methods that would benefit from the inclusion of more detail on how experiments were performed and data interpreted.

      Overall, the authors have achieved their aims in that they are able to clearly document the presence of two distinct hyphal forms in A. oryzae and other Aspergillus species, and to correlate the presence of the thicker, rapidly growing form with enhanced enzyme secretion. The image analysis is convincing. The discovery that the addition of yeast extract and specific amino acids can stimulate the formation of the novel hyphal form is also notable. Although the conclusions are generally supported by the results, this is perhaps less so for the genetic analysis as it remains unclear how direct the role of RseA and the calcium transporters might be in supporting the formation of the thicker hyphae.

      The results presented here will impact the field. The complexity of hyphal morphology and how it affects secretion is not well understood despite the importance of these processes for the fungal lifestyle. In addition, the description of approaches that can be used to facilitate the study of these different hyphal forms (i.e., stimulation using yeast extract or specific amino acids) will benefit future efforts to understand the molecular basis of their formation.

      We are very grateful for your fair and thoughtful evaluation of our work. We agree that the genetic analysis in the latter part is relatively weaker compared to the imaging analysis in the first half. Rather than a single mutation causing a dramatic phenotypic change, we believe that the accumulation of various mutations through breeding leads to the observed phenotype, making it difficult to clearly demonstrate causality. Since transcriptome and SNP analyses have revealed key pathways and phenotypes, it would be gratifying if these insights could contribute to future applications utilizing filamentous fungi.

    1. Author Response:

      We sincerely thank the reviewers and the editorial team for their thoughtful and constructive evaluation of our manuscript. We are very pleased that both reviewers and the Reviewing Editor found the work to be compelling and of interest to the community studying membrane-associated condensates. Below we outline our planned revisions in response to the public reviews.

      Reviewer #1

      We appreciate Reviewer #1’s positive evaluation of the study’s significance and the utility of our theoretical framework.

      1. Understandably, the authors used one system to test their theory (ZO-1). However, to establish a theoretical framework, this is sufficient.

      Response: We acknowledge this limitation. While we agree that additional systems would strengthen the generality of our theory, we note that the focus of this work is to introduce and validate a theoretical framework. As the reviewer notes, this is sufficient for establishing the framework. Nonetheless, we are open to further collaborations or future studies to test the model with other systems.

      Reviewer #2

      We are grateful for Reviewer #2’s detailed comments and will address each of the points as follows:

      1. In the theoretical section, what has previously been known, compared to which equations are new, should be made more clear.

      Response: We will revise the theory section to clearly distinguish previously established formulations from novel contributions.

      2. Some assumptions in the model are made purely for convenience and without sufficient accompanying physical justification. E.g., the authors should justify, on physical grounds, why binding rate effects are/could be larger than the other fluxes.

      Response: We will expand the discussion to provide key physical justification, especially to explain why binding rate effects are/could be larger than the other fluxes.

      3. I feel that further mechanistic explanation as to why bulk phase separation widens the regime of surface phase separation is warranted.

      Response: We will elaborate on the mechanism underlying this coupling.

      4. The major advantage of the non-dilute theory as compared with a best parameterized dilute (or homogeneous) theory requires further clarification/evidence with respect to capturing the experimental data.

      Response: We will clarify this comparison more explicitly and highlight how the non-dilute model captures key nonlinear behaviors and concentration-dependent adsorption phenomena that the dilute model fails to reproduce.

      5. Discrete (particle-based) molecular modelling could help to delineate the quantitative improvements that the non-dilute theory has over the previous state-of-the-art. Also, this could help test theoretical statements regarding the roles of bulk-phase separation, which were not explored experimentally.

      Response: We appreciate the suggestion and agree that such modeling would be valuable. However, this is beyond the scope of the current study. We will add a discussion on how discrete simulations could be used to further test our theory in future work.

      6. Discussion of the caveats and limitations of the theory and modelling is missing from the text.

      Response: We will add a paragraph outlining caveats and limitations of the modelling.

      We believe these changes will significantly improve the clarity and impact of our manuscript, and we thank the reviewers again for their valuable input.

    1. Author response:

      We thank the reviewers for their thoughtful and constructive feedback. As the reviewers noted, dissecting the contributions of Gtr1/2 and Pib2 to TORC1 signaling across diverse nutrient states is a technically and conceptually challenging problem. Indeed, many of the issues raised—including the interpretation of non-canonical TORC1 readouts (e.g., Rps6, Par32), the influence of strain auxotrophy and media composition, and the limitations of phosphoproteomic analysis performed under a single growth condition—underscore the challenges of working with the TORC1 signaling system.

      In response to the reviewers’ comments, we have undertaken a broader and more systematic analysis of TORC1 regulation across defined nitrogen transitions, building directly on the signaling framework established in Figures 6 and 8 of this manuscript. This work, which includes expanded phosphoproteomic profiling and the use of refined genetic tools, supports and extends the key conclusions of Cecil et al. Specifically, it reinforces the existence of a Pib2-dependent TORC1 output under nitrogen-limited conditions and further clarifies the physiological relevance of the intermediate TORC1 activity state. Due to the scope and depth of this expanded work, we are reporting those findings in a separate publication. Nonetheless, we view the data presented here as a key foundational step in establishing a non-redundant framework for Gtr1/2- and Pib2-dependent control of TORC1.

      We have therefore made minor changes to the manuscript to clarify our use of different growth media and to temper our conclusions where appropriate. These changes, together with the context of ongoing work, should reinforce the value of Cecil et al. in advancing our understanding of TORC1 and nutrient signaling in eukaryotes.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      In this work, Jeong and colleagues focus on exploring the role of the acyltransferase ZDHHC9 in myelinating OLs, in particular in the palmitoylation of several myelin proteins. After confirming the specific enrichment of the Zdhhc9 transcript in mouse and human OLs, the authors examine the subcellular localization of the protein in vitro and observe that, in comparison with other isoforms, ZDHHC9 localizes to OL cell bodies and to discrete puncta in the processes. These observations (Figures 1 and 2) led the authors to hypothesize that ZDHHC9 plays an important role in myelination. No gross changes were detected in OL development in Zdhhc9 KO mice, and analyses from P28 Zdhhc9 KO mice crossed with Mobp-EGFP reporter mice did not show changes in EGFP+ OL differentiation (Figure 3).

      However, given the observed subcellular localization of ZDHHC9 in OL processes (Figure 2) and the observation that the percentage of unmyelinated axons is increased in Zdhhc9 KO (Figure 6), earlier time points should be considered to examine the differentiated pools of OLs and their capacity to extend processes and contact axons.

      We appreciate this point, but due to the order in which experiments were performed, the ZDHHC9 KO mouse colony that we maintained after initial submission of this work contains homozygous MOBP-EGFP, but not the mT/mG transgene that would be most optimal for the proposed experiment. We hope the reviewer appreciates that it would take considerable time and effort regarding mouse breeding to cross out the MOBP and add back the mT/mG. We nonetheless appreciate the importance of the point raised and therefore examined an earlier developmental time point (P21, 3 weeks) to quantify OLs and NG2+ OPCs. In our updated Fig 3C1-C3, we use Mobp-EGFP mice to show that Zdhhc9 KO does not significantly affect the number of EGFP+ OLs at this time point in the cortex, corpus callosum and spinal cord. We also show that in corpus callosum, Zdhhc9 KO does not significantly affect the number of NG2+ OPCs at this earlier time point (Fig 3D, E). Furthermore, immunostaining to detect BCAS1, a marker of pre-mature OLs, also revealed no qualitative difference with ZDHHC9 loss at P21. We show representative images from these BCAS1 experiments in an updated Fig S3. While these new experiments do not address the morphology of OLs in Zdhhc9 KO, they do provide further evidence that deficits in myelination in young Zdhhc9 KO mice (Figure 6) are not likely due to gross differences in OPC or OL numbers during development.

      Maturation of OLs in Zdhhc9 KO was examined by crossing Zdhhc9 KO with Pdgfra-CreER;R26-EGFP and tracking the newly EGFP-labelled OPCs after tamoxifen administration. No changes in the numbers of EGFP+ OLs were detected. The authors concluded that the loss of ZDHHC9 does not alter oligodendrogenesis in either the young or mature CNS. The authors observed defects in Zdhhc9 KO OL protrusions that they attributed to abnormal OL membrane expansion (Fig 4 and 5). Can they show evidence for this?

      This is an important point, and we appreciate the opportunity to explain the reasoning behind our initial statement more fully, while noting that other explanations are possible. Fig 5B (an Imaris-assisted reconstruction using the EGFP cell fill/morphology marker) highlights large spheroid-like distensions along OL processes. We reason that these spheroids are enclosed by the OL lipid membrane because if the membrane were ruptured, the EGFP signal would likely diffuse. This in turn suggests that the caliber of the OL process at the position of the spheroid is grossly abnormal i.e. the membrane has hyper-expanded. Given that OL membrane growth during myelination extends in two directions, i.e., spiral growth to the axonal surface and longitudinal growth along the axon, it is possible that spheroid-like structures are formed by uneven myelin growth. We recognize that we cannot yet conclude whether and how spheroid formation might be linked to the myelination deficit that we observe in Zdhhc9 KO mice. However, defining the subcellular mechanism for spheroid formation may provide further insights into this issue. We have therefore largely retained the original statement but have added the reasoning above to our revised Discussion.

      The authors report that Zdhhc9 KO primary and secondary branches in OL were longer, some contained spheroid-like swellings and the OL protrusion complexity was higher. However, these data are partially contradictory to what they show in OL differentiation experiments in vitro (Fig 7). There is also no evidence for increased membrane expansion in Zdhhc9 knockdown myelin-forming cells in culture. How to reconcile this?

      We appreciate the reviewer’s interest in this issue. Several non-mutually exclusive factors could account for the differences in OL morphology in vitro versus in vivo caused by Zdhhc9 loss. First, morphology in vivo may well be influenced by the axons and/or other extrinsic components around each OL that are not present in our primary cultures. Second, OL growth in vivo is highly 3-dimensional, whereas growth in culture is largely 2-dimensional – it may be difficult to support formation of spheroids (by definition, a 3-dimensional structure) in the latter situation. Finally, Zdhhc9 is absent in vivo from the beginning of development until the time points examined, whereas in our cultured OL experiments, Zdhhc9 shRNA is virally delivered to OPC cultures at DIV2 and likely acutely affects Zdhhc9 expression predominantly in committed OLs (following the switch to differentiation medium at DIV3). These differences may also affect the ability of other PATs or, potentially, palmitoylation-independent subcellular processes, to compensate for Zdhhc9 loss. We have more fully explained these points in our revised Discussion. 

      Reviewer #2 (Public Review):

      This study provides an in-depth exploration of the impact of X-linked ZDHHC9 gene mutations on cognitive deficits and epilepsy, with a particular focus on the expression and function of ZDHHC9 in myelin-forming oligodendrocytes (OLs). These findings offer crucial insights into understanding ZDHHC9-related X-linked intellectual disability (XLID) and shed light on the regulatory mechanisms of palmitoylation in myelination. The experimental design and analysis of results are convincing, providing a valuable reference for further research in this field. However, upon careful review, I believe the article still needs further improvement and supplementation in the following aspects:

      (1) Regarding the subcellular localization experiment of ZDHHC9 mutants in OL, it is currently limited to in vitro cultured OL, lacking validation in vivo OL or myelin sheath. Additionally, it is necessary to investigate whether the abnormal subcellular localization of ZDHHC9 mutants affects their enzyme activity and palmitoylation modification of substrate proteins.

      This is an important point but is technically challenging to address in vivo as it would likely require delivery of AAV to express ZDHHC9wt and XLID mutants specifically in OLs, preferably in the absence of endogenous ZDHHC9. We hope the reviewers would agree that this experiment is beyond the scope of the current study. However, we did compare the ability of ZDHHC9wt and XLID mutants to palmitoylate MBP, and to autopalmitoylate (sometimes used as a surrogate measure of PAT activity) in transfected heterologous cells. Although we recognize that this over-expression system is less physiological than a native OL, it has the benefit of being able to readily compare transfected wt vs mutant forms of ZDHHC9 with minimal contribution from endogenous ZDHHC9. Intriguingly, using this system, we found that autopalmitoylation activity of the XLID ZDHHC9-P150S mutant does not differ significantly from that of ZDHHC9wt, and that this mutant is still capable of palmitoylating MBP. Moreover, the R96W mutant, while impaired in autopalmitoylation, still palmitoylated MBP approximately 50% as effectively as ZDHHC9wt in our cell-based assay. These findings suggest that ZDHHC9-P150S and, probably, ZDHHC9-R96W mutants might still be able to palmitoylate substrates in OLs if they were properly localized. This possibility in turn suggests that impaired subcellular targeting in addition to, or instead of, impaired catalytic activity, may be a key factor in certain cases of ZDHHC9-associated XLID. We have expanded our Figure 8 (new panels 8E-G) to show these additional experiments and have summarized the conclusions above in our revised Discussion. We thank the reviewer for suggesting that we further investigate this issue.

      (2) The experimental period (P21+21 days) using genetic labeling to track the development of myelinating cells may not be long enough. It is recommended to extend the observation time and analyze at more time points to more comprehensively reflect the impact of Zdhhc9 KO.

      We appreciate this point from the reviewer but, regrettably, we did not maintain the Pdgfra-CreER; R26-EGFP; Zdhhc9 KO mouse line and hope the reviewer appreciates that it would take considerable time and effort to rederive this line and then perform the suggested extended time course experiments. However, we note for the reviewer that our preliminary studies did not reveal any effect of Zdhhc9 KO on the number of MOBP-EGFP+ OLs in 6-month-old mice (not shown), consistent with a model in which Zdhhc9 loss does not affect OPC-OL commitment per se.

      (3) The author speculates that Zdhhc9 may regulate myelination by affecting the membrane localization of specific myelin proteins, but lacks direct experimental evidence to support this. It is suggested to detect the expression and distribution of relevant proteins in the myelin of Zdhhc9 KO mice.

      We share the reviewer’s interest in this point but realized that it is more technically challenging to address than might be initially thought. The main protein we would implicate and seek to test is MBP, but we already found that there is no gross change in MBP distribution in vivo in Zdhhc9 KO mice (Fig 3A). However, an anti-MBP antibody recognizes all forms of MBP, not just the specific splice variants whose palmitoylation is affected by ZDHHC9 loss. Specifically assessing nanoscale distribution of these splice variants would require a way (e.g. anti-MBP splice form-specific antibodies that are compatible with immuno-EM) to distinguish these variants from other, non-palmitoylated forms of MBP. Although such an antibody could be an important tool, we hope the reviewers would agree that developing and characterizing such a reagent is beyond the scope of the current study.

      We do, however, note that the lack of gross change in MBP distribution and levels in Zdhhc9 KO mice is consistent with the relatively mild phenotype of these mice, compared with shiverer (shi/shi) mice, in which MBP is completely lost. In shiverer, CNS compact myelin is almost absent (PMID: 671037; PMID: 88695; PMID: 460693) and, as the name suggests, mice display a shivering gait, and exhibit seizures and early death. In contrast, Zdhhc9 KO mice show only subtle behavioral deficits (PMID: 29944857). These differences are all consistent with a model in which Zdhhc9 KO mice, despite their significantly reduced MBP palmitoylation (Fig 8), have grossly normal distribution and levels of MBP when all splice variants are assessed (Fig 3, Fig 8). It is not inconceivable that Zdhhc9 KO mice have a nanoscale change in the distribution of MBP, particularly of specific palmitoylated splice variants, within myelin that profoundly affects myelin ultrastructure, without grossly altering MBP distribution. However, an alternative and not mutually exclusive possibility is that aberrant palmitoylation of other Zdhhc9 substrates accounts for, or contributes to, the abnormalities in myelin at the ultrastructural level. Addressing this issue would require a multi-pronged approach, not just to assess palmitoylation and distribution of such proteins in Zdhhc9 KO, but also to test whether they are direct Zdhhc9 substrates, in order to rule out indirect effects. We hope reviewers would agree that this is best left to a separate study. However, in our revised Discussion we now summarize what can be inferred regarding Zdhhc9-dependent effects on total and splice variant-specific distribution and levels of MBP.

      (4) Although the article mentions the association of Zdhhc9 with intellectual disabilities, it does not involve behavioral analysis of Zdhhc9 KO mice. It is recommended to supplement some behavioral experimental data to support the important role of Zdhhc9 in maintaining normal cognitive function, enhancing the clinical relevance of the article.

      We appreciate this point from the reviewer. The behavior of the same ZDHHC9 KO mouse line that we used was reported in PMID: 31747610 and in PMID: 29944857. In the former study, Zdhhc9 KO mice were reported to display seizures reminiscent of phenotypes in human patients with ZDHHC9 mutation. The latter study assessed performance of Zdhhc9 KO mice in several tasks that test cognitive function. Specifically, the KO mice were reported to display “altered behaviour in the open-field test, elevated plus maze and acoustic startle test that is consistent with a reduced anxiety level; a reduced hang time in the hanging wire test that suggests underlying hypotonia but which may also be linked to reduced anxiety [and] deficits in the Morris water maze test of hippocampal-dependent spatial learning and memory.” We have incorporated these findings in our revised Discussion, where we summarize how these phenotypes are common, not just to human patients with ZDHHC9 mutation, but also to other human neurodevelopmental conditions and mouse models in which ID is a common feature.

      (5) For the abnormal myelination observed in Zdhhc9 KO mice, including unmyelinated large-diameter axons and excessively myelinated small-diameter axons, the article lacks in-depth research and explanation on the exact mechanism and mode of action of ZDHHC9 in regulating myelination.

      We share the reviewer’s interest in this point but again note that gaining definitive insights into this issue is far from trivial. Convincing evidence of a causative mechanism would require an exhaustive identification of ZDHHC9 in vivo substrates, followed by point mutation of substrate palmitoylation site(s) to determine the extent to which loss of palmitoylation of such protein(s) phenocopies ZDHHC9 loss. Nonetheless, it is possible to break this question down and to summarize what we do and do not know. For example, our experiments in cultured OLs show that ZDHHC9 loss causes cell-autonomous deficits in morphological maturation of these cells. We also know that ZDHHC9 loss results in impaired palmitoylation of MBP, a direct substrate for ZDHHC9. Moreover, loss of ZDHHC9 at Golgi outposts in OLs (a phenotype observed with several XLID-associated mutant forms of ZDHHC9, even those with no significant loss of catalytic activity) correlates with intellectual disability. Together, these findings are consistent with a model in which ZDHHC9 action at OL Golgi outposts is critical for normal myelination. However, it is yet to be determined whether the key substrates of ZDHHC9 include MBP, other palmitoyl-proteins that are key constituents of CNS myelin, or proteins whose palmitoylation is important for myelin protein trafficking and targeting. Another non-mutually exclusive possibility is that ZDHHC9 acts at Golgi outposts but indirectly, for example to drive the expression of myelin protein genes. Future experiments, including but not limited to palmitoyl-proteomics in ZDHHC9 (OL-specific) KO mice, will be needed to provide more definitive insights into this issue. We have expanded our Discussion of links between ZDHHC9 mutation and impaired myelination to summarize the above points.

      (6) The function of ZDHHC9 in OL may be related to the Golgi apparatus, but its exact role in these structures is still unclear. It is suggested to discuss in more detail the role of ZDHHC9 in the Golgi apparatus in the discussion section.

      We appreciate this point, which we considered as related to point (5) above. In our revised Discussion we highlight how ZDHHC9 action at Golgi outposts may involve direct palmitoylation of myelin proteins, palmitoylation of proteins that direct myelin proteins to the myelin membrane and/or activation of gene expression programs that serve to drive myelination. We further note that these possibilities are not mutually exclusive.

      (7) More experimental support and in-depth research are needed on the detailed mechanism of how ZDHHC9 and Golga7 cooperatively regulate MBP palmitoylation, and how this decrease in palmitoylation level leads to myelination defects.

      This is another important point – our new experiments suggest that, although some XLID mutations markedly affect ZDHHC9’s ability to palmitoylate MBP, others do not, yet all of the mutant forms fail to localize to Golgi outposts. These findings are consistent with a model in which the subcellular location at which ZDHHC9 palmitoylates MBP, and potentially other substrates, is critical for normal myelination. Interestingly, despite their marked differences in basal catalytic activity (as assessed by autopalmitoylation), wt and all XLID forms of ZDHHC9 appear to show enhanced activity (measured by both auto- and MBP palmitoylation) in the presence of Golga7, suggesting that the association with Golga7 (which also localizes to Golgi outposts) is central to ZDHHC9 activity. This model is also highly consistent with the biased expression of Golga7 in OLs, compared to other CNS cell types (Fig 1E, 1F). Moreover, XLID-associated mutant forms of ZDHHC9 also show reduced protein stability and are impaired in their ability to form complexes with Golga7 (also known as Golgi Complex Protein 16kDa; GCP16; PMID: 37035671). Failure of ZDHHC9 XLID mutants to localize to Golgi outposts may thus be due to aberrant trafficking of mutant ZDHHC9 per se, but may also involve impaired association/stabilization of ZDHHC9/Golga7 complexes at these locations. Again, it is possible that either or both of these mechanisms, which are not mutually exclusive, contribute to impaired MBP palmitoylation and/or myelination deficits. We summarize these points in our revised Discussion.

      In summary, it is recommended that the authors address the above issues through additional experiments and improved discussions to further strengthen the credibility and clinical relevance of the article.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors):

      No gross changes were detected in OL development in Zdhhc9 KO mice and analyses from P28 Zdhhc9 KO mice crossed with Mobp-EGFP reporter mice did not show changes in EGFP+ OL differentiation (Figure 3). However, and given the observed subcellular localization of ZDHHC9 in OL processes (Figure 2) and the observation that the percentage of unmyelinated axons is increased in Zdhhc9 KO (Figure 6), ***early time points to examine the differentiated pools of OLs and their capacity to extend processes/contact axons need to be considered***.

      We appreciate this point, but due to the order in which experiments were performed, the ZDHHC9 KO mouse colony that we maintained after initial submission of this work contains homozygous MOBP-EGFP, but not the mT/mG transgene that would be most optimal for the proposed experiment. We hope the reviewer appreciates that it would take considerable time and effort regarding mouse breeding to cross out the MOBP and add back the mT/mG. We nonetheless appreciate the importance of the point raised and therefore examined an earlier developmental time point (P21, 3 weeks) to quantify OLs and NG2+ OPCs. In our updated Fig 3C1-C3, we use Mobp-EGFP mice to show that Zdhhc9 KO does not significantly affect the number of EGFP+ OLs at this time point in the cortex, corpus callosum and spinal cord. We also show that in corpus callosum, Zdhhc9 KO does not significantly affect the number of NG2+ OPCs at this earlier time point (Fig 3D, E). Furthermore, immunostaining to detect BCAS1, a marker of pre-mature OLs, also revealed no qualitative difference with ZDHHC9 loss at P21. We show representative images from these BCAS1 experiments in an updated Fig S3. While these new experiments do not address the morphology of OLs in Zdhhc9 KO, they do provide further evidence that deficits in myelination in young Zdhhc9 KO mice (Figure 6) are not likely due to gross differences in OPC or OL numbers during development.

      The authors observed defects in Zdhhc9 KO OL protrusions that they attributed to abnormal OL membrane expansion (Fig 4 and 5). Can they show evidence for this?

      This is an important point, and we appreciate the opportunity to explain the reasoning behind our initial statement more fully, while noting that other explanations are possible. Fig 5B (an Imaris-assisted reconstruction using the EGFP cell fill/morphology marker) highlights large spheroid-like distensions along OL processes. We reason that these spheroids are enclosed by the OL lipid membrane because if the membrane were ruptured, the EGFP signal would likely diffuse. This in turn suggests that the caliber of the OL process at the position of the spheroid is grossly abnormal i.e. the membrane has hyper-expanded. Given that OL membrane growth during myelination extends in two directions, i.e., spiral growth to the axonal surface and longitudinal growth along the axon, it is possible that spheroid-like structures are formed by uneven myelin growth. We recognize that we cannot yet conclude whether and how spheroid formation might be linked to the myelination deficit that we observe in Zdhhc9 KO mice. However, defining the subcellular mechanism for spheroid formation may provide further insights into this issue. We have therefore largely retained the original statement but have added the reasoning above to our revised Discussion.

      The authors report that Zdhhc9 KO primary and secondary branches in OL were longer, some contained spheroid-like swellings and the OL protrusion complexity was higher. However, these data are partially contradictory to what they show in OL differentiation experiments in vitro (Fig 7). There is also no evidence for increased membrane expansion in Zdhhc9 knockdown myelin-forming cells in culture. How do they reconcile these different findings?

      We appreciate the reviewer’s interest in this issue. Several non-mutually exclusive factors could account for the differences in OL morphology in vitro versus in vivo caused by Zdhhc9 loss. First, morphology in vivo may well be influenced by the axons and/or other extrinsic components around each OL that are not present in our primary cultures. Second, OL growth in vivo is highly 3-dimensional, whereas growth in culture is largely 2-dimensional – it may be difficult to support formation of spheroids (by definition, a 3-dimensional structure) in the latter situation. Finally, Zdhhc9 is absent in vivo from the beginning of development until the time points examined, whereas in our cultured OL experiments, Zdhhc9 shRNA is virally delivered to OPC cultures at DIV2 and likely acutely affects Zdhhc9 expression predominantly in committed OLs (following the switch to differentiation medium at DIV3). These differences may also affect the ability of other PATs or, potentially, palmitoylation-independent subcellular processes, to compensate for Zdhhc9 loss. We have more fully explained these points in our revised Discussion. 

      Page 7: "The OL processes in this culture condition correspond to large lipid-rich membranous sheets that form spiral membrane expansion on axons in vivo (49)." Which stage are the authors referring to? OL processes are extended in culture before membrane formation and this is not clear here. In a 3-day differentiation culture, most OLs have not yet formed a myelin sheath (e.g., Figure 2 in Zuchero et al., 2015, Dev Cell).

      We appreciate the reviewer highlighting this point. We first note that our oligodendrocyte (OL) culture conditions differ from the immunopanning method used by Zuchero et al., 2015 (original reference (Emery and Dugas, 2013)), which may affect the time course and progression of OL process elaboration and/or myelin sheath formation. We further note that in our cultures most EGFP+ processes are also MBP+ at the time point examined (strictly 3 days plus 9 hours post-differentiation). It thus seems likely that these MBP+ structures largely correspond to the MBP+ wrapping sheaths that occur in vivo, so we have therefore retained our original statement but have added this further explanation.

      Minor: Figure 6 (Legend): Time points should be indicated throughout the panels.

      We have added this information as requested.

      Reviewer #2 (Recommendations For The Authors):

      (1) Regarding the subcellular localization experiment of ZDHHC9 mutants in OL, it is currently limited to in vitro cultured OL, lacking validation in vivo OL or myelin sheath. Additionally, it is necessary to investigate whether the abnormal subcellular localization of ZDHHC9 mutants affects their enzyme activity and palmitoylation modification of substrate proteins.

      We thank the reviewer for raising this point. New data in our revised Figure 8 compares autopalmitoylation (sometimes used as a surrogate measure of PAT activity) of ZDHHC9wt and XLID mutants, and their ability to palmitoylate MBP in transfected cells. Intriguingly, we found that autopalmitoylation activity of the ZDHHC9-P150S mutant does not differ significantly from that of ZDHHC9wt, and that this mutant is still capable of palmitoylating MBP. Moreover, the R96W mutant, while impaired in autopalmitoylation, still palmitoylated MBP approximately 50% as effectively as ZDHHC9wt in our cell-based assay. These findings suggest that ZDHHC9-P150S and, probably, ZDHHC9-R96W mutants might still be able to palmitoylate substrates in OLs if they were properly localized. This possibility in turn suggests that impaired subcellular targeting in addition to, or instead of, impaired catalytic activity, may be a key factor in certain cases of ZDHHC9-associated XLID. We have expanded our Figure 8 to show these new experiments and have summarized the conclusions above in our revised Discussion. We thank the reviewer for suggesting that we further investigate this issue.

      (2) The experimental period (P21+21 days) using genetic labeling to track the development of myelinating cells may not be long enough. It is recommended to extend the observation time and analyze at more time points to more comprehensively reflect the impact of Zdhhc9 KO.

      We appreciate this point from the reviewer but, regrettably, we did not maintain the Pdgfra-CreER; R26-EGFP; Zdhhc9 KO mouse line and hope the reviewer appreciates that it would take considerable time and effort to rederive this line and then perform the suggested extended time course experiments. However, we note for the reviewer that our preliminary studies did not reveal any effect of Zdhhc9 KO on the number of MOBP-EGFP+ OLs in 6-month-old mice (not shown), consistent with a model in which Zdhhc9 loss does not affect OPC-OL commitment per se.

      (3) The author speculates that Zdhhc9 may regulate myelination by affecting the membrane localization of specific myelin proteins, but lacks direct experimental evidence to support this. It is suggested to detect the expression and distribution of relevant proteins in the myelin of Zdhhc9 KO mice.

      We share the reviewer’s interest in this point but realized that it is more technically challenging to address than might be initially thought. The main protein we would implicate and seek to test is MBP, but we already found that there is no gross change in MBP distribution in vivo in Zdhhc9 KO mice (Fig 3A). However, an anti-MBP antibody recognizes all forms of MBP, not just the specific splice variants whose palmitoylation is affected by ZDHHC9 loss. Specifically assessing nanoscale distribution of these splice variants would require a way (e.g. an anti-MBP splice form-specific antibody that is compatible with immuno-EM) to distinguish these variants from other, non-palmitoylated forms of MBP. Although such an antibody could be an important tool, we hope the reviewers would agree that developing and characterizing such a reagent is beyond the scope of the current study.

      We do, however, note that the lack of gross change in MBP distribution and levels in Zdhhc9 KO mice is consistent with the relatively mild phenotype of these mice, compared with shiverer (shi/shi) mice, in which MBP is completely lost. In shiverer, CNS compact myelin is almost absent (PMID: 671037; PMID: 88695; PMID: 460693) and, as the name suggests, mice display a shivering gait, and exhibit seizures and early death. In contrast, Zdhhc9 KO mice show only subtle behavioral deficits (PMID: 29944857). These differences are all consistent with a model in which Zdhhc9 KO mice, despite their significantly reduced MBP palmitoylation (Fig 8), have grossly normal distribution and levels of MBP when all splice variants are assessed (Fig 3, Fig 8). It is not inconceivable that Zdhhc9 KO mice have a nanoscale change in the distribution of MBP, particularly of specific palmitoylated splice variants, within myelin that profoundly affects myelin ultrastructure, without grossly altering MBP distribution. However, an alternative and not mutually exclusive possibility is that aberrant palmitoylation of other Zdhhc9 substrates accounts for, or contributes to, the abnormalities in myelin at the ultrastructural level. Addressing this issue would require a multi-pronged approach, not just to assess palmitoylation and distribution of such proteins in Zdhhc9 KO, but also to test whether they are direct Zdhhc9 substrates, in order to rule out indirect effects. We hope reviewers would agree that this is best left to a separate study. However, in our revised Discussion we now summarize what can be inferred regarding Zdhhc9-dependent effects on total and splice variant-specific distribution and levels of MBP.

      (4) Although the article mentions the association of Zdhhc9 with intellectual disabilities, it does not involve behavioral analysis of Zdhhc9 KO mice. It is recommended to supplement some behavioral experimental data to support the important role of Zdhhc9 in maintaining normal cognitive function, enhancing the clinical relevance of the article.

      We appreciate this point from the reviewer. The behavior of the same ZDHHC9 KO mouse line that we used was reported in PMID: 31747610 and in PMID: 29944857. In the former study, Zdhhc9 KO mice were reported to display seizures reminiscent of phenotypes in human patients with ZDHHC9 mutation. The latter study assessed performance of Zdhhc9 KO mice in several tasks that test cognitive function. Specifically, the KO mice were reported to display “altered behaviour in the open-field test, elevated plus maze and acoustic startle test that is consistent with a reduced anxiety level; a reduced hang time in the hanging wire test that suggests underlying hypotonia but which may also be linked to reduced anxiety [and] deficits in the Morris water maze test of hippocampal-dependent spatial learning and memory.” We have incorporated these findings in our revised Discussion, where we summarize how these phenotypes are common, not just to human patients with ZDHHC9 mutation, but also to other human neurodevelopmental conditions and mouse models in which ID is a common feature.

      (5) For the abnormal myelination observed in Zdhhc9 KO mice, including unmyelinated large-diameter axons and excessively myelinated small-diameter axons, the article lacks in-depth research and explanation on the exact mechanism and mode of action of ZDHHC9 in regulating myelination.

      We share the reviewer’s interest in this point but again note that gaining definitive insights into this issue is far from trivial. Convincing evidence of a causative mechanism would require an exhaustive identification of ZDHHC9 in vivo substrates, followed by point mutation of substrate palmitoylation site(s) to determine the extent to which loss of palmitoylation of such protein(s) phenocopies ZDHHC9 loss. Nonetheless, it is possible to break this question down and to summarize what we do and do not know. For example, our experiments in cultured OLs show that ZDHHC9 loss causes cell-autonomous deficits in morphological maturation of these cells. We also know that ZDHHC9 loss results in impaired palmitoylation of MBP, a direct substrate for ZDHHC9. Moreover, loss of ZDHHC9 at Golgi outposts in OLs (a phenotype observed with several XLID-associated mutant forms of ZDHHC9, even those with no significant loss of catalytic activity) correlates with intellectual disability. Together, these findings are consistent with a model in which ZDHHC9 action at OL Golgi outposts is critical for normal myelination. However, it is yet to be determined whether the key substrates of ZDHHC9 include MBP, other palmitoyl-proteins that are key constituents of CNS myelin, or proteins whose palmitoylation is important for myelin protein trafficking and targeting. Another non-mutually exclusive possibility is that ZDHHC9 acts at Golgi outposts but indirectly, for example to drive the expression of myelin protein genes. Future experiments, including but not limited to palmitoyl-proteomics in ZDHHC9 (OL-specific) KO mice, will be needed to provide more definitive insights into this issue. We have expanded our Discussion of links between ZDHHC9 mutation and impaired myelination to summarize the above points.

      (6) The function of ZDHHC9 in OL may be related to the Golgi apparatus, but its exact role in these structures is still unclear. It is suggested to discuss in more detail the role of ZDHHC9 in the Golgi apparatus in the discussion section.

      We appreciate this point, which we considered as related to point (5) above. In our revised Discussion we highlight how ZDHHC9 action at Golgi outposts may involve direct palmitoylation of myelin proteins, palmitoylation of proteins that direct myelin proteins to the myelin membrane and/or activation of gene expression programs that serve to drive myelination. We further note that these possibilities are not mutually exclusive.

      (7) More experimental support and in-depth research are needed on the detailed mechanism of how ZDHHC9 and Golga7 cooperatively regulate MBP palmitoylation, and how this decrease in palmitoylation level leads to myelination defects.

      This is another important point – our new experiments suggest that, although some XLID mutations markedly affect ZDHHC9’s ability to palmitoylate MBP, others do not, yet all of the mutant forms fail to localize to Golgi outposts. These findings are consistent with a model in which the subcellular location at which ZDHHC9 palmitoylates MBP, and potentially other substrates, is critical for normal myelination. Interestingly, despite their marked differences in basal catalytic activity (as assessed by autopalmitoylation), wt and all XLID forms of ZDHHC9 appear to show enhanced activity (measured by both auto- and MBP palmitoylation) in the presence of Golga7, suggesting that the association with Golga7 (which also localizes to Golgi outposts) is central to ZDHHC9 activity. This model is also highly consistent with the biased expression of Golga7 in OLs, compared to other CNS cell types (Fig 1E, 1F). Moreover, XLID-associated mutant forms of ZDHHC9 also show reduced protein stability and are impaired in their ability to form complexes with Golga7 (also known as Golgi Complex Protein 16kDa; GCP16; PMID: 37035671). Failure of ZDHHC9 XLID mutants to localize to Golgi outposts may thus be due to aberrant trafficking of mutant ZDHHC9 per se, but may also involve impaired association/stabilization of ZDHHC9/Golga7 complexes at these locations. Again, it is possible that either or both of these mechanisms, which are not mutually exclusive, contribute to impaired MBP palmitoylation and/or myelination deficits. We summarize these points in our revised Discussion.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      This manuscript determines how PA28g, a proteasome regulator that is overexpressed in tumors, and C1QBP, a mitochondrial protein for maintaining oxidative phosphorylation that plays a role in tumor progression, interact in tumor cells to promote their growth, migration and invasion. Evidence for the interaction and its impact on mitochondrial form and function was provided although it is not particularly strong.

      The revised manuscript corrected mislabeled data in figures and provides more details in figure legends. Misleading sentences and typos were corrected. However, key experiments that were suggested in previous reviews were not done, such as making point mutations to disrupt the protein interactions and assess the consequence on protein stability and function. Results from these experiments are critical to determine whether the major conclusions are fully supported by the data.

      The second revision of the manuscript included the proximity ligation data to support the PA28g-C1QBP interaction in cells. However, the method and data were not described in sufficient detail for readers to understand. The revision also includes the structural models of the PA28g-C1QBP complex predicted by AlphaFold. However, the method and data were not described with details for readers to understand how this structural modeling was done, what is the quality of the resulting models, and the physical nature of the protein-protein interaction such as what kind of the non-covalent interactions exist in the interface of the protein complexes. Furthermore, while the interactions mediated by the protein fragments were tested by pull-down experiments, the interactions mediated by the three residues were not tested by mutagenesis and pull-down experiments. In summary, the revision was improved, but further improvement is needed.

      Thank you very much for your comments.

      (1) Based on your suggestion, we predicted the possible interaction sites using AlphaFold 3 and found that mutations in amino acids 76 and 78 of C1QBP affect the interaction with PA28γ (Revised Appendix Figure 1J). A subsequent pulldown experiment also showed that, after mutating the amino acids at these two sites (T76A, G78N), the amount of C1QBP bound to PA28γ decreased (Revised Figure 1J). These results confirm that PA28γ interacts with C1QBP in a manner dependent on the N-terminus of C1QBP. These findings are now included in the Results section of the revised manuscript: “In addition, we employed AlphaFold 3 to perform energy minimization and predict hydrogen bonds between the C1QBP N-terminus (amino acids 1-167) and the PA28γ protein interaction region. The results suggest that the T76 and G78 residues of C1QBP may be key contributors to the interaction. Consistently, coimmunoprecipitation analysis demonstrated that mutations at these sites (C1QBPT76A and C1QBPG78N) significantly reduced the binding ability to PA28γ (Fig. 1J and Appendix Fig. 1J)”. We believe this additional validation strengthens the robustness of our findings.

      (2) According to your suggestion, we have added a description of the results of PLA in the figure legend (Revised Figure 1C) and the method of PLA in the appendix file (Revised Appendix file, Part “Proximity Ligation Assay”). The revised text reads as follows: (C) PLA image of UM1 cells shows the interaction between C1QBP and PA28γ in both cytoplasm and nucleus (red fluorescence).

      (3) In the light of your suggestion, we have enriched the description of AlphaFold 3 analysis in the appendix file (Revised Appendix file, Page 10-11). The revised text reads as follows:

      “Prediction and Analysis of Protein Interactions

      Protein Sequence Retrieval and Structure Prediction

      The protein sequences of C1QBP and PA28γ were obtained from the AlphaFold Protein Structure Database. Structural predictions of the protein-protein interaction between C1QBP and PA28γ were conducted using AlphaFold 3. The pLDDT (predicted local distance difference test) values were utilized to assess the confidence of the predicted models. Models with a pLDDT score above 70 were considered confident, while those with a score above 90 were categorized as very high confidence. These values were annotated in the figures to indicate the reliability of the structural predictions.”

      “Protein Preparation and Structure Optimization

      The best-scored model for the C1QBP-PA28γ interaction predicted by AlphaFold 3 was selected for further analysis. The model was imported into MOE 2022 (Molecular Operating Environment) software for protein preparation. This process included the removal of water molecules and other heteroatoms, followed by the addition of hydrogen atoms to the structure. This step was essential for optimizing the protein’s 3D conformation and ensuring the correctness of the protonation states at physiological pH.”

      “Energy Minimization and Hydrogen Bond Prediction

      The protein structure was subjected to energy minimization using the Amber10:EHT (Extended Hückel Theory) force field, with R-field 1:80 settings to refine the model’s geometry. The minimization process was performed to optimize the protein’s internal energy and ensure stable conformation, followed by calculation of hydrogen bond interactions. The interaction energies and hydrogen bonds were analyzed to identify potential binding sites and stabilize the predicted protein-protein complex.”
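      The confidence bands quoted above (a score above 70 treated as confident, above 90 as very high confidence) can be sketched as a small helper. This is only an illustration of the stated thresholds: the function name `plddt_confidence` and the label used for scores at or below 70 are assumptions, not part of the authors' pipeline.

```python
def plddt_confidence(score: float) -> str:
    """Map a pLDDT score (0-100) to the confidence bands quoted in the text.

    Hypothetical helper: only the 70 and 90 cut-offs come from the quoted
    methods; the label for lower scores is an assumption for illustration.
    """
    if not 0.0 <= score <= 100.0:
        raise ValueError("pLDDT scores are expected to lie between 0 and 100")
    if score > 90:
        return "very high confidence"
    if score > 70:
        return "confident"
    return "below the confident threshold"


# Made-up example scores, not values from the manuscript.
for example_score in (95.2, 81.0, 55.5):
    print(example_score, plddt_confidence(example_score))
```

      Because the quoted text says "above", the sketch uses strict inequalities, so a score of exactly 70 or 90 falls into the lower band.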

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Astrocytes are known to express neuroligins 1-3. Within neurons, these cell adhesion molecules perform important roles in synapse formation and function. Within astrocytes, a significant role for neuroligin 2 in determining excitatory synapse formation and astrocyte morphology was shown in 2017. However, there has been no assessment of what happens to synapses or astrocyte morphology when all three major forms of neuroligins within astrocytes (isoforms 1-3) are deleted using a well characterized, astrocyte specific, and inducible cre line. By using such selective mouse genetic methods, the authors here show that astrocytic neuroligin 1-3 expression in astrocytes is not consequential for synapse function or for astrocyte morphology. They reach these conclusions with careful experiments employing quantitative western blot analyses, imaging and electrophysiology. They also characterize the specificity of the cre line they used. Overall, this is a very clear and strong paper that is supported by rigorous experiments. The discussion considers the findings carefully in relation to past work. This paper is of high importance, because it now raises the fundamental question of exactly what neuroligins 1-3 are actually doing in astrocytes. In addition, it enriches our understanding of the mechanisms by which astrocytes participate in synapse formation and function. The paper is very clear, well written and well illustrated with raw and average data.

      Comments on revisions:

      My previous comments have been addressed. I have no additional points to make and congratulate the authors.

      Thank you for your acceptance.

      Reviewer #2 (Public Review):

      In the present manuscript, Golf et al. investigate the consequences of astrocyte-specific deletion of Neuroligin (Nlgn) family cell adhesion proteins on synapse structure and function in the brain. Decades of prior research had shown that Neuroligins mediate their effects at synapses through their role in the postsynaptic compartment of neurons and their transsynaptic interaction with presynaptic Neurexins. More recently, it was proposed for the first time that Neuroligins expressed by astrocytes can also bind to presynaptic Neurexins to regulate synaptogenesis (Stogsdill et al. 2017, Nature). However, several aspects of the model proposed by Stogsdill et al. on astrocytic Neuroligin function conflict with prior evidence on the role of Neuroligins at synapses, prompting Golf et al. to further investigate astrocytic Neuroligin function in the current study. Using postnatal conditional deletion of Nlgn1-3 specifically from astrocytes in mice, Golf et al. show that virtually no changes in the expression of synaptic proteins or in the properties of synaptic transmission at either excitatory or inhibitory synapses are observed. Moreover, no alterations in the morphology of astrocytes themselves were found. To further extend this finding, the authors additionally analyzed human neurons co-cultured with mouse glia lacking expression of Nlgn1-4. No difference in excitatory synaptic transmission was observed between neurons cultured in the presence of wildtype vs. Nlgn1-4 conditional knockout glia. The authors conclude that while Neuroligins are indeed expressed in astrocytes and are hence likely to play some role there, this role does not include any direct consequences on synaptic structure and function, in direct contrast to the model proposed by Stogsdill et al.

      Overall, this is a strong study that addresses a fundamental and highly relevant question in the field of synaptic neuroscience. Neuroligins are not only key regulators of synaptic function, they have also been linked to numerous psychiatric and neurodevelopmental disorders, highlighting the need to precisely define their mechanisms of action. The authors take a wide range of approaches to convincingly demonstrate that under their experimental conditions, Nlgn1-3 are efficiently deleted from astrocytes in vivo, and that this deletion does not lead to major alterations in the levels of synaptic proteins or in synaptic transmission at excitatory or inhibitory synapses, or in the morphology of astrocytes. While the co-culture experiments are somewhat more difficult to interpret due to lack of a control for the effect of wildtype mouse astrocytes on human neurons, they are also consistent with the notion that deletion of Nlgn1-4 from astrocytes has no consequences for the function of excitatory synapses. Together, the data from this study provide compelling and important evidence that, whatever the role of astrocytic Neuroligins may be, they do not contribute substantially to synapse formation or function under the conditions investigated.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors have fully addressed my concerns, and have in particular conducted a very elegant and compelling analysis of the degree of deletion of astrocytic Nlgn1-3/4 in their models. This greatly strengthens the main claims of their study and the fundamental nature of their conclusions for the field of synapse biology.

      I am somewhat less convinced by the newly added experiment to investigate deletion of Nlgns1-4 from glia in glia-neuron co-cultures. The authors provide no evidence to show that either WT or cKO glia have any effect on synapse formation or function in human neurons, and therefore, the current lack of a difference could equally result from the fact that both WT and cKO glia were non-functional altogether. The authors cite two studies to state that human neurons do not form synapses in the absence of astrocytes, Zhang et al. 2013 and Huang et al. 2017, but neither seem to be listed in the references (unless Zhang et al. 2014 was meant), making it difficult to assess the relevance of these data. However, since the data on astrocytic Nlgn1-3 deletion in vivo are compelling on their own, I do not see the co-culture experiment as essential for the main conclusions of the study.

      Minor comment:

      Please add the information on the strain background of the mice to the methods section of the manuscript. Strain background can have a significant impact on many aspects of neuronal function, and this information is therefore essential for the interpretation of potential differences to other studies.

      We deeply apologize for forgetting to include the two important references mentioned by the reviewer in the reference list. We understand that the reviewer as a result could not assess the validity of our statement that co-culture of glia is required for efficient synapse formation by human neurons that are induced from ES or iPS cells. Note that this conclusion does not postulate that all synapse formation requires glia, since the cited papers demonstrate that human neurons induced by our protocol still form scarce synapses without glia. This observation has been confirmed in many different experiments that were performed after the data presented in the cited papers. As a result of this extensive prior documentation that human neurons produced by forced expression of Ngn2 require coculture of glia for efficient synapse formation, we do not feel that we need to repeat this basic characterization of our culture system again to validate multiple previous papers and hope the reviewer will concur. We have additionally added the relevant mouse strain information to the methods section.

    1. Author response:

      Reviewer #1:

      Point 1

      Not many weaknesses, but probably validation at more enhancers could have made the paper stronger.

      We experimentally validated two sets of enhancers from two distinct tissues and observed similar effects. While this supports the idea that the TEAD-tissue-specific TF interaction we observe is not restricted to a single tissue, we agree that testing additional enhancers from a third tissue would strengthen our conclusions. We will acknowledge in the discussion that including a third tissue could provide additional support for the generality of our findings.

      Reviewer #2:

      Point 1

      The authors propose a mechanism of a TF trio (TEAD - CHD4 - tissue-specific TFs). However, only one validation experiment checked CHD4. CHD4 binding was not mentioned at all in the other cases.

      Indeed, CHD4 binding was experimentally validated at only one enhancer. This was a deliberate decision based on two key considerations:

      (1) Consistent functional response across enhancers: We tested multiple enhancers (n = 8) for functional response to the TEAD+YAP and GATA4/6 combination. All enhancers tested exhibited the same trend—attenuation of GATA-mediated activation upon co-expression of TEAD or TEAD/YAP. This consistent pattern supports a shared mechanism across these elements.

      (2) Substantial prior evidence supporting CHD4 recruitment by both GATA4 and YAP: Specifically, CHD4 recruitment by GATA4 has been described in the context of cardiovascular development[1], and CHD4 can also be recruited by TEAD coactivator YAP[2]. Furthermore, published genomic occupancy data from embryonic heart tissue show widespread co-binding of GATA4, TEAD, and CHD4[1,3], including at most of the cardiac enhancers we functionally tested (4 out of 5).

      Given the consistent enhancer responses and the supporting literature and genomic data indicating TEAD-CHD4 co-occupancy, we chose to validate CHD4 binding at a representative enhancer as a proof of concept.

      We will clarify this rationale in the revised manuscript to better address this concern.

      Reviewer #2:

      Point 2

      The authors integrated E12.5 TEAD binding with E11.5 acetylation data, and it would be important to show that this experimental approach is valid or otherwise qualify its limitations.

      We will provide additional evidence in support of this approach in the revised manuscript or alternatively acknowledge its limitations.

      Reviewer #2:

      Point 3

      Motif co-occurrence analysis was extended to claiming TF interactions without further validation.

      We thank the reviewer for pointing out this important distinction. We reviewed the manuscript and identified seven instances where TF interactions were mentioned. Four of these correctly refer to previously established protein-protein interactions. For the remaining instances, we will adjust the wording to reflect the level of evidence, e.g., by describing combinatorial binding based on motif co-occurrence rather than implying direct interaction.

      Reviewer #3:

      Point 1

      Much of this manuscript focuses on confirming transcription factor relationships that have been reported previously. For example, it is well known that GATA4 interacts with MEF2 in the ventricle. There are limited new or unexpected associations discussed and tested.

      We thank the reviewer for this important observation and see the recurrence of known interactions, such as GATA4-MEF2, not as a drawback, but as an important validation of our methodology.

      The identification of novel TF-TF combinations was geared toward uncovering shared regulatory principles across diverse human developmental tissues. While analysing 13 heterogeneous embryonic tissues introduced limitations, such as cellular complexity that may obscure rare interactions, it also allowed the identification of robust, recurrent patterns across tissues. Indeed, using this approach, we identified the widespread combinatorial effect of TEAD in partnership with lineage-specific TFs, which is explored in more depth in the manuscript.

      Another main goal of the study was to develop and demonstrate a generalizable strategy for identifying combinatorial TF binding patterns that underlie tissue-specific gene regulation. Given the inherent heterogeneity of the embryonic organs analysed, the approach is naturally biased toward recovering the most prevalent, and often well-characterized, TF combinations. While we fully acknowledge this limitation, we believe that the ability to robustly recover well-established TF partnerships across multiple organs provides a valuable proof of concept. The next step will be to apply this strategy to single-cell RNA datasets, in order to define TF relationships at higher resolution, for example, resolving associations down to specific family members that cooperate within distinct lineages or cell types, and identifying less frequent or underrepresented TF-TF relationships.

      In this context, we believe that our strategy has successfully highlighted shared enhancer logic and offers a framework for future high-resolution dissection of TF cooperativity at the single-cell level. The rationale for analysing heterogeneous tissues, along with its limitations, will be addressed in the revised version.

      Reviewer #3:

      Point 2

      Embryonic tissues are highly heterogeneous, limiting the utility of the bulk ChIP-seq employed in these analyses. Does the cellular heterogeneity explain the discrepancy between TEAD binding and histone acetylation? Similarly, how does conservation between species affect the TF predictions?

      We thank the reviewer for raising these important points. We acknowledge the limitations of using bulk ChIP-seq data in the context of complex embryonic tissues (see also previous point). We cannot exclude that the discrepancy between TEAD binding and histone acetylation is an effect of cellular heterogeneity. Indeed, we mention in the results “Our ventricle-specific enhancers were sampled at a single time point and likely represent enhancers that are selectively active in different cell types and developmental stages, given the heterogeneity of cell types in the ventricle”. The limitation of bulk ChIP-seq will be addressed in the discussion. In the specific case of the enhancers selected for validation, the binding site sequences are conserved between species, suggesting that the cis-regulatory activity is likely to be similar in both.

      Reviewer #3:

      Point 3

      Some of the interpretations should also be fleshed out a bit more to clarify the advantage of the analyses presented here. For example, if Gata4 and Foxa2 transcripts are expressed during different stages of development, then it's likely that (as stated by the authors) these motifs are not used during the same stage of development. But examining the flanking regions wasn't necessary to make that statement. This type of conclusion seems tangential to the benefit of this analysis, which is to understand which TFs work together in a single organ at a single time point.

      We appreciate the reviewer’s comment and the opportunity to clarify our interpretation. The reviewer refers to the finding that GATA4 and FOXA2 motifs are flanked by different sets of motifs in liver enhancers, suggesting that these TFs operate within distinct regulatory contexts.

      Our aim was not to state that GATA4 and FOXA2 do not function simultaneously—this can indeed be inferred from their non-overlapping expression patterns. Rather, we intended to highlight the potential of our approach, even when applied to bulk data, to resolve distinct regulatory modules that may act in different subpopulations of cells or developmental windows within the same tissue.

      We will revise the relevant section of the manuscript to make this interpretative point clearer.

      Reviewer #3:

      Point 4

      This manuscript hinges on luciferase assays whose results can be difficult to translate to complex gene regulation networks. Many motifs are often clustered together, which makes designing experiments at endogenous loci important in studies such as this one.

      We agree with the Reviewer that luciferase assays represent an oversimplified model of gene regulation and do not fully capture the complexity of endogenous regulatory networks. We will explicitly acknowledge this limitation in the discussion.

      Mutagenesis of TEAD and tissue-specific TF motifs at endogenous loci would provide more conclusive evidence. However, our goal was to test the generality of the TEAD effect across multiple enhancers and tissues. Despite its limitations, a luciferase-based assay was the most feasible approach, as an endogenous strategy would not have allowed us to assess a broader set of enhancers efficiently. Additionally, the presence of recurrent motifs and the potential functional redundancy among enhancers targeting the same gene can complicate the interpretation of single-locus perturbations.

      References

      (1) Robbe ZL, Shi W, Wasson LK, Scialdone AP, Wilczewski CM, Sheng X, et al. CHD4 is recruited by GATA4 and NKX2-5 to repress noncardiac gene programs in the developing heart. Genes Dev. 2022 Apr 1;36(7–8):468–82.

      (2) Kim M, Kim T, Johnson RL, Lim DS. Transcriptional Co-repressor Function of the Hippo Pathway Transducers YAP and TAZ. Cell Rep. 2015 Apr;11(2):270–82.

      (3) Akerberg BN, Gu F, VanDusen NJ, Zhang X, Dong R, Li K, et al. A reference map of murine cardiac transcription factor chromatin occupancy identifies dynamic and conserved enhancers. Nat Commun. 2019 Oct 28;10(1):4907.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.

      Strengths: 

      The OED defines individuality as "the sum of the attributes which distinguish a person or thing from others of the same kind," a definition mirrored by other dictionaries and the scientific literature on the topic. The concept of behavioral individuality can be characterized as: 

      (1) a large set of behavioral attributes, 

      (2) with inter-individual variability, that are 

      (3) stable over time. 

      A previous study examined walking parameters in Buridan's paradigm, finding that several parameters were variable between individuals, and that these showed stability over separate days and up to 4 weeks (DOI: 10.1126/science.aaw718). The present study replicates some of those findings and extends the experiments from temporal stability to examining the correlation of locomotion features between different contexts.

      The major strength of the study is using a range of different behavioral assays to examine the correlations of several different behavior parameters. It shows clearly that the inter-individual variability of some parameters is at least partially preserved between some contexts, and not preserved between others. The development of high-throughput behavior assays and sharing the information on how to make the assays is a commendable contribution.

      We thank the reviewer for his exceptionally kind assessment of our work!

      Weaknesses: 

      The definition of individuality considers a comprehensive or large set of attributes, but the authors consider only a handful. In Supplemental Fig. S8, the authors show a large correlation matrix of many behavioral parameters, but these are illegible and are only mentioned briefly in Results. 

      We have now uploaded a high-resolution PDF to GitHub (https://github.com/LinneweberLab/Mathejczyk_2024_eLife_Individuality/blob/main/S8.pdf), and this is also mentioned in the figure legend for Fig. S8.

      Why were five or so parameters selected from the full set? How were these selected? 

      The five parameters (% of time walked, walking speed, vector strength, angular velocity, and centrophobicity) were selected because they describe key aspects of the investigated behaviors that can be compared directly across assays. Importantly, several parameters we typically use (e.g., Linneweber et al., 2020) cannot be applied under certain conditions, such as darkness or the absence of visual cues. Furthermore, these five parameters encompass three critical aspects of navigation across standard visual behavioral arenas: (1) The “exploration” category is characterized by parameters describing the fly’s activity. (2) Parameters related to “attention” reflect heightened responses to visual cues, but unlike commonly used metrics such as angle or stripe deviations (e.g., Coulomb, 2012; Linneweber et al., 2020), they can also be measured in the absence of visual cues and are therefore suitable for cross-assay comparisons. (3) The parameter “centrophobicity,” used as a potential indicator of anxiety, is conceptually linked to the open-field test in mice, where the ratio of wall-to-open-field activity is frequently calculated as a measurement of anxiety (see, for example, Carter, Sheh, 2015, chapter 2; https://www.sciencedirect.com/book/9780128005118/guide-to-research-techniques-in-neuroscience). Admittedly, this view is frequently challenged in mice, but it has a long history, which is why we use it.

      Do the correlation trends hold true across all parameters? For assays in which only a subset of parameters can be directly compared, were all of these included in the analysis, or only a subset? 

      As noted above, we only included a subset of parameters in our final analysis, as many were unsuitable for comparison across assays, even though they provide valuable assay-specific information that is important for relating these results to previous publications.

      The correlation analysis is used to establish stability between assays. For temporal retesting, "stability" is certainly the appropriate word, but between contexts, it implies that there could be 'instability'. Rather, instead of the 'instability' of a single brain process, a different behavior in a different context could arise from engaging largely (or entirely?) distinct context-dependent internal processes, and have nothing to do with process stability per se. For inter-context similarities, perhaps a better word would be "consistency". 

      Thank you for this suggestion. During the preparation of the manuscript, we indeed frequently alternated between the terms “stability” and “consistency” and decided to go with “stability” as the only descriptor, to keep it simple. We now fully agree with the reviewer’s argument and have replaced “stability” with “consistency” throughout the current version of the manuscript in order to increase clarity and coherence.

      The parameters are considered one by one, not in aggregate. This focuses on the stability/consistency of the variability of a single parameter at a time, rather than holistic individuality. It would appear that an appropriate measure of individuality stability (or individuality consistency) that accounts for the high-dimensional nature of individuality would somehow summarize correlations across all parameters. Why was a multivariate approach (e.g. multiple regression/correlation) not used? Treating the data with a multivariate or averaged approach would allow the authors to directly address 'individuality stability' and analyses of single-parameter variability stability.

      We agree with the reviewer that a multivariate analysis offers clear advantages in terms of statistical power and complements our chosen approach. We believe that the simplicity of our initial analysis, both for correlational and mean data, makes it easy for readers to understand and reproduce our results. While preparing the previous version of the manuscript, we were skeptical of more complex analyses because they often involve numerous choices, which can complicate reproducibility. For instance, a recent study in personality psychology (Paul et al., 2024) highlighted the risks of “forking paths” in statistical analysis, showing that certain choices of statistical methods could even reverse findings, a concern mitigated by our simple, straightforward approach. Still, in preparing this revised version of the manuscript, we accepted the reviewer’s advice and reanalyzed the data using a generalized linear model. This analysis nicely recapitulates our initial findings and is now summarized in a single figure (Fig. 9).

      The correlation coefficients are sometimes quite low, though highly significant, and are deemed to indicate stability. For example, in Figure 4C top left, the % of time walked at 23°C and 32°C are correlated by 0.263, which corresponds to an R² of 0.069, i.e. just 7% of the 32°C variance is predictable by the 23°C variance. Is it fair to say that a 7% determination indicates parameter stability? Another example: "Vector strength was the most correlated attention parameter... correlations ranged... to -0.197," which implies that 96% (1 - R²) of Y-maze variance is not predicted by Buridan variance. At what level does an r value not represent stability?
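      The arithmetic behind these percentages can be checked with a minimal sketch. It assumes a simple two-variable linear relationship, where the share of variance explained is the square of the correlation coefficient; the helper name `variance_explained` is hypothetical, and the two r values are the ones quoted in the comment above.

```python
def variance_explained(r: float) -> float:
    """Share of variance explained in a simple linear relationship (r squared)."""
    return r ** 2


# The two correlation coefficients quoted in the reviewer's comment.
quoted = [
    ("% time walked, 23°C vs 32°C", 0.263),
    ("vector strength, Buridan vs Y-maze", -0.197),
]

for label, r in quoted:
    r2 = variance_explained(r)
    print(f"{label}: r = {r:+.3f}, R^2 = {r2:.3f}, unexplained = {1 - r2:.3f}")
```

      The sign of r drops out when squaring, which is why a correlation of -0.197 and one of +0.197 leave the same share of variance unexplained.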

      We agree that this is an important question. Our paper clearly demonstrates that individuality always plays a role in decision-making (and, in this context, any behavioral output can be considered a decision). However, the non-linear relationship between certain situations and the individual’s behavior often reduces the predictive value (or correlation) across contexts, sometimes quite drastically.

      For instance, temperature has a relatively linear effect on certain behavioral parameters, leading to predictable changes across individuals. As a result, correlations across temperature conditions are often similar to those observed across time within the same situation. In contrast, this predictability diminishes when comparing conditions like the presence or absence of visual stimuli, the use of different arenas, or different modalities.

      For this reason, we believe that significance remains the best indicator for describing how measurable individuality persists, even across vastly different situations.
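
      The arithmetic behind the reviewer's question can be reproduced in a few lines. The sketch below only squares the two correlation coefficients quoted in the comment (0.263 and -0.197) to obtain the share of variance explained and left unexplained; the function name is our own, and nothing here re-analyses the underlying data.

```python
# Share of variance explained by a Pearson correlation coefficient r.
# The two r values are the ones quoted in the reviewer's comment.
def variance_explained(r: float) -> float:
    return r * r


for r in (0.263, -0.197):
    r2 = variance_explained(r)
    print(f"r = {r:+.3f} -> R^2 = {r2:.3f}, unexplained = {1 - r2:.3f}")
```

      A small R² and a highly significant r are compatible when the sample is large, which is why the two criteria can lead to different readings of "stability".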

      The authors describe a dissociation between inter-group differences and interindividual variation stability, i.e. sometimes large mean differences between contexts, but significant correlation between individual test and retest data. Given that correlation is sensitive to slope, this might be expected to underestimate the variability stability (or consistency). Is there a way to adjust for the group differences before examining the correlation? For example, would it be possible to transform the values to in-group ranks prior to correlation analysis?  

      We thank the reviewer for this suggestion, and we have now addressed this point. To account for slope effects, we have now introduced in-group ranks for our linear model computation (see Fig. 9). 
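
      As an illustration of the reviewer's suggestion, the sketch below converts each situation's values to in-group ranks and then correlates the ranks (which amounts to a Spearman correlation). It is a minimal example under stated assumptions: the walking-speed values are invented, the helper names are hypothetical, and it is not the linear-model pipeline behind Fig. 9.

```python
from statistics import mean


def ranks(values):
    """1-based ranks within one group; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Invented values for five flies measured in two situations:
# large mean shift and a non-linear change, but identical rank order.
cool = [1.0, 2.0, 3.0, 4.0, 5.0]
warm = [11.0, 12.5, 13.0, 18.0, 40.0]

print("raw values:", pearson(cool, warm))
print("in-group ranks:", pearson(ranks(cool), ranks(warm)))
```

      Because ranks are computed separately within each situation, differences in group mean or scale drop out and only the ordering of individuals contributes to the coefficient.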

      What is gained by classifying the five parameters into exploration, attention, and anxiety? To what extent have these classifications been validated, both in general and with regard to these specific parameters? Is the increased walking speed at higher temperatures necessarily due to an increased 'explorative' nature, or could it be attributed to increased metabolism, dehydration stress, or a heat-pain response? To what extent are these categories subjective?

      We agree that grouping our parameters into traits like exploration, attention, and anxiety always includes subjective decisions. The classification into these three categories is even considered partially controversial in the mouse-specific literature, which uses the term “anxiety” in similar experiments (see, for example, Carter, Sheh, 2015, chapter 2. https://www.sciencedirect.com/book/9780128005118/guide-to-research-techniquesin-neuroscience). Nevertheless, we believe that readers greatly benefit from these categories, since they make it easier to understand (beyond mathematical correlations) which aspects of the flies’ individuality can be considered consistent across situations. Furthermore, these categories serve as a bridge to compare insights from very distinct models.

      The legends are quite brief and do not link to descriptions of specific experiments. For example, Figure 4a depicts a graphical overview of the procedure, but I could not find a detailed description of this experiment's protocol.

      We assume the reviewer is referring to Figure 3a. The detailed experimental protocol can be found in the Materials and Methods section under Setup 2: IndyTrax Multi-Arena Platform. We have now clarified this in the mentioned figure legend.

      Using the current single-correlation analysis approach, the aims would benefit from rewording to appropriately address single-parameter variability stability/consistency (as distinct from holistic individuality). Alternatively, the analysis could be adjusted to address the multivariate nature of individuality, so that the claims and the analysis are in concordance with each other.

      The reviewer raises an important point about hierarchies within the concept of animal individuality or personality. We agree that this is best addressed by first focusing on single behavioral traits/parameters and then integrating several trait properties into a cohesive concept of animal personality (holistic individuality). To ensure consistency throughout the text, we have now thoroughly reviewed the entire manuscript to clearly distinguish between single-parameter variability stability/consistency and holistic individuality/personality.

      The study presents a bounty of new technology to study visually guided behaviors. The GitHub link to the software was not available. To verify the successful transfer of open hardware and open-software, a report would demonstrate transfer by collaboration with one or more other laboratories, which the present manuscript does not appear to do. Nevertheless, making the technology available to readers is commendable.

      We uploaded all code and materials to GitHub and made them available as soon as we received the reviewers’ comments. All files and materials can be accessed at https://github.com/LinneweberLab/Mathejczyk_2024_eLife_Individuality, which is now frequently mentioned throughout the revised manuscript.

      The study discusses a number of interesting, stimulating ideas about inter-individual variability, and presents intriguing data that speaks to those ideas, albeit with the issues outlined above.

      While the current work does not present any mechanistic analysis of inter-individual variability, the implementation of high-throughput assays sets up the field to more systematically investigate fly visual behaviors, their variability, and their underlying mechanisms. 

      We thank the reviewer again for the extensive and constructive feedback.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors repeatedly measured the behavior of individual flies across several environmental situations in custom-made behavioral phenotyping rigs.

      Strengths: 

      The study uses several different behavioral phenotyping devices to quantify individual behavior in a number of different situations and over time. It seems to be a very impressive amount of data. The authors also make all their behavioral phenotyping rig design and tracking software available, which I think is great and I'm sure other folks will be interested in using and adapting it to their own needs.

      We thank the reviewer for highlighting the strengths of our study.

      Weaknesses/Limitations: 

      I think an important limitation is that while the authors measured the flies under different environmental scenarios (i.e. with different lighting and temperature) they didn't really alter the "context" of the environment. At least within behavioral ecology, context would refer to the potential functionality of the expressed behaviors so for example, an anti-predator context, a mating context, or foraging. Here, the authors seem to really just be measuring aspects of locomotion under benign (relatively low-risk perception) contexts. This is not a flaw of the study, but rather a limitation to how strongly the authors can really say that this demonstrates that individuality is generalized across many different contexts. It's quite possible that rank order of locomotor (or other) behaviors may shift when the flies are in a mating or risky context. 

      We agree with the reviewer that the definition of environmental context can differ between fields and that behavioral context is defined differently, particularly in ecology. Nevertheless, we highlight that our alterations of environmental context are highly stereotypic, well-defined, and free of interpretation: we only modified what we stated in the experimental description, without designing a specific situation that might itself be perceived differently by different individuals. For example, comparing a context with a predator and one without might result in a binary response, because one fraction of the tested individuals might perceive the predator in the predator situation while the other fraction does not.

      The analytical framework in terms of statistical methods is lacking. It appears as though the authors used correlations across time/situations to estimate individual variation; however, far more sophisticated and elegant methods exist. The paper would be a lot stronger, and my guess is, much more streamlined if the authors employ hierarchical mixed models to analyse these data. These models could capture and estimate differences in individual behavior across time and situations simultaneously. Along with this, it's currently unclear whether and how any statistical inference was performed. Right now, it appears as though any results describing how individuality changes across situations are largely descriptive (i.e. a visual comparison of the strengths of the correlation coefficients?).

      The reviewer raises an important point, also raised by reviewer #1. On one hand, we agree with both reviewers that a more aggregated analysis has clear advantages like more statistical power and has the potential to streamline our manuscript, which is why we added such an analysis (see below). On the other hand, we would also like to defend the initial approach we took, since we think that the simplicity of the analysis for both correlational and mean data is easy to understand and reproduce. More complex analyses necessarily include the selection of a specific statistical toolbox by the experimenters and based on these decisions, different analyses become less comparable and more and more complicated to reproduce, unless the entire decision tree is flawlessly documented. For instance, a recent personality psychology paper investigated the relationship between statistical paths within the decision tree (forking analysis) and their results, leading to very surprising results (Paul et al., 2024), since some paths even reversed their findings. Such a variance in conclusions is hardly possible with the rather simplistic and easily reproducible analysis we performed. One of the major strengths of our study is the simple experimental design, allowing for rather simple and easy to understand analyses.

      We nevertheless took the reviewer’s advice very seriously and reanalyzed the data using a generalized linear model, which largely recapitulated the findings of our previously performed “low-tech” analysis in a single figure (Fig. 9).

      Another pretty major weakness is that right now, I can't find any explicit mention of how many flies were used and whether they were re-used across situations. Some sort of overall schematic showing exactly how many measurements were made in which rigs and with which flies would be very beneficial. 

      We apologize for this inconvenience. A detailed overview of male and female sample sizes is listed next to the supplemental boxplots (e.g., Fig. S6). Apparently, this was not visible enough. Therefore, we have now also uniformly added the sample sizes to the main figure legends.

      I don't necessarily doubt the robustness of the results and my guess is that the authors' interpretations would remain the same, but a more appropriate modeling framework could certainly improve their statistical inference and likely highlight some other cool patterns as these methods could better estimate stability and covariance in individual intercepts (and potentially slopes) across time and situation.

      As described above, we have now added the suggested analyses. We hope that the reviewer will appreciate the new Fig. 9, which, in our opinion, largely confirms our previous findings using a more appropriate generalized linear modelling framework.

      Reviewer #3 (Public Review): 

      This manuscript is a continuation of past work by the last author where they looked at stochasticity in developmental processes leading to inter-individual behavioural differences. In that work, the focus was on a specific behaviour under specific conditions while probing the neural basis of the variability. In this work, the authors set out to describe in detail how stable the individuality of animal behaviours is in the context of various external and internal influences. They identify a few behaviours to monitor (read outs of attention, exploration, and 'anxiety'); some external stimuli (temperature, contrast, nature of visual cues, and spatial environment); and two internal states (walking and flying).

      They then use high-throughput behavioural arenas - most of which they have built and made plans available for others to replicate - to quantify and compare combinations of these behaviours, stimuli, and internal states. This detailed analysis reveals that:

      (1) Many individualistic behaviours remain stable over the course of many days. 

      (2) That some of these (walking speed) remain stable over changing visual cues. Others (walking speed and centrophobicity) remain stable at different temperatures.

      (3) All the behaviours they tested failed to remain stable over the spatially varying environment (arena shape).

      (4) Only angular velocity (a readout of attention) remains stable across varying internal states (walking and flying).

      Thus, the authors conclude that there is a hierarchy in the influence of external stimuli and internal states on the stability of individual behaviours.

      The manuscript is a technical feat with the authors having built many new high-throughput assays. The number of animals is large and many variables have been tested - different types of behavioural paradigms, flying vs walking, varying visual stimuli, and different temperatures among others.

      We thank the reviewer for this extraordinarily kind assessment of our work!

      Recommendations for the authors:  

      Reviewing Editor (Recommendations For The Authors): 

      While appreciating the effort and quality of the work that went into this manuscript, the reviewers identified a few key points that would greatly benefit this work.

      (1) Statistical methods adopted. The dataset produced through this work is large, with multiple conditions and comparisons that can be made to infer parameters that both define and affect the individualistic behaviour of an animal. Hierarchical mixed models would be a more appropriate approach to handle such datasets and infer statistically the influence of different parameters on behaviours. We recommend that the authors take this approach in the analyses of their data.

      (2) Brevity in the text. We urge the authors to take advantage of eLife's flexible template and take care to elaborate on the text in the results section, the methods adopted, the legends, and the guides to the legends embedded in the main text. The findings are likely to be of interest to a broad audience, and the writing currently targets the specialist.

      Reviewer #2 (Recommendations For The Authors): 

      I want to start by saying this seems like a really cool study! It's an impressive amount of work and addressing a pretty basic question that is interesting (at least I think so!)

      We thank the reviewer again for this assessment!

      That said, I would really strongly recommend the authors embrace using mixed/hierarchical models to analyze their data. They're producing some really impressive data and just doing Pearson correlation coefficients across time points and situations is very clunky and actually losing out on a lot of information. The most up-to-date, state-of-the-art are mixed models - these models can handle very complex (or not so complex) random structures which can estimate variance and importantly, covariance, in individual intercepts both over time and across situations. I actually think this could add some really cool insights into the data and allow you to characterize the patterns you're seeing in far more detail. It's datasets exactly like this that are tailor-made for these complex variance partitioning models!

      As mentioned before, we have now adopted a more appropriate GLM-based data analysis (see above).
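
      To make the idea of variance partitioning concrete, the sketch below computes a one-way ANOVA repeatability (intraclass correlation) from repeated measurements of the same individuals. Please note the assumptions: this is a deliberately simpler stand-in, not a mixed model and not the GLM used for Fig. 9; the function name and the toy measurements are hypothetical; and it requires the same number of measurements per individual.

```python
from statistics import mean


def repeatability(data):
    """One-way ANOVA repeatability (ICC).

    data: one list per individual, each holding the same number k >= 2
    of repeated measurements of a single behavioral parameter.
    """
    n = len(data)
    k = len(data[0])
    grand = mean(v for row in data for v in row)
    # Mean squares between individuals and within individuals.
    ms_between = k * sum((mean(row) - grand) ** 2 for row in data) / (n - 1)
    ms_within = sum((v - mean(row)) ** 2 for row in data for v in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Toy data: three individuals, two measurements each.
stable = [[1, 1], [2, 2], [3, 3]]      # individuals differ, no within-individual change
unstable = [[1, 3], [1, 3], [1, 3]]    # individuals identical, all variance is within

print(repeatability(stable))
print(repeatability(unstable))
```

      Values near 1 indicate that most variance lies between individuals; values near or below 0 indicate that individuals are not distinguishable by this parameter. A mixed model generalizes this by estimating the same variance components while also accounting for situation, time, and covariance between intercepts.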

      Regardless of which statistical methods you decide to use, please explicitly state in your methods exactly what analyses you did. That is completely lacking now and was a bit frustrating. As such, it's completely unclear whether or how statistical inference was performed. How did you do the behavioral clustering? 

      We apologize that these points were not clearly represented in the previous version of the manuscript. We have now significantly extended the methods section to include a separate paragraph on the statistical methods used, in order to address this critique and hope that the revised version is clear now.

      Also, I could not for the life of me figure out how many flies had been measured. Were they reused across the situation? Or not?

      We reused the same flies across situations whenever possible. However, having one fly experience all assays consecutively was not feasible due to their fragility. Instead, individual flies were exposed to at least 2 of the 3 groups of assays used here: the IndyTrax setup, the Buridan arenas and variants thereof, and the virtual arenas. Hence, we have compared flies across entirely different setups, but the number of times flies can be retested is limited (as otherwise, sample sizes will drop over time, and the flies will have gone through too many experimental alterations). To make this clearer, we have elaborated on this point in the main text, and we added group sample sizes to the figure legends.

      What are these "groups" and "populations" that are referred to in the results (e.g. lines 384, 391, 409)?

      We apologize for using these two terms somewhat interchangeably without proper introduction/distinction. We have now made this clearer at the beginning of the results in the main text, by focusing on the term ‘group’. By ‘group’ we refer to all individuals tested in the same situation, whose values are averaged for group-level comparisons. Sample sizes in the figure legends now indicate group/population sizes to make this clearer.

      Some of the rationale for the development of the behavioral rigs would have actually been nice to include in the intro, rather than in the results.

      This rationale is introduced at the beginning of the last paragraph of the introduction. We hope that this now becomes clear in the revised version of the manuscript.

      Reviewer #3 (Recommendations For The Authors): 

      This manuscript would do well to take advantage of eLife's flexible word limit. I sense that it has been written in brevity for a different journal but I would urge the authors to revisit this and unpack the language here - in the text, in the figure legends, in references to the figures within the text. The way it's currently written, though not misleading, will only speak to the super-specialist or the super-invested :). But the findings are nice, and it would be nice to tailor it to a broader audience.

      We appreciate this suggestion. Initially, we were hoping that we had described our results as clearly and briefly as possible. We apologize if that was not always the case. The comments and requests of all three reviewers have now led to a series of additions to both the main text and the methods, resulting in a significantly expanded manuscript. We hope that these additions are helpful for the general, non-expert audience.

    1. Author response:

      The following is the authors’ response to the original reviews

      Overview of changes in the revision

      We thank the reviewers for the very helpful comments and have extensively revised the paper. We provide point-by-point responses below and here briefly highlight the major changes:

      (1) We expanded the discussion of the relevant literature in children and adults.

      (2) We improved the contextualization of our experimental design within previous reinforcement studies in both cognitive and motor domains highlighting the interplay between the two.

      (3) We reorganized the primary and supplementary results to better communicate the findings of the studies.

      (4) The modeling has been significantly revised and extended. We now formally compare 31 noise-based models and one value-based model and this led to a different model from the original being the preferred model. This has to a large extent cleaned up the modeling results. The preferred model is a special case (with no exploration after success) of the model proposed in Therrien et al. (2018). We also provide examples of individual fits of the model, fit all four tasks and show group fits for all, examine fits vs. data for the clamp phases by age, provide measures of relative and absolute goodness of fit, and examine how the optimal level of exploration varies with motor noise.

      Reviewer #1 (Public review):

      Summary:

      Here the authors address how reinforcement-based sensorimotor adaptation changes throughout development. To address this question, they collected many participants in ages that ranged from small children (3 years old) to adulthood (18+ years old). The authors used four experiments to manipulate whether binary and positive reinforcement was provided probabilistically (e.g., 30 or 50%) versus deterministically (e.g., 100%), and continuous (infinite possible locations) versus discrete (binned possible locations) when the probability of reinforcement varied along the span of a large redundant target. The authors found that both movement variability and the extent of adaptation changed with age.

      Thank you for reviewing our work. One note of clarification. This work focuses on reinforcement-based learning throughout development but does not evaluate sensorimotor adaptation. The four tasks presented in this work are completed with veridical trajectory feedback (no perturbation).

      The goal is to understand how children at different ages adjust their movements in response to reward feedback, not to evaluate sensorimotor adaptation. We now explain this distinction on line 35.

      Strengths:

      The major strength of the paper is the number of participants collected (n = 385). The authors also answer their primary question, that reinforcement-based sensorimotor adaptation changes throughout development, which was shown by utilizing established experimental designs and computational modelling.

      Thank you.

      Weaknesses:

      Potential concerns involve inconsistent findings with secondary analyses, current assumptions that impact both interpretation and computational modelling, and a lack of clearly stated hypotheses.

      (1) Multiple regression and Mediation Analyses.

      The challenge with these secondary analyses is that:

      (a) The results are inconsistent between Experiments 1 and 2, and the analysis was not performed for Experiments 3 and 4,

      (b) The authors used a two-stage procedure of using multiple regression to determine what variables to use for the mediation analysis, and

      (c) The authors already have a trial-by-trial model that is arguably more insightful.

      Given this, some suggested changes are to:

      (a) Perform the mediation analysis with all the possible variables (i.e., not informed by multiple regression) to see if the results are consistent.

      (b) Move the regression/mediation analysis to Supplementary, since it is slightly distracting given current inconsistencies and that the trial-by-trial model is arguably more insightful.

      Based on these comments, we have chosen to remove the multiple regression and mediation analyses. We agree that they were distracting and that the trial-by-trial model allows for differentiation of motor noise from exploration variability in the learning block.

      (2) Variability for different phases and model assumptions:

      A nice feature of the experimental design is the use of success and failure clamps. These clamped phases, along with baseline, are useful because they can provide insights into the partitioning of motor and exploratory noise. Based on the assumptions of the model, the success clamp would only reflect variability due to motor noise (excludes variability due to exploratory noise and any variability due to updates in reach aim). Thus, it is reasonable to expect that the success clamps would have lower variability than the failure clamps (which it obviously does in Figure 6), and presumably baseline (which provides success and failure feedback, thus would contain motor noise and likely some exploratory noise).

      However, in Figure 6, one visually observes greater variability during the success clamp (where it is assumed variability only comes from motor noise) compared to baseline, where variability would come from:

      (a) Motor noise.

      (b) Likely some exploratory noise since there were some failures.

      (c) Updates in reach aim.

      Thanks for this comment. It made us realize that some of our terminology was unintentionally misleading. Reaching to discrete targets in the Baseline block was done to a) determine if participants could move successfully to targets that are the same width as the 100% reward zone in the continuous targets and b) determine if there are age-dependent changes in movement precision. We now realize that the term Baseline Variability was misleading and should really be called Baseline Precision.

      This is an important distinction that bears on this reviewer's comment. In clamp trials, participants move to continuous targets. In baseline, participants move to discrete targets presented at different locations. Clamp Variability cannot be directly compared to Baseline Precision because they are qualitatively different. Since the target changes on each baseline trial, we would not expect updating of desired reach (the target is the desired reach) and there is therefore no updating of reach based on success or failure. The SD we calculate over baseline trials is the endpoint variability of the reach locations relative to the target centers. In success clamp, there are no targets so the task is qualitatively different.

      We have updated the text to clarify terminology, expand upon our operational definitions, and motivate the distinct role of the baseline block in our task paradigm (line 674).

      Given the comment above, can the authors please:

      (a) Statistically compare movement variability between the baseline, success clamp, and failure clamp phases.

      Given our explanation in the previous point we don't think that comparing baseline to the clamp makes sense as the trials are qualitatively different.

      (b) The authors have examined how their model predicts variability during success clamps and failure clamps, but can they also please show predictions for baseline (similar to that of Cashaback et al., 2019; Supplementary B, which alternatively used a no feedback baseline)?

      Again, we do not think it makes sense to predict the baseline which as we mention above has discrete targets compared to the continuous targets in the learning phase.

      (c) Can the authors show whether participants updated their aim towards their last successful reach during the success clamp? This would be a particularly insightful analysis of model assumptions.

      We have now compared 31 models (see full details in the next response), which include the 7 models in Roth et al. (2023). Several of these model variants have updating even after success (with so-called planning noise). We also now fit the model to the data that includes the clamp phases (we can't easily fit to the success clamp alone as there are only 10 trials). We find that the preferred model is one that does not include updating after success.

      (d) Different sources of movement variability have been proposed in the literature, as have different related models. One possibility is that the nervous system has knowledge of 'planned (noise)' movement variability that is always present, irrespective of success (van Beers, R.J. (2009). Motor learning is optimally tuned to the properties of motor noise. Neuron, 63(3), 406-417). The authors have used slightly different variations of their model in the past. Roth et al. (2023) directly compared several different plausible models with various combinations of motor, planned, and exploratory noise (Roth A, 2023, "Reinforcement-based processes actively regulate motor exploration along redundant solution manifolds." Proceedings of the Royal Society B 290: 20231475: see Supplemental). Their best-fit model seems similar to the one the authors propose here, but the current paper has the added benefit of the success and failure clamps to tease the different potential models apart. In light of the results of a), b), and c), the authors are encouraged to provide a paragraph on how their model relates to the various sources of movement variability and other models proposed in the literature.

      Thank you for this. We realized that the models presented in Roth et al. (2023), as well as in other papers, are all special cases of a more general model. Moreover, in total there are 30 possible variants of the full model, so we have now fit all 31 models to our larger datasets and performed model selection (Results and Methods). All the models can be efficiently fit by a Kalman smoother to the actual data (rather than to summary statistics, as has sometimes been done). For model selection, we fit only the 100 learning trials and chose the preferred model based on BIC on the children's data (Figure 5—figure Supplement 1). After selecting the preferred model, we then refit this model to all trials including the clamps so as to obtain the best parameter estimates.
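
      The BIC-based selection step described above can be sketched as follows, using the standard definition BIC = k ln(n) - 2 ln(L). This is only an illustration: the model names, log-likelihoods, and parameter counts are hypothetical placeholders and not values from the study; only the number of observations (100 learning trials) is taken from the text.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion; lower values are preferred."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical candidates: name -> (maximized log-likelihood, number of free parameters).
candidates = {
    "motor_noise_only": (-250.0, 1),
    "motor_plus_exploration": (-230.0, 2),
    "full_model": (-229.5, 5),
}

n_trials = 100  # learning trials used for model selection
scores = {name: bic(ll, k, n_trials) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: BIC = {score:.1f}")
print("preferred:", best)
```

      In this toy case the largest model fits marginally better but is penalized for its extra parameters, so the two-parameter model is preferred.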

      The preferred model was the same whether we combined the continuous and discrete probabilistic data or just examined each task separately, either for only the children or for the children and adults combined. The preferred model is a special case (no exploration after success) of the one proposed in Therrien et al. (2018) and has exploration variability (after failure) and motor noise with full updating with exploration variability (if any) after success. This model differs from the model in the original submission, which included a partial update of the desired reach after exploration; this was considered the learning rate. The current model suggests a unity learning rate.
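
      One way to read the verbal description of the preferred model is the trial-by-trial simulation below. It is a sketch under stated assumptions rather than the fitted model itself: the function and parameter names are hypothetical, both noise sources are taken to be Gaussian, and the reward rule is passed in as an arbitrary function so that success and failure clamps can be imitated.

```python
import random


def simulate(n_trials, sigma_motor, sigma_explore, rewarded, seed=0):
    """Simulate reach endpoints under one reading of the preferred model.

    - Motor noise is added on every trial.
    - Exploration noise is added only on the trial following a failure.
    - After a success, the desired reach takes over the full exploratory
      shift of that trial (unity learning rate); motor noise is not learned.
    """
    rng = random.Random(seed)
    aim = 0.0             # current desired reach
    prev_success = True   # assumption: no exploration on the first trial
    reaches = []
    for _ in range(n_trials):
        explore = 0.0 if prev_success else rng.gauss(0.0, sigma_explore)
        reach = aim + explore + rng.gauss(0.0, sigma_motor)
        success = rewarded(reach)
        if success:
            aim += explore
        prev_success = success
        reaches.append(reach)
    return reaches


# Success clamp: every reach is rewarded, so only motor noise remains.
success_clamp = simulate(10, sigma_motor=1.0, sigma_explore=5.0, rewarded=lambda r: True)
# Failure clamp: no reach is rewarded, so exploration is added from trial 2 onward.
failure_clamp = simulate(10, sigma_motor=1.0, sigma_explore=5.0, rewarded=lambda r: False)
print(success_clamp)
print(failure_clamp)
```

      Under this reading, variability in a success clamp reflects motor noise alone, while variability in a failure clamp reflects motor noise plus exploration around an unchanged desired reach.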

      In addition, as suggested by another reviewer, we also fit a value-based model which we adapted from the model described in Giron et al. (2023). This model was not preferred.

      We have added a paragraph to the Discussion highlighting different sources of variability and links to our model comparison.

      (e) line 155. Why would the success clamp be composed of both motor and exploratory noise? Please clarify in the text

      This sentence was written to refer to clamps in general and not just success clamps. However, in the revision this sentence seemed unnecessary so we have removed it.

      (3) Hypotheses:

      The introduction did not have any hypotheses of development and reinforcement, despite the discussion above setting up potential hypotheses. Did the authors have any hypotheses related to why they might expect age to change motor noise, exploratory noise, and learning rates? If so, what would the experimental behaviour look like to confirm these hypotheses? Currently, the manuscript reads more as an exploratory study, which is certainly fine if true, it should just be explicitly stated in the introduction. Note: on line 144, this is a prediction, not a hypothesis. Line 225: this idea could be sharpened. I believe the authors are speaking to the idea of having more explicit knowledge of action-target pairings changing behaviour.

      We have included our hypotheses and predictions at two points in the paper. In the introduction we modified the text to:

      "We hypothesized that children's reinforcement learning abilities would improve with age, and depend on the developmental trajectory of exploration variability, learning rate (how much people adjust their reach after success), and motor noise (here defined as all sources of noise associated with movement, including sensory noise, memory noise, and motor noise). We think that these factors depend on the developmental progression of neural circuits that contribute to reinforcement learning abilities (Raznahan et al., 2014; Nelson et al., 2000; Schultz, 1998)."

      In results we modified the sentence to:

      "We predicted that discrete targets could increase exploration by encouraging children to move to a different target after failure.”

      Reviewer #2 (Public review):

      Summary:

      In this study, Hill and colleagues use a novel reinforcement-based motor learning task ("RML"), asking how aspects of RML change over the course of development from toddler years through adolescence. Multiple versions of the RML task were used in different samples, which varied on two dimensions: whether the reward probability of a given hand movement direction was deterministic or probabilistic, and whether the solution space had continuous reach targets or discrete reach targets. Using analyses of both raw behavioral data and model fits, the authors report four main results: First, developmental improvements reflected 3 clear changes, including increases in exploration, an increase in the RL learning rate, and a reduction of intrinsic motor noise. Second, changes to the task that made it discrete and/or deterministic both rescued performance in the youngest age groups, suggesting that observed deficits could be linked to continuous/probabilistic learning settings. Overall, the results shed light on how RML changes throughout human development, and the modeling characterizes the specific learning deficits seen in the youngest ages.

      Strengths:

      (1) This impressive work addresses an understudied subfield of motor control/psychology - the developmental trajectory of motor learning. It is thus timely and will interest many researchers.

      (2) The task, analysis, and modeling methods are very strong. The empirical findings are rather clear and compelling, and the analysis approaches are convincing. Thus, at the empirical level, this study has very few weaknesses.

      (3) The large sample sizes and in-lab replications further reflect the laudable rigor of the study.

      (4) The main and supplemental figures are clear and concise.

      Thank you.

      Weaknesses:

      (1) Framing.

      One weakness of the current paper is the framing, namely w/r/t what can be considered "cognitive" versus "non-cognitive" ("procedural?") here. In the Intro, for example, it is stated that there are specific features of RML tasks that deviate from cognitive tasks. This is of course true in terms of having a continuous choice space and motor noise, but spatially correlated reward functions are not a unique feature of motor learning (see e.g. Giron et al., 2023, NHB). Given the result here that simplifying the spatial memory demands of the task greatly improved learning for the youngest cohort, it is hard to say whether the task is truly getting at a motor learning process or more generic cognitive capacities for spatial learning, working memory, and hypothesis testing. This is not a logical problem with the design, as spatial reasoning and working memory are intrinsically tied to motor learning. However, I think the framing of the study could be revised to focus in on what the authors truly think is motor about the task versus more general psychological mechanisms. Indeed, it may be the case that deficits in motor learning in young children are mostly about cognitive factors, which is still an interesting result!

      Thank you for these comments on the framing of our study. We now clearly acknowledge that all motor tasks have cognitive components (new paragraph at line 65). We also explain why we think our task has features not present in typical cognitive tasks.

      (2) Links to other scholarship.

      If I'm not mistaken, a common observation in studies of the development of reinforcement learning is a decrease in exploration over development (e.g., Nussenbaum and Hartley, 2019; Giron et al., 2023; Schulz et al., 2019); this contrasts with the current results, which instead show an increase. It would be nice to see a more direct discussion of previous findings showing decreases in exploration over development, and why the current study deviates from that. It could also be useful for the authors to bring in concepts of different types of exploration (e.g. "directed" vs "random"), in their interpretations and potentially in their modeling.

      We recognize that our results differ from prior work. The optimal exploration pattern differs from task to task. We now discuss that exploration is not one size fits all; its benefits vary depending upon the task. We have added the following paragraphs to the Discussion section:

      "One major finding from this study is that exploration variability increases with age. Some other studies of development have shown that exploration can decrease with age indicating that adults explore less compared to children (Schulz et al., 2019; Meder et al., 2021; Giron et al., 2023). We believe the divergence between our work and these previous findings is largely due to the experimental design of our study and the role of motor noise. In the paradigm used initially by Schulz et al. (2019) and replicated in different age groups by Meder et al. (2021) and Giron et al. (2023), participants push buttons on a two-dimensional grid to reveal continuous-valued rewards that are spatially correlated. Participants are unaware that there is a maximum reward available and therefore children may continue to explore to reduce uncertainty if they have difficulty evaluating whether they have reached a maximum. In our task by contrast, participants are given binary reward and told that there is a region in which reaches will always be rewarded. Motor noise is an additional factor which plays a key role in our reaching task but minimal if any role in the discretized grid task. As we show in simulations of our task, as motor noise goes down (as it is known to do through development) the optimal amount of exploration goes up (see Figure 7—figure Supplement 2 and Appendix 1). Therefore, the behavior of our participants is rational in terms of increasing exploration as motor noise decreases.

      A key result in our study is that exploration in our task reflects sensitivity to failure. Older children make larger adjustments after failure compared to younger children to find the highly rewarded zone more quickly. Dhawale et al. (2017) discuss the different contexts in which a participant may explore versus exploit (i.e., stick at the same position). Exploration is beneficial when reward is low as this indicates that the current solution is no longer ideal, and the participant should search for a better solution. Konrad et al. (2025) have recently shown this behavior in a real-world throwing task where 6 to 12 year old children increased throwing variability after missed trials and minimized variability after successful trials. This has also been shown in a postural motor control task where participants were more variable after non-rewarded trials compared to rewarded trials (Van Mastrigt et al., 2020). In general, these studies suggest that the optimal amount of exploration is dependent on the specifics of the task."

      (3) Modeling.

      First, I may have missed something, but it is unclear to me if the model is actually accounting for the gradient of rewards (e.g., if I get a probabilistic reward moving at 45°, but then don't get one at 40°, I should be more likely to try 50° next than 35°). I couldn't tell from the current equations if this was the case, or if exploration was essentially "unsigned," nor if the multiple-trials-back regression analysis would truly capture signed behavior. If the model is sensitive to the gradient, it would be nice if this was more clear in the Methods. If not, it would be interesting to have a model that does "function approximation" of the task space, and see if that improves the fit or explains developmental changes.

      The model we use (similar to Roth et al. (2023) and Therrien et al. (2016, 2018)) does not model the gradient. Exploration is always zero-mean Gaussian. As suggested by the reviewer, we now also fit a value-based model (described starting at line 810) which we adapted from the model presented in Giron et al. (2023). We show that the exploration and noise-based model is preferred over the value-based model.

      The multiple-trials-back regression was unsigned as the intent was to look at the magnitude and not the direction of the change in movement. We have decided to remove this analysis from the manuscript as it was a source of confusion and secondary analysis that did not add substantially to the findings of these studies.

      Second, I am curious if the current modeling approach could incorporate a kind of "action hysteresis" (aka perseveration), such that regardless of previous outcomes, the same action is biased to be repeated (or, based on parameter settings, avoided).

      In some sense, the learning rate in the model in the original submission is highly related to perseveration. For example, if the learning rate is 0, then there is complete perseveration as you simply repeat the same desired movement. If the rate is 1, there is no perseveration, and values in between reflect different amounts of perseveration. Therefore, it is not easy to separate learning rate from perseveration. Adding perseveration as another parameter would likely make it and the learning rate unidentifiable. However, we now compare 31 models and those that have a non-unity learning rate are not preferred, suggesting there is little perseveration.
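
      The link between learning rate and perseveration described above can be illustrated with a minimal sketch. It assumes a generic noise-based learner in the spirit of the models cited in this response: exploration is zero-mean Gaussian and added only after an unrewarded trial, and the desired reach is updated only after a rewarded trial. The function name, the deterministic reward zone, and all parameter values are hypothetical choices for illustration and are not the fitted model from the paper.

```python
import random


def simulate_reaches(n_trials, learning_rate, sigma_motor, sigma_explore,
                     zone=(0.6, 0.8), start=0.5, seed=0):
    """Hypothetical noise-based learner (illustrative assumption only).

    Returns the desired reach (aim) and the executed reach on every trial.
    """
    rng = random.Random(seed)
    aim = start
    aims, reaches = [], []
    prev_rewarded = True  # no exploration on the very first trial
    for _ in range(n_trials):
        # Exploration is zero-mean Gaussian, so it carries no information
        # about the direction of the reward gradient.
        explore = 0.0 if prev_rewarded else rng.gauss(0.0, sigma_explore)
        motor = rng.gauss(0.0, sigma_motor)
        reach = aim + explore + motor
        # Deterministic reward zone, chosen here only to keep the sketch short.
        rewarded = zone[0] <= reach <= zone[1]
        aims.append(aim)
        reaches.append(reach)
        if rewarded:
            # learning_rate = 0 repeats the same desired movement forever;
            # learning_rate = 1 adopts the full exploratory change.
            aim = aim + learning_rate * explore
        prev_rewarded = rewarded
    return aims, reaches
```

      With `learning_rate=0.0` the desired reach never leaves its starting value, which is the complete-perseveration case; with `learning_rate=1.0` and no motor noise the next desired reach equals the last rewarded reach.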

      (4) Psychological mechanisms. There is a line of work that shows that when children and adults perform RL tasks they use a combination of working memory and trial-by-trial incremental learning processes (e.g., Master et al., 2020; Collins and Frank 2012). Thus, the observed increase in the learning rate over development could in theory reflect improvements in instrumental learning, working memory, or both. Could it be that older participants are better at remembering their recent movements in short-term memory (Hadjiosif et al., 2023; Hillman et al., 2024)?

      We agree that cognitive processes, such as working memory or visuospatial processing, play a role in our task and describe cognitive elements of our task in the introduction (new paragraph at line 65). However, the sensorimotor model we fit to the data does a good job of explaining the variation across age, which suggests that age-dependent cognitive processes probably play a smaller role.

      Reviewer #3 (Public review):

      Summary:

      The study investigates reinforcement learning across the lifespan with a large sample of participants recruited for an online game. It finds that children gradually develop their abilities to learn reward probability, possibly hindered by their immature spatial processing and probabilistic reasoning abilities. Motor noise, reinforcement learning rate, and exploration after a failure all contribute to children's subpar performance.

      Strengths:

      (1) The paradigm is novel because it requires continuous movement to indicate people's choices, as opposed to discrete actions in previous studies.

      (2) A large sample of participants were recruited.

      (3) The model-based analysis provides further insights into the development of reinforcement learning ability.

      Thank you.

      Weaknesses:

      (1) The adequacy of model-based analysis is questionable, given the current presentation and some inconsistency in the results.

      Thank you for raising this concern. We have substantially revised the model from our first submission. We now compare 31 noise-based models and 1 value-based model and fit all of the tasks with the preferred model. We perform model selection using the two tasks with the largest datasets to identify the preferred model. From the preferred model, we found the parameter fits for each individual dataset and simulated the trial by trial behavior allowing comparison between all four tasks. We now show examples of individual fits as well as provide a measure of goodness of fit. The expansion of our modeling approach has resolved inconsistencies and sharpened the conclusions drawn from our model.

      (2) The task should not be labeled as reinforcement motor learning, as it is not about learning a motor skill or adapting to sensorimotor perturbations. It is a classical reinforcement learning paradigm.

      We now make it clear that our reinforcement learning task has both motor and cognitive demands, but does not fall entirely within one of these domains. We use the term motor learning because it captures the fact that participants maximize reward by making different movements, corrupted by motor noise, to unmarked locations on a continuous target zone. When we look at previous publications, it is clear that our task is similar to those that also refer to this as reinforcement motor learning: Cashaback et al. (2019) (reaching task using a robotic arm in adults), Van Mastrigt et al. (2020) (weight shifting task in adults), and Konrad et al. (2025) (real-world throwing task in children). All of these tasks involve trial-by-trial learning through reinforcement to make the movement that is most effective for a given situation. We feel it is important to link our work to these previous studies and prefer to preserve the terminology of reinforcement motor learning.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Thank you for this summary. Rather than repeat the extended text from the responses to the reviewers here, we point the Editor to the appropriate reviewer responses for each issue raised.

      The reviewers and editors have rated the significance of the findings in your manuscript as "Valuable" and the strength of evidence as "Solid" (see eLife evaluation). A consultancy discussion session to integrate the public reviews and recommendations per reviewer (listed below) has resulted in key recommendations for increasing the significance and strength of evidence:

      To increase the Significance of the findings, please consider the following:

      (1) Address and reframe the paper around whether the task is truly getting at a motor learning process or more generic cognitive decision-making capacities such as spatial memory, reward processing, and hypothesis testing.

      We have revised the paper to address the comments on the framing of our work. Please see responses to the public review comments of Reviewers #2 and #3.

      (2) It would be beneficial to specify the differences between traditional reinforcement algorithms (i.e., using softmax functions to explore, and build representations of state-action-reward) and the reinforcement learning models used here (i.e., explore with movement variability, update reach aim towards the last successful action), and compare present findings to previous cognitive reinforcement learning studies in children.

      Please see response to the public review comments of Reviewer #1 in which we explain the expansion of our modeling approach to fit a value-based model as well as 31 other noise-based models. In our response to the public review comments of Reviewer #2, we comment on our expanded discussion of how our findings compare with previous cognitive reinforcement learning studies.

      To move the "Strength of Evidence" to "Convincing", please consider doing the following:

      (1) Address some apparently inconsistent and unrealistic values of motor noise, exploration noise, and learning rate shown for individual participants (e.g., Figure 5b; see comments reviewers 1 and take the following additional steps: plotting r squares for individual participants, discussing whether individual values of the fitted parameters are plausible and whether model parameters in each age group can extrapolate to the two clamp conditions and baselines.

      We have substantially updated our modeling approach. Now that we compare 31 noise-based models, the preferred model does not show any inconsistent or unrealistic values (see response to Reviewer #3). Additionally, we now show example individual fits and provide both relative and absolute goodness of fit (see response to Reviewer #3).

      (2) Relatedly, to further justify if model assumptions are met, it would be valuable to show that the current learning model fits the data better than alternative models presented in the literature by the authors themselves and by others (reviewer 1). This could include alternative development models that formalise the proposed explanations for age-related change: poor spatial memory, reward/outcome processing, and exploration strategies (reviewer 2).

      Please see response to public review comments of Reviewer #1 in which we explain that we have now fit a value-based model as well as 31 other noise-based models providing a comparison of previous models as well as novel models. This led to a slightly different model being preferred over the model in the original submission (updated model has a learning rate of unity). These models span many of the processes previously proposed for such tasks. We feel that 32 models span a reasonable amount of space and do not believe we have the power to include memory issues or heuristic exploration strategies in the model.

      (3) Perform the mediation analysis with all the possible variables (i.e., not informed by multiple regression) to see if the results are more consistent across studies and with the current approach (see comments reviewer 1).

      Please see response to public review comments of Reviewer #1. We chose to focus only on the model based analysis because it allowed us to distinguish between exploration variability and motor noise.

      Please see below for further specific recommendations from each reviewer.

      Reviewer #1 (Recommendations for the author):

      (1) In general, there should be more discussion and contextualization of other binary reinforcement tasks used in the motor literature. For example, work from Jeroen Smeets, Katinka van der Kooij, and Joseph Galea.

      Thank you for this comment. We have edited the Introduction to better contextualize our work within the reinforcement motor learning literature (see line 67 and line 83).

      (2) Line 32. Very minor. This sentence is fine, but perhaps could be slightly improved. “select a location along a continuous and infinite set of possible options (anywhere along the span of the bridge)"

      Thank you for this comment. We have edited the sentence to reflect this suggestion.

      (3) Line 57. To avoid some confusion in successive sentences: Perhaps, "Both children over 12 and adolescents...".

      Thank you for this comment. We have edited the sentence to reflect this suggestion.

      (4) Line 80. This is arguably not a mechanistic model, since it is likely not capturing the reward/reinforcement machinery used by the nervous system, such as updating the expected value using reward prediction errors/dopamine. That said, this phenomenological model, and other similar models in the field, do very well to capture behaviour with a very simple set of explore and update rules.

      We use mechanistic in its standard modeling sense, as in Levenstein et al. (2023), for example. The contrast is not with neural modeling, but with normative modeling, in which one develops a model to optimize a function (or descriptive models as to what a system is trying to achieve). In mechanistic modeling one proposes a mechanism, and this can be at a state-space level (as in our case) or a neural level (as suggested by the reviewer), but both are considered mechanistic, just at different levels. Quoting Levenstein "... mechanistic models, in which complex processes are summarized in schematic or conceptual structures that represent general properties of components and their interactions, are also commonly used." We now reference the Levenstein paper to clarify what we mean by mechanistic.

      (5) Figure 1. It would be useful to state that the x-axis in Figure 1 is in normalized units, depending on the device.

      Thank you for this comment. We have added a description of the x-axis units to the Fig. 1 caption.

      (6) Were there differences in behaviour for these different devices? e.g., how different was motor noise for the mouse, trackpad, and touchscreen?

      Thank you for this question. We did not find a significant effect of device on learning or precision in the baseline block. We have added these one way ANOVA results for each task in Supplementary Table 1.

      (7) Line 98. Please state that participants received reinforcement feedback during baseline.

      Thank you for this comment. We have updated the text to specify that participants receive reward feedback during the baseline block.

      (8) Line 99. Did the distance from the last baseline trial influence whether the participant learned or did not learn? For example, would it place them too far from the peak success location such that it impacted learning?

      Thank you for this question. We looked at whether the position of movement on the last baseline block trial was correlated with the first movement position in the learning block. We did not find any correlations between these positions for any of the tasks. Interestingly, we found that the majority of participants move to the center of the workspace on the first trial of the learning block for all tasks (either in the presence of the novel continuous target scene or the presentation of 7 targets all at once). We do not think that the last movement in the baseline block "primed" the participant for the location of the success zone in the learning block. We have added the following sentence to the Results section:

      "Note that the reach location for the first learning trial was not affected by (correlated with) the target position on the last baseline trial (p > 0.3 for both children and adults, separately)."

      (9) The term learning distance could be improved. Perhaps use distance from target.

      Thank you for this comment. We appreciate that learning distance defined with 0 as the best value is counter intuitive. We have changed the language to be "distance from target" as the learning metric.

      (10) Line 188. This equation is correct, but to estimate what the standard deviation by the distribution of changes in reach position is more involved. Not sure if the authors carried out this full procedure, which is described in Cashaback et al., 2019; Supplemental 2.

      There appears to be no Supplemental 2 in the referenced paper so we assume the reviewer is referring to Supplemental B, which deals with a shuffling procedure to examine lag-1 correlations.

      In our tasks, we are limited to only 9 trials to analyze in each clamp phase so do not feel a shuffling analysis is warranted. In these blocks, we are not trying to 'estimate what the standard deviation by the distribution of changes in reach position' but instead are calculating the standard deviation of the reach locations and comparing the model fit (for which the reviewer says the formula is correct) with the data. We are unclear what additional steps the reviewer is suggesting. In our updated model analysis, we fit the data including the clamp phases for better parameter estimation. We use simulations to estimate the s.d. in the clamp phase (as we ensure in simulations that the data do not fall outside the workspace), so the previous analytic formulas, which are only an approximation, are no longer used.

      (11) Line 197-199. Having done the demo task, it is somewhat surprising that a 3-year-old could understand these instructions (whose comprehension can be very different from even a 5-year old).

      Thank you for raising this concern. We recognize that the younger participants likely have different comprehension levels compared to older participants. However, we believe that the majority of even the youngest participants were able to sufficiently understand the goal of the task to move in a way to get the video clip to play. We intentionally designed the tasks to be simple such that the only instructions the child needed to understand were that the goal was to get the video clip to play as much as possible and the video clip played based on their movement. Though the majority of younger children struggled to learn well on the probabilistic tasks, they were able to learn well on the deterministic tasks where the task instructions were virtually identical with the exception of how many places in the workspace could gain reward. On the continuous probabilistic task, we did have a small number (n = 3) of 3 to 5 year olds who exhibited more mature learning ability which gives us confidence that the younger children were able to understand the task goal.

      (12) Line 497: Can the authors please report the F-score and p-value separately for each of these one-way ANOVA (the device is of particular interest here).

      Thank you for this request. We have added a supplementary table (Supplementary Table 1) with the results of these ANOVAs.

      (13) Past work has discussed how motivation influences learning, which is a function of success rate (van der Kooij, K., in 't Veld, L., & Hennink, T. (2021). Motivation as a function of success frequency. Motivation and Emotion, 45, 759-768.). Can the authors please discuss how that may change throughout development?

      Thank you for this comment. While motivation most probably plays a role in learning, in particular in a game environment, this was out of the scope of the direct focus of this work and not something that our studies were designed to test. We have added the following sentence to the discussion section to address this comment:

      "We also recognize that other processes, such as memory and motivation, could affect performance on these tasks however our study was not designed to test these processes directly and future work would benefit from exploring these other components more explicitly."

      (14) Supplement 6. This analysis is somewhat incomplete because it does not consider success.

      Pekny and colleagues (2015) looked at 3 trials back but considered both success and reward. However, their analysis has issues since successive time points are not i.i.d., and spurious relationships can arise. This issue is brought up by Dhawale (Dhawale, A. K., Miyamoto, Y. R., Smith, M. A., & Ölveczky, B. P. (2019). Adaptive regulation of motor variability. Current Biology, 29(21), 3551-3562.). Perhaps it is best to remove this analysis from the paper.

      Thank you for this comment. We have decided to remove this secondary analysis from the paper as it was a source of confusion and did not add to the understanding and interpretation of our behavioral results.

      Reviewer #2 (Recommendations for the author):

      (1) The path length ratio analyses in the supplemental are interesting but are not mentioned in the main paper. I think it would be helpful to mention these as they are somewhat dramatic effects.

      Thank you for this comment. Path length ratios are defined in the Methods and results are briefly summarized in the Results section with a point to the supplementary figures. We have updated the text to more explicitly report the age related differences in path length ratios.

      (2) The second to last paragraph of the intro could use a sentence motivating the use of the different task features (deterministic/probabilistic and discrete/continuous).

      Thank you for this comment. We have added an additional motivating sentence to the introduction.

      Reviewer #3 (Recommendations for the author):

      The paper labeled the task as one for reinforcement motor learning, which is not quite appropriate in my opinion. Motor learning typically refers to either skill learning or motor adaptation, the former for improving speed-accuracy tradeoffs in a certain (often new) motor skill task and the latter for accommodating some sensorimotor perturbations for an existing motor skill task. The gaming task here is for neither. It is more like a decision-making task with a slight contribution to motor execution, i.e., motor noise. I would recommend the authors label the learning as reinforcement learning instead of reinforcement motor learning.

      Thank you for this comment. As noted in the response to the public review comments, we agree that this task has components of classical reinforcement learning (i.e. responding to a binary reward) but we specifically designed it to require the learning of a movement within a novel game environment. We have added a new paragraph to the introduction where we acknowledge the interplay between cognitive and motor mechanisms while also underscoring the features in our task that we think are not present in typical cognitive tasks.

      My major concern is whether the model adequately captures subjects' behavior and whether we can conclude with confidence from model fitting. Motor noise, exploration noise, and learning rate, which fit individual learning patterns (Figure 5b), show some quite unrealistic values. For example, some subjects have nearly zero motor noise and a 100% learning rate.

      We have now compared 31 models and the preferred model is different from the one in the first submission. The parameter fits of the new model do not saturate in any way and appear reasonable to us. The updates to the model analysis have addressed the concern of previously seen unrealistic values in the prior draft.

      Currently, the paper does not report the fitting quality for individual subjects. It is good to have an exemplary subject's fit shown, too. My guess is that the r-squared would be quite low for this type of data. Still, given that the children's data is noisier, it might be good to use the adult data to show how good the fitting can be (individual fits, r squares, whether the fitted parameters make sense, whether it can extrapolate to the two clamp phases). Indeed, the reliability of model fitting affects how we should view the age effect of these model parameters.

      We now show fits to individual subjects. However, since this is a Kalman smoother, it fits the data perfectly by generating its best estimate of motor noise and exploration variability on each trial to fully account for the data. In that sense R<sup>2</sup> is always 1, which is not informative.

      While the BIC analysis with the other model variants provides a relative goodness of fit, it is not straightforward to provide an absolute goodness of fit such as standard R<sup>2</sup> for a feedforward simulation of the model given the parameters (rather than the output of the Kalman smoother). There are two problems. First, there is no single model output. Each time the model is simulated with the fit parameters it produces a different output (due to motor noise, exploration variability and reward stochasticity). Second, the model is not meant to reproduce the actual motor noise, exploration variability and reward stochasticity of a trial. For example, the model could fit pure Gaussian motor noise across trials (for a poor learner) by accurately fitting the standard deviation of motor noise but would not be expected to actually match each data point, so would have a traditional R<sup>2</sup> of 0.

      To provide an overall goodness of fit we have to reduce the noise component, and to do so we examined the traditional R<sup>2</sup> between the average of all the children's data and the average simulation of the model (from the median of 1000 simulations per participant) so as to reduce the stochastic variation. The results for the continuous probabilistic and discrete probabilistic task are R<sup>2</sup> of 0.41 and 0.72, respectively.
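
      The averaging procedure described above can be sketched as follows. This is a minimal illustration, assuming R<sup>2</sup> is computed as 1 - SS_res / SS_tot and that the per-participant median is taken trial by trial across simulations; the function names and data layout are hypothetical and may differ from the analysis code used for the paper.

```python
from statistics import mean, median


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def group_level_r_squared(data, sims):
    """Compare the group-average data with the group-average simulation.

    data[p][t]    : reach of participant p on trial t
    sims[p][s][t] : simulation s for participant p on trial t
    """
    n_trials = len(data[0])
    # Trial-by-trial median across each participant's simulations.
    per_participant = [
        [median([run[t] for run in runs]) for t in range(n_trials)]
        for runs in sims
    ]
    # Averaging across participants reduces trial-specific noise.
    mean_data = [mean([d[t] for d in data]) for t in range(n_trials)]
    mean_sim = [mean([m[t] for m in per_participant]) for t in range(n_trials)]
    return r_squared(mean_data, mean_sim)
```

      Averaging before computing R<sup>2</sup> is what removes most of the motor noise, exploration variability and reward stochasticity that a single simulated run is not expected to reproduce.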

      Note that variability in the "success clamp" does not change across ages (Figure 4C) and does not contribute to the learning effect (Figure 4F). However, it is regarded as reflecting motor noise (Figure 5C), which then decreases over age from the model fitting (Figure 5B). How do we reconcile these contradictions? Again, this calls the model fitting into question.

      For the success clamp, we have only 9 trials from which to calculate variability, which limits our power to detect an effect of age. In contrast, the model uses all 120 trials to estimate motor noise. There is a downward trend with age in the behavioral data, which we now show overlaid on the fits of the model for both probabilistic conditions (Figure 5—figure Supplement 4 and Figure 6—figure Supplement 4). These show a reasonable match. Although the variance explained is 16 and 56% (we limit to 9 trials so as to match the fail clamp), the correlations are 0.52 and 0.78, suggesting a reasonable relationship, although there may be other small sources of variability not captured in the model.
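      The difference in precision between 9 and 120 trials can be made concrete with a rough calculation. The sketch below uses the standard large-sample approximation for the relative standard error of a sample standard deviation of Gaussian data, 1/√(2(n − 1)). It is only an illustration: the model's motor-noise estimate is not a simple sample standard deviation, and the function name is hypothetical.

```python
import math

def relative_se_of_sd(n_trials):
    """Approximate relative standard error of a sample SD (Gaussian data)."""
    return 1.0 / math.sqrt(2.0 * (n_trials - 1))

for n in (9, 120):
    print(n, round(relative_se_of_sd(n), 3))
```

      Under this approximation a standard deviation estimated from 9 trials is uncertain by about 25%, compared with about 6.5% for 120 trials, which is consistent with the clamp measure being much noisier than the model-based estimate.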

      Figure 5C: it appears one bivariate outlier contributes a lot to the overall significant correlation here for the "success clamp".

      The correlation in the original Fig. 5C remained significant after removing that point, and we feel the plots mentioned in the previous point add useful information on this issue. With the new model this figure has changed.

      It is still a concern that the young children did not understand the instructions. Nine 3-to-8-year-old children (out of 48) were better explained by the noisy-only model than the full model. In contrast, ten of the rest of the participants (out of 98) were better explained by the noisy-only model. It appears that a higher percentage of the "young" children than of the older ones did not understand the instructions.

      Thank you for this comment. We did take participant comprehension of the task into consideration during the task design. We specifically designed it so that the instructions were simple and straightforward. The child simply needs to understand that the underlying goal is to make the video clip play as often as possible and that they must move the penguin to certain positions to get it to play. By having a very simple task goal, we are able to test a naturalistic response to reinforcement in the absence of an explicit strategy in a task suited even for young children.

      We used the updated reinforcement learning model to assess whether an individual's performance is consistent with understanding the task. In the case of a child who does not understand the task, we expect that they simply have motor noise on their reach, and crucially, that they would not explore more after failure, nor update their reach after success. Therefore, we used a likelihood ratio test to examine whether the preferred model was significantly better at explaining each participant's data compared to the model variant which had only motor noise (Model 1). Focusing on only the youngest children (age 3-5), this analysis showed that 43, 59, 65 and 86% of children (out of N = 21, 22, 20 and 21) for the continuous probabilistic, discrete probabilistic, continuous deterministic, and discrete deterministic conditions, respectively, were better fit with the preferred model, indicating non-zero exploration after failure. In the 3-5 year old group for the discrete deterministic condition, 18 out of 21 had performance better fit by the preferred model, suggesting this age group understands the basic task of moving in different directions to find a rewarding location.

      The reduced numbers fit by the preferred model for the other conditions likely reflect differences in the task conditions (continuous and/or probabilistic) rather than a lack of understanding of the goal of the task. We include this analysis as a new subsection at the end of the Results.
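      The per-participant likelihood ratio test described above can be sketched as follows. This is a minimal illustration, not the authors' code: the log-likelihood values and the number of extra parameters (`extra_params`) are hypothetical placeholders, and the chi-square tail probability is computed with a closed-form series for integer degrees of freedom so that the example has no dependencies beyond the standard library.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (integer df >= 1)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2.0) / k
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: normal tail plus a finite series.
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * density * total

def likelihood_ratio_test(log_lik_noise_only, log_lik_preferred, extra_params):
    """Compare nested models; returns (test statistic, p-value)."""
    statistic = 2.0 * (log_lik_preferred - log_lik_noise_only)
    return statistic, chi2_sf(statistic, extra_params)

# Hypothetical log-likelihoods for one participant.
statistic, p_value = likelihood_ratio_test(-210.4, -204.1, extra_params=2)
print(statistic, p_value, p_value < 0.05)
```

      A participant would be counted as "better fit with the preferred model" when the p-value falls below the chosen threshold; the 0.05 threshold used here is an assumption.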

      Supplementary Figure 2: the first panel should belong to a 3-year-old not a 5-year-old? How are these panels organized? This is kind of confusing.

      Thank you for this comment. Figure 2—figure Supplement 1 and Figure 2—figure Supplement 2 are arranged with devices in the columns and a sample from each age bin in the rows. For example, in Figure 2—figure Supplement 1, column 1, row 1 is a mouse-using participant aged 3 to 5 years, while column 3, row 2 is a touch-screen-using participant aged 6 to 8 years. We have edited the labeling on both figures to make the arrangement of the data clearer.

      Line 222: make this a complete sentence.

      This sentence has been edited to a complete sentence.

      Line 331: grammar.

      This sentence has been edited for grammar.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This article investigates the phenotype of macrophages with a pathogenic role in arthritis, particularly focusing on arthritis induced by immune checkpoint inhibitor (ICI) therapy. 

      Building on prior data from monocyte-macrophage coculture with fibroblasts, the authors hypothesized a unique role for the combined actions of prostaglandin PGE2 and TNF. The authors studied this combined state using an in vitro model with macrophages derived from monocytes of healthy donors. They complemented this with single-cell transcriptomic and epigenetic data from patients with ICI-RA, specifically, macrophages sorted out of synovial fluid and tissue samples. The study addressed critical questions regarding the regulation of PGE2 and TNF: Are their actions co-regulated or antagonistic? How do they interact with IFN-γ in shaping macrophage responses? 

      This study is the first to specifically investigate a macrophage subset responsive to the PGE2 and TNF combination in the context of ICI-RA, describes a new and easily reproducible in vitro model, and studies the role of IFN-γ regulation of this particular macrophage subset. 

      Strengths: 

      Methodological quality: The authors employed a robust combination of approaches, including validation of bulk RNA-seq findings through complementary methods. The methods description is excellent and allows for reproducible research. Importantly, the authors compared their in vitro model with ex vivo single-cell data, demonstrating that their model accurately reflects the molecular mechanisms driving the pathogenicity of this macrophage subset. 

      Weaknesses: 

      Introduction: The introduction lacks a paragraph providing an overview of ICI-induced arthritis pathogenesis and a comparison with other types of arthritis. Including this would help contextualize the study for a broader audience.

      Thank you for this suggestion. We have added a paragraph on ICI-arthritis to the Introduction (pg. 4, middle paragraph).

      Results Section: At the beginning of the results section, the experimental setup should be described in greater detail to make an easier transition into the results for the reader, rather than relying just on references to Figure 1 captions.

      We have clarified the experimental setup (pg. 5).  

      There is insufficient comparison between single-cell RNA-seq data from ICI-induced arthritis and previously published single-cell RA datasets. Such a comparison may include DEGs and GSEA, pathway analysis comparison for similar subsets of cells. Ideally, an integration with previous datasets with RA-tissue-derived primary monocytes would allow for a direct comparison of subsets and their transcriptomic features.

      We thank the Reviewer for this suggestion, which has increased the impact of our data and analysis. A computationally rigorous representation mapping approach showed that ICI-arthritis myeloid subsets predominantly mapped onto 4 previously defined RA subsets including IL-1β+ cells. This result was corroborated using a complementary data integration approach. Analysis of (TNF + PGE)-induced gene sets (TP signatures) in ICI-arthritis myeloid cells projected onto the RA subsets using the AUCell package showed elevated TP gene expression in similar ICI-arthritis and RA monocytic cell subsets. We also found mutually exclusive expression of TP and IFN signatures in distinct RA and ICI-arthritis myeloid cell subsets, which supports that the opposing cross-regulation between IFN-γ and PGE2 pathways that we identified in vitro also functions similarly in vivo. This analysis is shown in the new Fig. 3, described on pg. 7, and discussed on pp. 13-14.
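      The idea behind rank-based signature scoring can be illustrated with a short sketch. AUCell is an R/Bioconductor package, so the Python function below is not its implementation; it is a simplified stand-in that scores one cell by the area under the gene-set recovery curve across the top-ranked genes. The function name, the `top_fraction` parameter, and the toy gene names and expression values are all assumptions made for illustration.

```python
def signature_auc(expression, gene_set, top_fraction=0.05):
    """Normalized area under the recovery curve of gene_set in one cell.

    expression: dict mapping gene name -> expression value for a single cell.
    Returns 1.0 when the signature genes occupy the very top ranks and 0.0
    when none of them fall within the top-ranked window.
    """
    # Rank genes from highest to lowest expression (ties broken by name).
    ranked = sorted(expression, key=lambda gene: (-expression[gene], gene))
    max_rank = max(1, int(len(ranked) * top_fraction))
    present = gene_set & set(ranked)
    if not present:
        return 0.0
    hits, area = 0, 0
    for gene in ranked[:max_rank]:
        if gene in present:
            hits += 1
        area += hits  # height of the recovery curve at this rank
    best = min(len(present), max_rank)
    max_area = sum(min(rank, best) for rank in range(1, max_rank + 1))
    return area / max_area

# Toy cell with 20 genes; GENE0 is the most highly expressed.
cell = {f"GENE{i}": float(20 - i) for i in range(20)}
high_signature = {"GENE0", "GENE1"}    # hypothetical signature, highly expressed
low_signature = {"GENE18", "GENE19"}   # hypothetical signature, weakly expressed
print(signature_auc(cell, high_signature, top_fraction=0.25))
print(signature_auc(cell, low_signature, top_fraction=0.25))
```

      Because the score depends only on within-cell gene ranks, it can be compared across cells and datasets without a shared normalization, which is what makes this style of scoring convenient for projecting a signature onto previously defined subsets.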

      While it's understandable that arthritis samples are limited in numbers and myeloid cell numbers, it would still be interesting to see the results of PGE2+TNF in vitro stimulation on the primary RA or ICI-RA macrophages. It would be valuable to see RNA-Seq signatures of patient cell reactivation in comparison to primary stimulation of healthy donor-derived monocytes.

      We agree that this would be interesting but given limited samples and distribution of samples amongst many studies and investigators this is beyond the scope of the current study.  

      Discussion: Prior single-cell studies of RA and RA macrophage subpopulations from 2019, 2020, 2023 publications deserve more discussion. A thorough comparison with these datasets would place the study in a broader scientific context. 

      Creating an integrated RA myeloid cell atlas that combines ICI-RA data into the RA landscape would be ideal to add value to the field. 

      As one of the next research goals, TNF blockade data in RA and ICI-RA patients would be interesting to add to such an integrated atlas. Combining responders and non-responders to TNF blockade would help to understand patient stratification with the myeloid pathogenic phenotypes. It would be great to read the authors' opinion on this in the Discussion section. 

      Please see our response to point 3 above. This point is addressed in Fig. 3, pg. 7, and pp. 13-14, which includes a discussion of responders and nonresponders and patient stratification.  

      Conclusion: The authors demonstrated that while PGE2 maintains the inflammatory profile of macrophages, it also induces a distinct phenotype in simultaneous PGE2 and TNF treatment. The study of this specific subset in single-cell data from ICI-RA patients sheds light on the pathogenic mechanisms underlying this condition; however, how it compares with conventional RA is not clear from the manuscript. 

      Given the substantial incidence of ICI-induced autoimmune arthritis, understanding the unique macrophage subsets involved for future targeting them therapeutically is an important challenge. The findings are significant for immunologists, cancer researchers, and specialists in autoimmune diseases, making the study relevant to a broad scientific audience. 

      Reviewer #2 (Public review): 

      Summary/Significance of the findings: 

      The authors have done a great job by extensively carrying out transcriptomic and epigenomic analyses in the primary human/mouse monocytes/macrophages to investigate TNF-PGE2 (TP) crosstalk and their regulation by IFN-γ in the Rheumatoid arthritis (RA) synovial macrophages. They proposed that TP induces inflammatory genes via a novel regulatory axis whereby IFN-γ and PGE2 oppose each other to determine the balance between two distinct TNF-induced inflammatory gene expression programs relevant to RA and ICI-arthritis. 

      Strengths: 

      The authors have done a great job on RT-qPCR analysis of gene expression in primary human monocytes stimulated with TNF and showing the selective agonists of PGE2 receptors EP2 and EP4 that signal predominantly via cAMP. They have beautifully shown IFN-γ opposes the effects of PGE2 on TNF-induced gene expression. They found that TP signature genes are activated by cooperation of PGE2-induced AP-1, CEBP, and NR4A with TNF-induced NF-κB activity. On the other hand, they found that IFN-γ suppressed induction of AP-1, CEBP, and NR4A activity to ablate induction of IL-1, Notch, and neutrophil chemokine genes but promoted expression of distinct inflammatory genes such as TNF and T cell chemokines like CXCL10 indicating that TP induces inflammatory genes via IFN-γ in the RA and ICI-arthritis. 

      Weaknesses: 

      (1) The authors carried out most of the assays in monocytes/macrophages. How do APCs such as dendritic cells respond to TP treatment at similar dosing? 

      We agree that this is an interesting topic, especially as TNF + PGE2 is one of the standard methods of maturing in vitro generated human DCs and promoting antigen-presenting function. As DC maturation is quite different from monocyte activation, this would represent a new study and is beyond the scope of the current manuscript. We have instead added a paragraph to the discussion (pg. 12) and cited the literature on DC maturation by TNF + PGE2, including one of our older papers (PMID: 18678606; 2008).

      (2) The authors studied transcriptomic and epigenomic changes at 3h and 24h post-treatment. What happens to TP-induced inflammatory genes at 12h, 36h, 48h, and 72h post-treatment? It is critical to see whether the upregulated/downregulated genes normalise or stay the same throughout the innate immune response.

      We now clarify that subsets of inducible genes showed distinct kinetics of induction with transient expression at 3 hr versus sustained expression over the 24 hr stimulation period as shown in Supplementary Fig. 1 (pg. 5).

      (3) The authors showed IL1-axis in response to the TP-treatment. Do other cytokine axes get modulated? If yes, then how do they cooperate to reduce/induce inflammatory responses along this proposed axis?

      This is an interesting question, which we approached using a combination of pathway analysis and targeted inspection of pathways important in the pathogenesis of RA, which is the inflammatory condition most relevant for this study. In addition to genes in the IL-1-NF-κB core inflammatory pathway, pathway analysis of genes induced by TP co-stimulation showed enrichment of genes related to leukocyte chemotaxis, in particular neutrophil migration. Accordingly, TP co-stimulation increased expression of CSF3, which plays a key role in mobilizing neutrophils from the bone marrow, and major neutrophil chemokines CXCL1, CXCL2, CXCL3 and CXCL5 that recruit neutrophils to sites of inflammation including in inflammatory arthritis. Analysis of the late response to TNF similarly showed enrichment of genes important in chemotaxis, and suppression of genes in the cholesterol biosynthetic pathway, which we and others have previously linked to IFN responses. Targeted inspection of genes in additional pathways implicated in RA pathogenesis showed increased expression of genes in the Notch pathway. We believe that these pathways work together with the IL-1 pathway to increase immune cell recruitment and activation in inflammatory responses; these results are described on pp. 5-6 and are incorporated into Figures 1, 2 and Supplementary Fig. 2. 

      Overall, the data looks good and acceptable but I need to confirm the above-mentioned criticisms. 

      Recommendations for the authors: 

      Reviewer #1 (Recommendations for the authors):   

      The discussion section of the manuscript claims: "In this study, we utilized transcriptomics to demonstrate a 'TNF + PGE2' (TP) signature in RA and ICI-arthritis IL-1β+ synovial macrophages." This statement is misleading, as no new transcriptomic data from RA synovial samples were generated in this study. To support such a claim, the authors would need to compare primary monocytes or macrophages from RA patients using bulk RNA-seq or single-cell RNA-seq. Based on the current data, the comparison is limited to bulk RNA-seq findings from the authors' in vitro model and prior monocyte-fibroblast coculture studies. 

      We have modified the abstract and discussion (pg. 10) to reflect that we have compared an in vitro generated TP signature with gene expression in previously identified RA macrophage subsets.

    1. Author response:

      [The following is the authors’ response to the original reviews.]

      We extend our sincere thanks to the editor, referees for eLife, and other commentators who have written evaluations of this manuscript, either in whole or in part. Sources of these comments were highly varied, including within the bioRxiv preprint server, social media (including many comments received on X/Twitter and some YouTube presentations and interviews), comments made by colleagues to journalists, and also some reviews of the work published in other academic journals. Some of these are formal and referenced with citations. Others were informal but nonetheless expressed perspectives that helped enable us to revise the manuscript with the inclusion of broader perspectives than the formal review process. It is beyond the scope of this summary to list every one of these, which have often been brought to the attention of different coauthors, but we begin by acknowledging the very wide array of peer and public commentary that have contributed to this work. The reaction speaks to a broad interest in open discussion and review of preprints. 

      As we compiled this summary of changes to the manuscript, we recognized that many colleagues made comments about the process of preprint dissemination and evaluation rather than the data or analyses in the manuscript. Addressing such comments is outside the scope of this revised manuscript, but we do feel that a broader discussion of these comments would be valuable in another venue. Many commentators have expressed confusion about the eLife system of evaluation of preprints, which differs from the editorial acceptance or rejection practiced in most academic journals. As authors in many different nations, in varied fields, and in varied career stages, we ourselves are still working to understand how the academic publication landscape is changing, and how best to prepare work for new models of evaluation and dissemination. 

      The manuscript and coauthor list reflect an interdisciplinary collaboration. Analyses presented in the manuscript come from a wide range of scientific disciplines. These include skeletal inventory, morphology, and description; spatial taphonomy; analysis of bone fracture patterns and bone surface modifications; sedimentology; geochemistry; and traditional survey and mapping. The manuscript additionally draws upon a large number of previous studies of the Rising Star cave system and the Dinaledi Subsystem, which have shaped our current work. No analysis within any one area of research stands alone within this body of work: all are interpreted in conjunction with the outcomes of other analyses and data from other areas of research. Any single analysis in isolation might be consistent with many different hypotheses for the formation of sediments and disposition of the skeletal remains. But testing a hypothesis requires considering all data in combination and not leaving out data that do not fit the hypothesis. We highlight this general principle at the outset because a number of the comments from referees and outside specialists have presented alternative hypotheses that may arguably be consistent with one kind of analysis that we have presented, while seeming to overlook other analyses, data, or previous work that exclude these alternatives. In our revision, we have expanded all sections describing results to consider not only the results of each analysis, but how the combination of data from different kinds of analysis relates to hypotheses for the deposition and subsequent history of the Homo naledi remains. We address some specific examples and how we have responded to these in our summary of changes below. 

      General organization:

      The referee and editor comments are mostly general and not line-by-line questions, and we have compiled them and treated them as a group in this summary of changes, except where specifically noted. 

      The editorial comments on the previous version included the suggestion that the manuscript should be reorganized to test “natural” (i.e. noncultural) hypotheses for the situations that we examine. The editorial comment suggested this as a “null hypothesis” testing approach. Some outside comments also viewed noncultural deposition as a null hypothesis to be rejected. We do not concur that noncultural processes should be construed as a null hypothesis, as we discuss further below. However, because of the clear editorial opinion we elected to revise the manuscript to make more explicit how the data and analyses test noncultural depositional hypotheses first, followed by testing of cultural hypotheses. This reorganization means that the revised manuscript now examines each hypothesis separately in turn. 

      Taking this approach resulted in a substantial reorganization of the “Results” section of the manuscript. The “Results” section now begins with summaries of analyses and data conducted on material from each excavation area. After the presentation of data and analyses from each area, we then present a separate section for each of several hypotheses for the disposition and sedimentary context of the remains. These hypotheses include deposition of bodies upon a talus (as hypothesized in some previous work), slow sedimentary burial on a cave floor or within a natural depression, rapid burial by gravity-driven slumping, and burial of naturally mummified remains. We then include sections to test the hypothesis of primary cultural burial and secondary cultural burial. This approach adds substantial length to the Results. While some elements may be repeated across sections, we do consider the new version to be easier to take piece by piece for a reader trying to understand how each hypothesis relates to the evidence. 

      The Results section includes analyses on several different excavation areas within the Dinaledi Subsystem. Each of these presents somewhat different patterns of data. We conceived of this manuscript combining these distinct areas because each of them provides information about the formation history of the Homo naledi-associated sediments and the deposition of the Homo naledi remains. Together they speak more strongly than separately. In the previous version of the manuscript, two areas of excavation were considered in detail (Dinaledi Feature 1 and the Hill Antechamber Feature), with a third area (the Puzzle Box area) included only in the Discussion and with reference to prior work. We now describe the new work undertaken after the 2013-2014 excavations in more detail. This includes an overview of areas in the Hill Antechamber and Dinaledi Chamber that have not yielded substantial H. naledi remains and that thereby help contextualize the spatial concentration of H. naledi skeletal material. The most substantial change in the data presented is a much expanded reanalysis of the Puzzle Box area. This reanalysis provides greater clarity on how previously published descriptions relate to the new evidence. The reanalysis also provides the data to integrate the detailed information on bone identification, fragmentation, and spatial taphonomy from this area with the new excavation results from the other areas. 

      In addition to Results, the reorganization also affected the manuscript’s Introduction section. Where the previous version led directly from a brief review of Pleistocene burial into the description of the results, this revised manuscript now includes a review of previous studies of the Rising Star cave system. This review directly addresses referee comments that express some hesitation to accept previous results concerning the structure and formation of sediments, the accessibility of the Dinaledi Subsystem, the geochronological setting of the H. naledi remains, and the relation of the Dinaledi Subsystem to nearby cave areas. Some parts of this overview are further expanded in the Supplementary Information to enable readers to dive more deeply into the previous literature on the site formation and geological configuration of the Rising Star cave system without needing to digest the entirety of the cited sources. 

      The Discussion section of the revised manuscript is differentiated from Results and focuses on several areas where the evidence presented in this study may benefit from greater context. One new section addresses hypothesis testing and parsimony for Pleistocene burial evidence, which we address at greater length in this summary below. The majority of the Discussion concerns the criteria for recognizing evidence for burial as applied in other studies. In this research we employ a minimal definition but other researchers have applied varied criteria. We consider whether these other criteria have relevance in light of our observations and whether they are essential to the recognition of burial evidence more broadly. 

      Vocabulary:

      We introduce the term “cultural burial” in this revised manuscript to refer to the burial of dead bodies as a mortuary practice. “Burial” as an unmodified term may refer to the passive covering of remains by sedimentary processes. Use of the term “intentional burial” would raise the question of interpreting intent, which we do not presume based on the evidence presented in this research. The relevant question in this case is whether the process of burial reflects repeated behavior by a group. As we received input from various colleagues it became clear that burial itself is a highly loaded term. In particular there is a common assumption within the literature and among professionals that burial must by definition be symbolic. We do not take any position on that question in this manuscript, and it is our hope that the term “cultural burial” may focus the conversation around the extent that the behavioral evidence is repeated and patterned. 

      Sedimentology and geochemistry of Dinaledi Feature 1:

      Reviewer 4 provided detailed comments on the sedimentological and geochemical context that we report in the manuscript. One outside review (Foecke et al. 2024) included some of the points raised by reviewer 4, and additionally addressed the reporting of geochemical and sedimentological data in previous work that we cite. 

      To address these comments we have revised the sedimentary context and micromorphology of sediments associated with Dinaledi Feature 1. In the new text we demonstrate the lack of microstratigraphy (supported by grain size analysis) in the unlithified mud clast breccia (UMCB), while such a microstratigraphy is observed in the laminated orange-red mudstones (LORM) that contribute clasts to the UMCB. Thus, we emphasize the presence and importance of a laterally continuous layer of LORM nature occurring at a level that appears to be the maximum depth of fossil occurrence. This layer is severely broken under extensive accumulation of fossils such as Feature 1 and only evidenced by abundant LORM clasts within and around the fossils. 

      We have completely reworked the geochemical context associated with Feature 1 following the comments of reviewer 4. We described the variations and trends observed in the major oxides separately from trace and rare-earth elements. We used Harker variation plots to assess relationships between these element groups with CaO and Zn, followed by principal component analysis of all elements analyzed. The new geochemical analysis clearly shows that Feature 1 is associated with localized trace element signatures that exist in the sediments only in association with the fossil bones, which suggests a lack of postdepositional mobilization of the fossils and sediments. We additionally have included a fuller description of XRF methods. 
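      For readers less familiar with principal component analysis, the sketch below shows the core computation on a small table of element concentrations. It is an illustration only and not the analysis pipeline used in the manuscript: the values are invented, the function name is hypothetical, variables are standardized before analysis (an assumption), and only the first component is extracted, by power iteration, to keep the example free of external libraries.

```python
import math

def first_principal_component(rows, iterations=500):
    """Return (loadings, explained_fraction) of PC1 on standardized columns.

    rows: list of samples, each a list of concentrations (one per element).
    Assumes every column has non-zero variance.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    # Correlation matrix of the standardized data.
    corr = [[sum(row[a] * row[b] for row in z) / (n - 1) for b in range(p)]
            for a in range(p)]
    # Power iteration from an asymmetric start vector.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(corr[a][b] * v[b] for b in range(p))
                     for a in range(p))
    return v, eigenvalue / p  # trace of a correlation matrix equals p

# Invented values: columns stand for three measured elements.
samples = [
    [1.0, 2.0, 5.0],
    [2.0, 4.0, 3.0],
    [3.0, 6.0, 6.0],
    [4.0, 8.0, 2.0],
    [5.0, 10.0, 4.0],
]
loadings, explained = first_principal_component(samples)
print(loadings, explained)
```

      Elements that load together on the same component vary together across samples, which is the kind of pattern used to identify a localized signature associated with the fossil-bearing sediment.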

      To clarify the relation of all results to the features described in this study, we removed the geochemical and sedimentological samples from other sites within the Dinaledi Subsystem. These localities within the fissure network represent only surface collection of sediment, as no excavation results are available from those sites to allow for comparison in the context of assessing evidence of burial. These were initially included for comparison, but have now been removed to avoid confusion.  

      Micromorphology of sediments:

      Some referees (1, 3, and 4) and other commentators (including Martinón-Torres et al. 2024) have suggested that the previous manuscript was deficient due to an insufficient inclusion of micromorphological analysis of sediments. Because these commentators have emphasized this kind of evidence as particularly important, we review here what we have included and how our revision has addressed this comment. Previous work in the Dinaledi Chamber (Dirks et al., 2015; 2017) included thin section illustrations and analysis of sediment facies, including sediments in direct association with H. naledi remains within the Puzzle Box area. The previous work by Wiersma and coworkers (2020) used micromorphological analysis as one of several approaches to test the formation history of Unit 3 sediments in the Dinaledi Subsystem, leading to the interpretation of autobrecciation of earlier Unit 1 sediment. In the previous version of this manuscript we provided citations to this earlier work. The previous manuscript also provided new thin section illustrations of Unit 3 sediment near Dinaledi Feature 1 to place the disrupted layer of orange sediment (now designated the laminated orange silty mudstone unit) into context. 

      In the new revised manuscript we have added to this information in three ways. First, as noted above in response to reviewer 4, we have revised and added to our discussion of micromorphology within and adjacent to the Dinaledi Feature 1. Second, we have included more discussion in the Supplementary Information of previous descriptions of sediment facies and associated thin section analysis, with illustrations from prior work (CC-BY licensed) brought into this paper as supplementary figures, so that readers can examine these without following the citations. Third, we have included Figure 10 in the manuscript which includes six panels with microtomographic sections from the Hill Antechamber Feature. This figure illustrates the consistency of sub-unit 3b sediment in direct contact with H. naledi skeletal material, including anatomically associated skeletal elements, with previous analyses that demonstrate the angular outlines and chaotic orientations of LORM clasts. It also shows density contrasts of sediment in immediate contact with some skeletal elements, the loose texture of this sediment with air-filled voids, and apparent invertebrate burrowing activity. To our knowledge this is the first application of microtomography to sediment structure in association with a Pleistocene burial feature. 

      To forestall possible comments that the revised manuscript does not sufficiently employ micromorphological observations, or that any one particular approach to micromorphology is the standard, we present here some context from related studies of evidence from other research groups working at varied sites in Africa, Europe, and Asia. Hodgkins et al. (2021) noted: “Only a handful of micromorphological studies have been conducted on human burials and even fewer have been conducted on suspected burials from Paleolithic or hunter-gatherer contexts.” In that study, one supplementary figure with four photomicrographs of thin sections of sediments was presented. Interpretation of the evidence for a burial pit by Hodgkins et al. (2021) noted the more open microstructure of sediment but otherwise did not rely upon the thin section data in characterizing the sediments associated with grave fill. Martinón-Torres et al. (2021) included one Extended Data figure illustrating thin sections of sediments and bone, with two panels showing sediments (the remainder showing bone histology). The micromorphological analysis presented in the supplementary information of that paper was restricted to description of two microfacies associated with the proposed “pit” in that study. That study did carry out microCT scanning of the partially-prepared skeletal remains but did not report any sediment analysis from the microtomographic results. Maloney et al. (2022) reported no micromorphological or thin section analysis. Pomeroy et al. (2020a) included one illustration of a thin section; this study may be regarded as a preliminary account rather than a full description of the work undertaken. Goldberg et al. (2017) analyzed the geoarchaeology of the Roc de Marsal deposits in which possible burial-associated sediments had been fully excavated in the 1960s, providing new morphological assessments of sediment facies; the supplementary information to this work included five scans (not microscans) of sediment thin sections and no microphotographs. Fewlass et al. (2023) presented no thin section or micromorphological illustrations or methods. In summary of this research, we note that in one case micromorphological study provided observations that contributed to the evidence for a pit, in other cases micromorphological data did not test this hypothesis, and many researchers do not apply micromorphological techniques in their particular contexts. 

      Sediment micromorphology is a growing area of research and may have much to provide to the understanding of ancient burial evidence as its standards continue to develop (Pomeroy et al. 2020b). In particular, microtomographic analysis of sediments, as we have initiated in this study, may open new horizons that are not possible with more destructive thin-section preparation. In this manuscript, the thin section data reveal valuable evidence about the disruption of sediment structure by features within the Dinaledi Chamber, and microtomographic analysis further documents that the Hill Antechamber Feature reflects similar processes, in addition to possible post-burial diagenesis and invertebrate activity. Following up in detail on these processes will require further analysis outside the scope of this manuscript. 

      Access into the Dinaledi Subsystem:

      Reviewer 1 emphasizes the difficulty of access into the Dinaledi Subsystem as a reason why the burial hypothesis is not parsimonious. Similar comments have been made by several outside commentators who question whether past accessibility into the Dinaledi Subsystem may at one time have been substantially different from the situation documented in previous work. Several pieces of evidence are relevant to these questions and we have included some discussion of them in the Introduction, and additionally include a section in the Supplementary Information (“Entrances to the cave system”) to provide additional context for these questions. Homo naledi remains are found not only within the Dinaledi Subsystem but also in other parts of the cave system including the Lesedi Chamber, which is similarly difficult for non-expert cavers to access. The body plan, mass, and specific morphology of H. naledi suggest that this species would be vastly more suited to moving and climbing within narrow underground passages than living people. On this basis it is not unparsimonious to suggest that the evidence resulted from H. naledi activity within these spaces. We note that the accessibility of the subsystem is not strictly relevant to the hypothesis of cultural burial, although the location of the remains does inform the overall context which may reflect a selection of a location perceived as special in some way. 

      Stuffing bodies down the entry to the subsystem:

      Reviewer 3 suggests that one explanation for the emplacement of articulated remains at the top of the sloping floor of the Hill Antechamber is that bodies were “stuffed” into the chute that comprises the entry point of the subsystem and passively buried by additional accumulation of remains. This was one hypothesis presented in earlier work (Dirks et al. 2015) and considered there as a minimal explanation because it did not entail the entry of H. naledi individuals into the subsystem. Further exploration (Elliott et al. 2021), ongoing survey work, and this manuscript have all produced data that reject this hypothesis. The revised manuscript includes a section in the Results, “Deposition upon a talus with passive burial”, that examines this hypothesis in light of the data. 

      Recognition of pits:

      Referees 3 and 4 and several additional commentators have emphasized that the recognition of pit features is necessary to the hypothesis of burial, and questioned whether the data presented in the manuscript were sufficient to demonstrate that pits were present. We have revised the manuscript in several ways to clarify how all the different kinds of evidence from the subsystem test the hypothesis that pits were present. This includes the presentation of a minimal definition of burial to include a pit dug by hominins, criteria for recognizing that a pit was present, and an evaluation of the evidence in each case to make clear how the evidence relates to the presence of a pit and subsequent infill. As referee 3 notes, it can be challenging to recognize a pit when sediment is relatively homogeneous. This point was emphasized in the review by Pomeroy and coworkers (2020b), who reflected on the difficulty of seeing evidence for shallow pits constructed by hominins, and we have cited this in the main text. As a result, the evidence for pits has been a recurrent topic of debate for most Pleistocene burial sites. However, in addition to the sedimentological and contextual evidence in the cases we describe, the current version also reflects upon other possible mechanisms for the accumulation of bones or bodies. The data show that the sedimentary fill associated with the H. naledi remains in the cases we examine could not have passively accumulated slowly and is not indicative of mass movement by slumping or other high-energy flow. To further put these results into context, we added a section to the Discussion that briefly reviews prior work on distinguishing pits in Pleistocene burial contexts, including the substantial number of sites with accepted burial evidence for which no evidence of a pit is present. 

      Extent of articulation and anatomical association:

      We have added significantly greater detail to the descriptions of articulated remains and orientation of remains in order to describe more specifically the configuration of the skeletal material. We also provide 14 figures in main text (13 of them new) to illustrate the configuration of skeletal remains in our data. For the Puzzle Box area, this now includes substantial evidence on the individuation of skeletal fragments, which enables us to illustrate the spatial configuration of remains associated with the DH7 partial skeleton, as well as the spatial position of fragments refitted as part of the DH1, DH2, DH3, and DH4 crania. For Dinaledi Feature 1 and the Hill Antechamber Feature we now provide figures that key skeletal parts as identified, including material that is unexcavated where possible, and a skeletal part representation figure for elements excavated from Dinaledi Feature 1. 

      Archaeothanatology:

      Reviewer 2 suggests that a greater focus on the archaeothanatology literature would be helpful to the analysis, with specific reference to the sequence of joint disarticulation, the collapse of sediment and remains into voids created by decomposition, and associated fragmentation of the remains. In the revised manuscript we have provided additional analysis of the Hill Antechamber Feature with this approach in mind. This includes greater detail and illustration of our current hypothesis for individuation of elements. We now discuss a hypothesis of body disposition, describe the persistent joints and articulation of elements, and examine likely decomposition scenarios associated with these remains. Additionally, we expand our description and illustration of the orientation of remains and degree of anatomical association and articulation within Dinaledi Feature 1. For this feature and for the Hill Antechamber Feature we have revised the text to describe how fracturing and crushing patterns are consistent with downward pressure from overlying sediment and material. In these features, postdepositional fracturing occurred subsequent to the decomposition of soft tissue and partial loss of organic integrity of the bone. We also indicate that the loss by postdepositional processes of most long bone epiphyses, vertebral bodies, and other portions of the skeleton less rich in cortical bone, poses a challenge for testing the anatomical associations of the remaining elements. This is a primary reason why we have taken a conservative approach to identification of elements and possible associations. 

      A further aspect of the site revealed by our analysis is the selective reworking of sediments within the Puzzle Box area subsequent to the primary deposition of some bodies. The skeletal evidence from this area includes body parts with elements in anatomical association or articulation, juxtaposed closely with bone fragments at varied pitch and orientation. This complexity of events evidenced within this area is a challenge for approaches that have been developed primarily based on comparative data from single-burial situations. In these discussions we deepen our use of references as suggested by the referee.   

      Burial positions:

      Reviewer 2 further suggests that illustrations of hypothesized burial positions would be valuable. We recognize that a hypothesized burial position may be an appealing illustration, and that some recent studies have created such illustrations in the context of their scientific articles. However such illustrations generally include a great deal of speculation and artist imagination, and tend to have an emotive character. We have added more discussion to the manuscript of possible primary disposition in the case of the Hill Antechamber Feature as discussed above. We have not created new illustrations of hypothesized burial positions for this revision. 

      Carnivore involvement:

      Referee 1 suggests that the manuscript should provide further consideration of whether carnivore activity may have introduced bones or bodies into the cave system. The reorganized Introduction now includes a review of previous work, and an expanded discussion within the Supplementary Information (“Hypotheses tested in previous work”). This includes a review of literature on the topic of carnivore accumulation and the evidence from the Dinaledi and Lesedi Chamber that rejects this hypothesis. 

      Water transport and mud:

      The eLife referees broadly accepted previous work showing that water inundation or mass flow of water-saturated sediment did not occur within the history of Unit 2 and 3 sediments, including those associated with H. naledi remains. However several outside commentators did refer specifically to water flow or mud flow as a mechanism for slumping of deposits and possible sedimentary covering of the remains. To address these comments we have added a section to the Supplementary Information (“Description of the sedimentary deposits of the Dinaledi Subsystem”) that reviews previous work on the sedimentary units and formation processes documented in this area. We also include a subsection specifically discussing the term “mud” as used in the description of the sedimentology within the system, as this term has clearly been confusing for nonspecialists who have read and commented on the work. We appreciate the referees’ attention to the previous work and its terminology.

      Redescription of areas of the cave system:

      Reviewer 1 suggests that a detailed reanalysis of all portions of the cave system in and around the Dinaledi Subsystem is warranted to reject the hypothesis that bodies entered the space passively and were scattered from the floor by natural (i.e. noncultural) processes. The referee suggests that National Geographic could help us with these efforts. To address this comment we have made several changes to the manuscript. As noted above, we have added material in Supplementary Information to review the geochronology of the Dinaledi Subsystem and nearby Dragon’s Back Chamber, together with a discussion of the connections between these spaces. 

      Most directly in response to this comment we provide additional documentation of the possibility of movement of bodies or body parts by gravity within the subsystem itself. This includes detailed floor maps based on photogrammetry and LIDAR measurement, where these are physically possible, presented in Figures 2 and 3. In some parts of the subsystem the necessary equipment cannot be used due to the extremely confined spaces, and for these areas our maps are based on traditional survey methods. In addition to plan maps we have included a figure showing the elevation of the subsystem floor in a cross-section that includes key excavation areas, showing their relative elevation. All figures that illustrate excavation areas are now keyed to their location with reference to a subsystem plan. These data have been provided in previous publications but the visualization in the revised manuscript should make the relationship of areas clear for readers. The Introduction now includes text that discusses the configuration of the Hill Antechamber, Dinaledi Chamber, and nearby areas, and also discusses the instances in which gravity-driven movement may be possible, at the same time reviewing that gravity-driven movement from the entry point of the subsystem to most of the localities with hominin skeletal remains is not possible. 

      Within the Results, we have added a section on the relationship of features to their surroundings in order to assist readers in understanding the context of these bone-bearing areas and the evidence this context brings to the hypothesis in question. We have also included within this new section a discussion of the discrete nature of these features, a question that has been raised by outside commentators. 

      Passive sedimentation upon a cave floor or within a natural depression:

      Reviewer 3 suggests that the situation in the Dinaledi Subsystem may be similar to a European cave where a cave bear skeleton might remain articulated on a cave floor (or we can add, within a hollow for hibernation), later to be covered in sediment. The reviewer suggests that articulation is therefore no evidence of burial, and suggests that further documentation of disarticulation processes is essential to demonstrating the processes that buried the remains. We concur that articulation by itself is not sufficient evidence of cultural burial. To address this comment we have included a section in the Results that tests the hypothesis that bodies were exposed upon the cave floor or within a natural depression. To a considerable degree, additional data about disarticulation processes subsequent to deposition are provided in our reanalysis of the Puzzle Box area, including evidence for selective reworking of material after burial. 

      Postdepositional movement and floor drains:

      Reviewer 3 notes that previous work has suggested that subsurface floor drains may have caused some postdepositional movement of skeletal remains. The hypothesis of postdepositional slumping or downslope movement has also been discussed by some external commentators (including Martinón-Torres et al. 2024). We have addressed this question in several places within the revised manuscript. As we now review, previous discussion of floor drains attempted to explain the subvertical orientation of many skeletal elements excavated from the Puzzle Box area. The arrangement of these bones reflects reworking as described in our previous work, and without considering the possibility of reworking by hominins, one mechanism that conceivably might cause reworking was downward movement of sediments into subsurface drains. Further exploration and mapping, combined with additional excavation into the sediments beneath the Puzzle Box area provided more information relevant to this hypothesis. In particular this evidence shows that subsurface drains cannot explain the arrangement of skeletal material observed within the Puzzle Box area. As now discussed in the text, the reworking is selective and initiated from above rather than below. This is best explained by hominin activity subsequent to burial. 

      In a new section of the Results we discuss slumping as a hypothesis for the deposition of the remains. This includes discussion of downslope movement within the Hill Antechamber and the idea that floor drains may have been a mechanism for sediment reworking in and around the Puzzle Box area and Dinaledi Feature 1. As described in this section the evidence does not support these hypotheses. 

      Hypothesis testing and parsimony:

      Referees 1 and 3 and the editorial guidance all suggested that a more appropriate presentation would adopt a null hypothesis and test it. The specific suggestion that the null hypothesis should be a natural sedimentary process of deposition was provided not only by these reviewers but also by some outside commentators. To address this comment, we have edited the manuscript in two ways. The first is the addition of a section to the Discussion that specifically discusses hypothesis testing and parsimony as related to Pleistocene evidence of cultural burial. This includes a brief synopsis of recent disciplinary conversations and citation of work by other groups of authors, none of whom adopted this “null hypothesis” approach in their published work. 

      As we now describe in the manuscript, previous work on the Dinaledi evidence never assumed any role for H. naledi in the burial of remains. Reading the reviewer reports caused us to realize that this previous work had followed exactly the “null hypothesis” approach that some suggested we follow. By following this null hypothesis approach, we neglected a valuable avenue of investigation. In retrospect, we see how this approach impeded us from understanding the pattern of evidence within the Puzzle Box area. Thus in the revised manuscript we have mentioned this history within the Discussion and also presented more of the background to our previous work in the Introduction. Hopefully by including this discussion of these issues, the manuscript will broaden conversation about the relation of parsimony to these issues. 

      Language and presentation style:

      Reviewer 4 criticizes our presentation, suggesting that the text “gives the impression that a hypothesis was formulated before data were collected.” Other outside commentators have mentioned this notion also, including Martinón-Torres et al. (2024) who suggest that the study began from a preferred hypothesis and gathered data to support it. The accurate communication of results and hypotheses in a scientific article is a broader issue than this one study. Preferences about presentation style vary across fields of study as well as across languages. We do not regret using plain language where possible. In any study that combines data and methods from different scientific disciplines, the use of plain language is particularly important to avoid misunderstandings where terms may mean different things in different fields. 

      The essential question raised by these comments is whether it is appropriate to present the results of a study in terms of the hypothesis that is best supported. As noted above, we read carefully many recent studies of Pleistocene burial evidence. We note that in each of these studies that concluded that burial is the best hypothesis, the authors framed their results in the same way as our previous manuscript: an introduction that briefly reviews background evidence for treatment of the dead, a presentation of results focused on how each analysis supports the hypothesis of burial for the case, and then in some (but not all) cases discussion of why some alternative hypotheses could be rejected. We do not infer from this that these other studies started from a presupposition and collected data only to confirm it. Rather, this is a simple matter of presentation style. 

      The alternative to this approach is to present an exhaustive list of possible hypotheses and to describe how the data relate to each of them, at the end selecting the best. This is the approach that we have followed in the revised manuscript, as described above under the direction of the reviewer and editorial guidance. This approach has the advantage of bringing together evidence in different combinations to show how each data point rejects some hypotheses while supporting others. It has the disadvantage of length and repetition. 

      Possible artifact:

      We have chosen to keep the description of the possible artifact associated with the Hill Antechamber Feature in the Supplementary Information. We do this while acknowledging that this is against the opinion of reviewer 4, who felt the description should be removed unless the object in question is fully excavated and physically analyzed. The previous version of the manuscript did not rely upon the stone as positive evidence of grave goods or symbolic content, and it noted that the data do not test whether the possible artifact was placed or was intentionally modified. However this did not satisfy reviewer 4, and some outside commentators likewise asserted that the object must be a “geofact” and that it should be removed. 

      We have three arguments against this line of thinking. First, we do not omit data from our reporting. Whether Homo naledi shaped the rock or not, used it as a tool or not, whether the rock was placed with the body or not, it is unquestionably there. Omitting this one object from the report would be simply dishonest. Second, the data on this rock are at 16 micron resolution. While physical inspection of its surface may eventually reveal trace evidence and will enable better characterization of the raw material, no mode of surface scanning will produce better evidence about the object’s shape. Third, the position of this possible artifact within the feature provides significant information about the deposition of the skeletal material and associated sediments. The pitch, orientation, and position of the stone are not consistent with slow deposition but are consistent with the hypothesis that the surrounding sediment was rapidly emplaced at the same time as the articulated elements less than 2 cm away. 

      In the current version, we have redoubled our efforts to provide information about the position and shape of this stone while not presupposing the intentionality of its shape or placement. We add here that the attitude expressed by referee 4 and other commentators, if followed at other sites, would certainly lead to the loss or underreporting of evidence, which we are trying to avoid.  

      Consistency versus variability of behavior:

      As described in the revised manuscript, different features within the Dinaledi Subsystem exhibit some shared characteristics. At the same time, they vary in positioning, representation of individuals and extent of commingling. Other localities within the subsystem and broader cave system present different evidence. Some commentators have questioned whether the patterning is consistent with a single common explanation, or whether multiple explanations are necessary. To address this line of questioning, we have added several elements to the manuscript. We created a new section on secondary cultural burial, discussing whether any of the situations may reflect this practice. In the Discussion, we briefly review the ways in which the different features support the involvement of H. naledi without interpreting anything about the intentionality or meaning of the behavior. We further added a section to the Discussion to consider whether variation among the features reflects variation in mortuary practices by H. naledi. One aspect of this section briefly cites variation in the location and treatment of skeletal remains at other sites with evidence of burial. 

      Grave goods:

      Some commentators have argued that grave goods are a necessary criterion for recognizing evidence of ancient burial. We added a section to the Discussion to review evidence of grave goods at other Pleistocene sites where burial is accepted. 

      References:

      • Dirks, P. H., Berger, L. R., Roberts, E. M., Kramers, J. D., Hawks, J., Randolph-Quinney, P. S., Elliott, M., Musiba, C. M., Churchill, S. E., de Ruiter, D. J., Schmid, P., Backwell, L. R., Belyanin, G. A., Boshoff, P., Hunter, K. L., Feuerriegel, E. M., Gurtov, A., Harrison, J. du G., Hunter, R., … Tucker, S. (2015). Geological and taphonomic context for the new hominin species Homo naledi from the Dinaledi Chamber, South Africa. eLife, 4, e09561. https://doi.org/10.7554/eLife.09561

      • Dirks, P. H., Roberts, E. M., Hilbert-Wolf, H., Kramers, J. D., Hawks, J., Dosseto, A., Duval, M., Elliott, M., Evans, M., Grün, R., Hellstrom, J., Herries, A. I., Joannes-Boyau, R., Makhubela, T. V., Placzek, C. J., Robbins, J., Spandler, C., Wiersma, J., Woodhead, J., & Berger, L. R. (2017). The age of Homo naledi and associated sediments in the Rising Star Cave, South Africa. eLife, 6, e24231. https://doi.org/10.7554/eLife.24231

      • Elliott, M., Makhubela, T., Brophy, J., Churchill, S., Peixotto, B., Feuerriegel, E., Morris, H., Van Rooyen, D., Ramalepa, M., Tsikoane, M., Kruger, A., Spandler, C., Kramers, J., Roberts, E., Dirks, P., Hawks, J., & Berger, L. R. (2021). Expanded Explorations of the Dinaledi Subsystem, Rising Star Cave System, South Africa. PaleoAnthropology, 2021(1), 15–22. https://doi.org/10.48738/2021.iss1.68

      • Fewlass, H., Zavala, E. I., Fagault, Y., Tuna, T., Bard, E., Hublin, J.-J., Hajdinjak, M., & Wilczyński, J. (2023). Chronological and genetic analysis of an Upper Palaeolithic female infant burial from Borsuka Cave, Poland. iScience, 26(12). https://doi.org/10.1016/j.isci.2023.108283

      • Foecke, K. K., Queffelec, A., & Pickering, R. (n.d.). No Sedimentological Evidence for Deliberate Burial by Homo naledi – A Case Study Highlighting the Need for Best Practices in Geochemical Studies Within Archaeology and Paleoanthropology. PaleoAnthropology, 2024. https://doi.org/10.48738/202x.issx.xxx

      • Goldberg, P., Aldeias, V., Dibble, H., McPherron, S., Sandgathe, D., & Turq, A. (2017). Testing the Roc de Marsal Neandertal “Burial” with Geoarchaeology. Archaeological and Anthropological Sciences, 9(6), 1005–1015. https://doi.org/10.1007/s12520-013-0163-2

      • Maloney, T. R., Dilkes-Hall, I. E., Vlok, M., Oktaviana, A. A., Setiawan, P., Priyatno, A. A. D., Ririmasse, M., Geria, I. M., Effendy, M. A. R., Istiawan, B., Atmoko, F. T., Adhityatama, S., Moffat, I., Joannes-Boyau, R., Brumm, A., & Aubert, M. (2022). Surgical amputation of a limb 31,000 years ago in Borneo. Nature, 609(7927), 547–551. https://doi.org/10.1038/s41586-022-05160-8

      • Martinón-Torres, M., d’Errico, F., Santos, E., Álvaro Gallo, A., Amano, N., Archer, W., Armitage, S. J., Arsuaga, J. L., Bermúdez de Castro, J. M., Blinkhorn, J., Crowther, A., Douka, K., Dubernet, S., Faulkner, P., Fernández-Colón, P., Kourampas, N., González García, J., Larreina, D., Le Bourdonnec, F.-X., … Petraglia, M. D. (2021). Earliest known human burial in Africa. Nature, 593(7857), Article 7857. https://doi.org/10.1038/s41586-021-03457-8

      • Martinón-Torres, M., Garate, D., Herries, A. I. R., & Petraglia, M. D. (2023). No scientific evidence that Homo naledi buried their dead and produced rock art. Journal of Human Evolution, 103464. https://doi.org/10.1016/j.jhevol.2023.103464

      • Pomeroy, E., Bennett, P., Hunt, C. O., Reynolds, T., Farr, L., Frouin, M., Holman, J., Lane, R., French, C., & Barker, G. (2020a). New Neanderthal remains associated with the ‘flower burial’ at Shanidar Cave. Antiquity, 94(373), 11–26. https://doi.org/10.15184/aqy.2019.207

      • Pomeroy, E., Hunt, C. O., Reynolds, T., Abdulmutalb, D., Asouti, E., Bennett, P., Bosch, M., Burke, A., Farr, L., Foley, R., French, C., Frumkin, A., Goldberg, P., Hill, E., Kabukcu, C., Lahr, M. M., Lane, R., Marean, C., Maureille, B., … Barker, G. (2020b). Issues of theory and method in the analysis of Paleolithic mortuary behavior: A view from Shanidar Cave. Evolutionary Anthropology: Issues, News, and Reviews, 29(5), 263–279. https://doi.org/10.1002/evan.21854

      • Robbins, J. L., Dirks, P. H. G. M., Roberts, E. M., Kramers, J. D., Makhubela, T. V., Hilbert-Wolf, H. L., Elliott, M., Wiersma, J. P., Placzek, C. J., Evans, M., & Berger, L. R. (2021). Providing context to the Homo naledi fossils: Constraints from flowstones on the age of sediment deposits in Rising Star Cave, South Africa. Chemical Geology, 567, 120108. https://doi.org/10.1016/j.chemgeo.2021.120108

      • Wiersma, J. P., Roberts, E. M., & Dirks, P. H. G. M. (2020). Formation of mud clast breccias and the process of sedimentary autobrecciation in the hominin-bearing (Homo naledi) Rising Star Cave system, South Africa. Sedimentology, 67(2), 897–919. https://doi.org/10.1111/sed.12666

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work tried to map the synaptic connectivity between the inputs and outputs of the song premotor nucleus, HVC in zebra finches to understand how sensory (auditory) to motor circuits interact to coordinate song production and learning. The authors optimized the optogenetic technique via AAV to manipulate auditory inputs from a specific auditory area one-by-one and recorded synaptic activity from a neuron with whole-cell recording from slice preparation with identification of the projection area by retrograde neuronal tracing. This thorough and detailed analysis provides compelling evidence of synaptic connections between 4 major auditory inputs (3 forebrain and 1 thalamic region) within three projection neurons in the HVC; all areas give monosynaptic excitatory inputs and polysynaptic inhibitory inputs, but proportions of projection to each projection neuron varied. They also find specific reciprocal connections between mMAN and Av. Taken together the authors provide the map of the synaptic connection between intercortical sensory to motor areas which is suggested to be involved in zebra finch song production and learning.

      Strengths:

      The authors optimized optogenetic tools with eGtACR1 by using AAV, which allows them to manipulate synaptic inputs in a projection-specific manner in zebra finches. They also identify HVC cell types based on projection area. With their technical advance and thorough experiments, they provided a detailed map of synaptic connections.

      Weaknesses:

      As it is the study in brain slice, the functional implication of synaptic connectivity is limited. Especially as all the experiments were done in the adult preparation, there could be a gap in discussing the functions of developmental song learning.

      We thank the reviewer for their appreciation of our work. Although we agree that there can be limitations to brain slice preparations, the approaches used here for synaptic connectivity mapping are well-designed to identify long-range synaptic connectivity patterns. Optogenetic stimulation of axon terminals in brain slices does not require intact axons and works well when axons are cut, allowing identification of all inputs expressing optogenetic channels from afferent regions. Terminal stimulation in slices yields stable post-synaptic responses for hours without rundown, assuring that polysynaptic and monosynaptic connections can be reliably identified in our brain slices. Additionally, conducting similar types of experiments in vivo can run into important limitations. First, the extent of TTX and 4-AP diffusion, which is necessary for identification of long-range monosynaptic connections, can be difficult to verify in vivo, potentially confounding identification of monosynaptic connectivity. Second, conducting whole-cell patch-clamp experiments in vivo, particularly in deeper brain regions, is technically challenging, and would limit the number of cells that can be patched and increase the number of animals needed. 

      We agree that there may well be important differences between adult connectivity and connectivity patterns in the juvenile brain. Indeed, learning and experience during development almost certainly shape connectivity patterns and these patterns of connectivity may change incrementally and/or dynamically during development. Ultimately, adult connectivity patterns are the result of changes in the brain that accrue over development. Given that this is the first study mapping long-range connectivity of HVC input-output pathways, we reasoned that the adult connectivity would provide a critical reference allowing future studies to map different stages of juvenile connectivity and the changes in connectivity driven by milestones like forming a tutor song memory, sensorimotor learning, and song crystallization.

      In this revision we worked to better highlight the points raised above and thank the reviewer for their comments.

      Reviewer #2 (Public review):

      Summary:

      The manuscript describes synaptic connectivity in the Songbird cortex's four main classes of sensory neuron afferents onto three known classes of projection neurons of the pre-motor cortical region HVC. HVC is a region associated with the generation of learned bird songs. Investigators here use all male zebra finches to examine the functional anatomy of this region using patch clamp methods combined with optogenetic activation of select neuronal groups.

      Strengths:

      The quality of the recordings is extremely high and the quantity of data is on a very significant scale; this will certainly aid the field.

      Weaknesses:

      The authors could make the figures a little easier to navigate. Most of the figures use actual anatomical images but it would be nice to have this linked with a zebra finch atlas in more of a cartoon format that accompanied each fluorescence image. Additionally, for the most part, figures showing the labeling lack scale bar values (in µm). These should be added to the figures, not just shown in the legends.

      The authors could make it clear in the abstract that this is all male zebra finches - perhaps this is obvious given the bird song focus, but it should be stated. The number of recordings from each neuron class and the overall number of birds employed should be clearly stated in the methods (this is in the figures, but it should say n=birds or cells as appropriate).

      The authors should consider sharing the actual electrophysiology records as data.

      We thank the reviewer for their assessment of our research and suggestions. We have implemented many of these suggestions and provide details in our response to their specific Recommendations. Additionally, we are organizing our data and will make it publicly available with the version of record.

      Reviewer #3 (Public review):

      Nucleus HVC is critical both for song production as well as learning and arguably, sitting at the top of the song control system, is the most critical node in this circuit receiving a multitude of inputs and sending precisely timed commands that determine the temporal structure of song. The complexity of this structure and its underlying organization seem to become more apparent with each experimental manipulation, and yet our understanding of the underlying circuit organization remains relatively poor. In this study, Trusel and Roberts use classic whole-cell patch clamp techniques in brain slices coupled with optogenetic stimulation of select inputs to provide a careful characterization and quantification of synaptic inputs into HVC. By identifying individual projection neurons using retrograde tracer injections combined with pharmacological manipulations, they classify monosynaptic inputs onto each of the three main classes of glutamatergic projection neurons in HVC (RA-, Area X- and Av-projecting neurons). This study is remarkable in the amount of information that it generates, and the tremendous labor involved for each experiment, from the expression of opsins in each of the target inputs (Uva, NIf, mMAN, and Av), the retrograde labelling of each type of projection neuron, and ultimately the optical stimulation of infected axons while recording from identified projection neurons. Taken together, this study makes an important contribution to increasing our identification, and ultimately understanding, of the basic synaptic elements that make up the circuit organization of HVC, and how external inputs, which we know to be critical for song production and learning, contribute to the intrinsic computations within this critical circuit.

      This study is impressive in its scope, rigorous in its implementation, and thoughtful regarding its limitations. The manuscript is well-written, and I appreciate the clarity with which the authors use our latest understanding of the evolutionary origins of this circuit to place these studies within a larger context and their relevance to the study of vocal control, including human speech. My comments are minor and primarily about legibility, clarification of certain manipulations, and organization of some of the summary figures.

      We thank the reviewer for their thoughtful assessment of our research.

      Recommendations for the authors:

      The following recommendations were considered by all reviewers to be important to incorporate for improving this paper:

      (1) Clarify the site of viral injection and the possibility of labeling other structures

      a) Show images of viral injection sites.

      We provide a representative image of viral expression for each pathway studied in this manuscript. Please see panel A in Figures 2-3 and 5-6 showing our viral expression in Uva, NIf, mMAN, and Av respectively.  

      b) Include in discussion caveats that the virus may spread beyond the boundaries of structures (e.g. especially injections into NIf could spread into Field L).

      For each HVC afferent nucleus we have now included a sentence describing the possible spread of viral infection in surrounding structures in the Results. We also now expanded the image from the Av section to include NIf, to showcase lack of viral expression in NIf (see Fig. 6A).

      (2) Clarify the logic and precise methods of the TTX and 4-AP experiments

      a) Please see the detailed issue raised by Reviewer 3, Major Point 1 below.

      The TTX and 4AP application is the gold-standard of opsin-assisted synaptic circuit interrogation, pioneered by the Svoboda lab in 2009 (Petreanu, Mao et al. 2009) and widely used to assess monosynaptic connectivity in multiple brain circuits, as summarized in a recent review (Linders, Supiot et al. 2022). We now better describe the logic of this approach in the second paragraph of the Results section and cite the first description of this method from the Svoboda lab and a recent review weighing this method with other optogenetic methods for tracing synaptic connections in the brain.

      (3) Include caveats in discussion

      a) Note that there may be other inputs to HVC that were not examined in this study (e.g. CMM, Field L)

      In our original manuscript we did state “Although a complete description of HVC circuitry will require the examination of other potential inputs (i.e. RA<sub>HVC</sub> PNs, A11 glutamatergic neurons (Roberts, Klein et al. 2008, Ben-Tov, Duarte et al. 2023)) and a characterization of interneuron synaptic connectivity, here we provide a map of the synaptic connections between the 4 best described afferents to HVC and its 3 populations of projection neurons” in the last paragraph of the Discussion. We have now edited this sentence to include the projection from NCM to HVC and cited Louder et al., 2024.

      We have extensively mapped input pathways to HVC, and consistent with Vates (Vates, Broome et al. 1996) we have not found evidence that Field L projects to HVC. Rather, it projects to the shelf region outside of HVC. Consistent with this, we do not see retrogradely labeled neurons in Field L following tracer injections confined to HVC (see Fig. 3G). Additionally, we find that CM projections to HVC arise from the nucleus Avalanche (Roberts, Hisey et al. 2017) which we specifically examine in this study. We do not dispute that there may be other pathways projecting to HVC that will need to be examined in the future, including known projections from neuromodulatory regions and RA, from developmentally restricted pathway(s) like NCM (Louder, Kuroda et al. 2024), and from yet unidentified pathways.

      b) Also note that birds in this study were adults and that some inputs to HVC likely to be important for learning may recede during development (e.g. Louder et al, 2024).

      In the second to last paragraph of the Discussion we now state: While our opsin-assisted circuit mapping provides us with a new level of insight into HVC synaptic circuitry, there are limitations to this research that should be considered. All circuit mapping in this study was carried out in brain slices from adult male zebra finches. Future studies will be needed to examine how this adult connectivity pattern relates to patterns of connectivity in juveniles during sensory or sensorimotor phases of vocal learning and connectivity patterns in female birds.   

      (4) Consider cosmetic changes to figures as suggested by Reviewers 2-3 below.

      We thank the reviewers for their suggestions and have implemented the changes as best we can.

      (5) Address all minor issues raised below.

      Reviewer #1 (Recommendations for the authors):

      I see this study is well designed to answer the author's specific question, mapping synaptic auditory-motor connections within HVC. Their experiments with advanced techniques of projection-specific optogenetic manipulation of synaptic inputs and retrograde identification of projection areas revealed input-output combination selective synaptic mapping.

      As I found this study advanced our knowledge with the compelling dataset, I have only some minor comments here.

      (1) One technical concern is we don't see how much the virus infection was focused on the target area and if we can ignore the effect of synaptic connectivity from surrounding areas. As the amount of virus they injected is large (1.5 µl) and target areas are small, we assume the virus might spread to the surrounding area, such as Field L, which also projects to HVC, when targeting NIf. While I think the majority of the projections were from their target areas, it would be better to mention (also with images showing larger view areas) the possibility of projections from surrounding areas.

      We share the reviewer’s concern about the specificity of viral expression. For this reason, we included sample images of the viral expression in each target area (panel A in Fig. 2,3,5,6). We have now also included a sentence at the beginning of each subsection of our Results to describe how we have ensured interpretability of the results. Uva and mMAN’s surrounding areas are not known to project to HVC. Possible cross-infection is an issue for Av and NIf, and we checked each bird’s injection site to ensure that eGtACR1+ cells were not visible in the unintended HVC-projecting areas.

      As mentioned in our response to the public comment, consistent with Vates (Vates, Broome et al. 1996) we do not see evidence that Field L projects directly to HVC (see Fig. 3G).

      (2) Another technical concern is the damage to axonal projections. While I understand the authors stimulated axonal terminals, axonal projections were assumed to be cut and their ability to release neurotransmitters would be reduced, especially after long-term survival or repeated stimulation. Mentioning whether projection pathways were within their 230 µm-thick slice (probably depends on input sites) or not, and the effect of axonal cut, would be helpful.

      We agree that slice electrophysiology has limitations. However, we disagree with the claim of reduced reliability or stability of the evoked response. We and others find that repeated electrical and optogenetic terminal stimulation in slices can yield stable post-synaptic responses for tens of minutes and even hours (Bliss and Gardner-Medwin 1973, Bliss and Lomo 1973, Liu, Kurotani et al. 2004, Pastalkova, Serrano et al. 2006, Xu, Yu et al. 2009, Trusel, Cavaccini et al. 2015, Trusel, Nuno-Perez et al. 2019). Indeed, long-term synaptic plasticity experiments in most preparations and across brain areas rely on such stability of the presynaptic machinery for synaptic release, despite axons being severed from their parent soma. Our assumption is that the vast majority, if not all, connections between axon terminals and their cell body in the afferent regions have been cut in our preparations. Nonetheless, the diversity of outcomes we report (currents returning after TTX+4AP or not, depending on the specific combination of input and HVCPN class) is consistent with the robustness of the synaptic interrogation method.

      (3) While I understand this study focused on 4 major input areas and the authors provide good pictures of synaptic HVC connections from those areas, HVC has been reported to receive auditory inputs from other areas as well (CMM, Field L, etc.). It is worth mentioning that there are other auditory inputs and would be interesting to discuss coordination with the inputs from other areas.

      We have extensively mapped input pathways to HVC, and consistent with Vates (Vates, Broome et al. 1996) we have not found evidence that Field L projects to HVC. Rather, it projects to the shelf region outside of HVC. Consistent with this, we do not see retrogradely labeled neurons in Field L following tracer injections confined to HVC (see Fig. 3G). Additionally, we find that CM projections to HVC arise from the nucleus Avalanche (Roberts, Hisey et al. 2017) which we specifically examine in this study. We do not dispute that there may be other pathways projecting to HVC that will need to be examined in the future, including known projections from neuromodulatory regions and RA, from developmentally restricted pathway(s) like NCM (Louder, Kuroda et al. 2024), and from yet unidentified pathways.

      (4) The HVC local neuronal connections have been reported to be modified, and a recent study revealed transient auditory inputs into HVC during the song learning period. The authors discuss the functions of HVC synaptic connections in song learning (the title also says synaptic connection for song learning); however, the experiments were done in adults and the authors do not discuss the possibility of different synaptic connection mapping in juveniles in the song learning period. Mentioning the neuronal activities and connectivity changes during song learning is important. Also, it would be helpful for the readers to discuss the potential differences between juveniles/adults if they want to discuss the functions of song learning.

      We now mention in the Discussion that this is an important caveat of our research and that future studies will be needed to examine how these adult connectivity patterns relate to connectivity patterns in juveniles during sensory or sensorimotor phases of vocal learning and connectivity patterns in female birds. Nonetheless, the title and abstract cite song learning because it is important for the broader public to understand that at least some of these afferent brain regions carry an essential role in song learning (Foster and Bottjer 2001, Roberts, Gobes et al. 2012, Roberts, Hisey et al. 2017, Zhao, Garcia-Oscos et al. 2019, Koparkar, Warren et al. 2024).

      Reviewer #2 (Recommendations for the authors):

      The work is very detailed and will be an important resource to those working in the field. The recordings are of a high quality and lots of information is included such as measures of response kinetics, amplitude, and pharmacological confirmation of excitatory and inhibitory synaptic responses. In general, I feel the quality is extremely high and the quantity of data is on a very significant exhaustive scale that will certainly aid the field. I have come to this conclusion as a non zebra finch person but I feel the connection information shown will be of benefit given its high quality.

      Figure 7 is a nice way of showing the overall organization. Optional suggestion, consider highlighting anything in Figure 7 that results in a new understanding of the song system as compared to previous work on anatomy and function.

      We thank the reviewer for the kind comments about our research. We have highlighted our newly found connection between mMAN and Av and all the connections onto the HVC PNs in Panel B are newly identified in this study.

      Reviewer #3 (Recommendations for the authors):

      Major points

      (1) Clarification regarding methods for determining monosynaptic events:

      One of the manipulations that I struggled the most with was those describing the use of TTX + 4AP to isolate monosynaptic events. Not being as familiar with the use of optically based photostimulation of axons to release transmitter locally, I was initially confused by statements such as "we found that oEPSC returned after application of TTX+4AP". This might be clear to someone performing these manipulations, but a bit more clarification would be helpful. Should I assume that an existing monosynaptic EPSC would be masked by co-occurring polysynaptic IPSCs which disappear following application of TTX + 4AP, thereby unmasking the monosynaptic EPSC, thereby causing the EPSC to "return"? A word that I am not sure works. Continuing my confusion with these experiments, I am unsure how this cocktail of drugs is added, if it is even added as a cocktail, which is what I initially assumed. The methods and the results are not so clear if they are added in sequence and why and if traces are recorded after the addition of both drugs or if they are recorded for TTX and then again for TTX + 4AP. Finally, looking at the traces in the experimental figures (e.g. Figures 2F, 3F, 5F, and 6F), it is difficult to see what is being shown, at least for me. First, the authors need to describe better in the results why they stimulate twice in short succession and why they seem to use the response to the second pulse (unless I am mistaken) to measure the monosynaptic event. Second, I was confused by the traces (which are very small) in the presence of TTX. I would have expected to see a response if there was a monosynaptic EPSC but I only seem to see a flat line.

      The confusion that I list above might be due in part to my ignorance, but it is important in these types of papers not to assume too much expertise if you want readers with a less sophisticated understanding of synaptic physiology to understand the data. In other words, a little bit more clarity and hand-holding would be welcome.

      We understand the reviewer’s confusion about the methodology. In voltage clamp, the amplifier injects current through the electrode to hold the membrane voltage at -70 mV, which is near the equilibrium potential for Cl-. At this holding potential, the only synaptic current evoked by light stimulation is due to cation influx, mainly through AMPA receptors (see Fig. 1). Therefore, co-occurring polysynaptic IPSCs wouldn’t be visible. We examine those by holding the membrane voltage at +10 mV (see Fig. 1). TTX application blocks voltage-dependent Na+ channels and therefore stops all action potential-dependent neurotransmission. We show the traces recorded in TTX to demonstrate that the currents we recorded prior to TTX application were of synaptic origin, and not due to accidental expression of opsin in the patched cell. This also ensures that any current visible after 4AP application is due to monosynaptic transmission and not to a failure of TTX application.

      After recording and light stimulation with TTX, we then add 4AP, which is a blocker of presynaptic K+ channels. This prevents the repolarization of the terminals that would occur in response to opsin-mediated local depolarization. 4AP application, therefore, allows local opsin-driven depolarizations to reach the threshold for Ca2+-dependent vesicle docking and release. This procedure selectively reveals or unmasks the monosynaptic currents because any non-monosynaptically connected neuron would still need voltage-dependent Na+ channels to effectively produce indirect neurotransmission onto the patched cell. The TTX and 4AP application is the gold-standard of opsin-assisted synaptic circuit interrogation, pioneered by the Svoboda lab in 2009 and widely used to assess monosynaptic connectivity in multiple brain circuits, as summarized in a recent review (Linders et al., 2022). We now include 2 more sentences near the beginning of the Results to clarify this process and directly point to the Linders review for researchers wanting a deeper explanation of this technique.
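
      The decision logic described above can be summarized in a short sketch. This is an illustration only and not the authors' analysis code: the function name, the category labels, and the 5 pA detection threshold are hypothetical choices made for the example, and the inputs are assumed to be peak light-evoked current amplitudes in pA recorded before drugs, in TTX, and in TTX + 4AP.

```python
from enum import Enum


class Connection(Enum):
    NO_RESPONSE = "no light-evoked current"
    NON_SYNAPTIC = "current persists in TTX (e.g. opsin in the patched cell)"
    MONOSYNAPTIC = "current returns in TTX + 4AP"
    POLYSYNAPTIC_ONLY = "current abolished by TTX and absent in TTX + 4AP"


def classify_connection(baseline_pa: float,
                        ttx_pa: float,
                        ttx_4ap_pa: float,
                        threshold_pa: float = 5.0) -> Connection:
    """Classify one recorded cell from three peak amplitudes (pA).

    threshold_pa is a hypothetical detection threshold, not a value
    taken from the manuscript.
    """
    def detected(amplitude_pa: float) -> bool:
        # Inward currents are negative, so compare magnitudes.
        return abs(amplitude_pa) >= threshold_pa

    if not detected(baseline_pa):
        return Connection.NO_RESPONSE
    if detected(ttx_pa):
        # A current that survives TTX is not action potential dependent.
        return Connection.NON_SYNAPTIC
    if detected(ttx_4ap_pa):
        # Only terminals directly contacting the cell can release in TTX + 4AP.
        return Connection.MONOSYNAPTIC
    return Connection.POLYSYNAPTIC_ONLY
```

      For example, a cell with a clear baseline current that is abolished by TTX and reappears after 4AP would be labeled monosynaptic, whereas a cell whose current does not reappear would be labeled polysynaptic only.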

      The double stimulation is unrelated to our testing of monosynaptic connections. We originally conducted the experiments by delivering 2 pulses of light separated by 50 ms, a common way to examine the paired-pulse ratio (PPR), a physiological measure which is used to probe synapses for short-term plasticity and release probability. However, through discussions with colleagues we realized that the slow decay time of eGtACR1 may complicate interpretation of the response to the second light pulse. Thus, we elected to not report these results and indicated this in the Methods section: “We calculated the paired-pulse ratio (PPR) as the amplitude of the second peak divided by the amplitude of the first peak elicited by the twin stimuli, however due to slow kinetics of eGtACR1 the results would be difficult to interpret, and therefore we are not currently reporting them.”
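
      For reference, the paired-pulse ratio defined in the quoted Methods sentence is a single division. A minimal sketch, assuming peak amplitudes have already been measured relative to baseline (the function name and the example values are hypothetical); it does not correct for the slow eGtACR1 decay that motivated leaving these values unreported:

```python
def paired_pulse_ratio(first_peak_pa: float, second_peak_pa: float) -> float:
    """Amplitude of the second peak divided by the amplitude of the first.

    Magnitudes are used so that inward (negative) currents give a
    positive ratio.
    """
    if first_peak_pa == 0:
        raise ValueError("first peak amplitude must be non-zero")
    return abs(second_peak_pa) / abs(first_peak_pa)
```

      A ratio below 1 indicates paired-pulse depression and a ratio above 1 indicates facilitation.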

      (2) Suggestions for improving summary figures:

      Summary Figure 1a: The circuit diagram (schematic to the right of 1a) is OK but I initially found it a bit difficult to interpret. For example, it is not clear why pink RA projecting neurons don't reach as far to the right as X or Av projecting neurons, suggesting that they are not really projection neurons. Also, the big question marks in the intermediate zone are not entirely intuitive. It seems there might be a better way of representing this. It might also be worth stating in the figure legend that the interconnectivity patterns shown in the figure between PNs in HVC are based on specific prior studies.

      We thank the reviewer for the constructive criticism. We have modified the figure to extend the RA projection line and mentioned in the figure legend that connectivity between PNs is based on prior studies.

      Summary Figure 1a: I am not sure I love this figure. There are a few minor issues. First, there are too many browns [NIf/Av and mMAN] which makes it more challenging to clearly disambiguate the different projections. Second, it is unclear why this figure does not represent projections from RA to HVC. My biggest concern with this figure is that it oversimplifies some of the findings. From the figure, one gets the impression that Uva only projects to RA-PNs and that Av only projects to X-PNs even though the authors show connections to other PNs. With the small sample size in this current study for each projection and each PN type, one really cannot rule out that these "minority" projections are not important. I, therefore, suggest that the authors qualitatively represent the strength/probability of connections by weighting with thickness of afferent connections.

      We assume the reviewer is commenting on our summary figure panel 7B. We agree with the referee that this is a simplified representation of our findings. We had indeed indicated in the legend that this was just a “Schematic of the HVC afferent connectivity map resulting from the present work” and that “For conceptualization purposes, afferent connectivity to HVC-PNs is shown only when the rate of monosynaptic connectivity reaches 50% of neurons examined”. We have added a title to highlight that this is but a simplification. We have now adjusted the colors to make the figure easier to follow. Based on the reviewer’s critique we searched for a better method for summarizing the complex connectivity patterns described in this research. We settled on a Sankey diagram of connectivity. This is now Figure 7C. In this diagram, we are able to show the proportion of connections from each input pathway onto each class of neuron and whether these connections are poly- or monosynaptic. We find this to be a straightforward way of displaying all of the connectivity patterns identified in our Figures 2-3 and 4-5 and look forward to understanding if the reviewers find this a useful way of illustrating our findings.

      Minor points:

      (1) Line 50 - typo - song circuits.

      Thank you for catching this.

      (2) Line 106 - 111 - The findings suggest that 100% of Uva projections onto HVCRA neurons are monosynaptic. However, because the authors only tested 6 neurons, their statements that their findings are so different from other studies should be somewhat tempered, since these other studies (e.g. Moll et al.) looked at 251 neurons in HVC and sampling bias could still somewhat explain the difference.

      We observed oEPSCs in 43 of 51 (84.3%) HVC-RA neurons recorded (mean rise time = 2.4 ms) and monosynaptic connections onto 100% of the HVC-RA neurons tested (n = 6). Moll et al. combined electrical stimulation of Uva with two-photon calcium imaging (GCaMP6s) of putative HVC-RA neurons (n = 251 neurons). We should note that these are putative HVC-RA neurons because they were not visually identified using retrograde tracing or using some other molecular handle. They report that only ~16% of HVC-RA neurons showed reliable calcium responses following Uva stimulation. Although the experiments by Moll et al. are technically impressive, calcium imaging is an insensitive technique for measuring post-synaptic responses, particularly subthreshold responses, when compared to whole-cell patch-clamp recordings. This approach cannot identify monosynaptic connections and is likely sensitive only to suprathreshold activity, which likely relies on recruitment of other polysynaptic inputs onto the neurons in HVC. Furthermore, as indicated in the Discussion, our opsin-mediated synaptic interrogation recruits any eGtACR1+ Uva terminal in the slice and therefore will have a great likelihood of revealing any existing connections.

      A limitation of whole-cell patch-clamp recording is that it is a laborious, low-throughput technique. Future experiments using better imaging approaches, like voltage imaging, may be able to weigh in on differences between what we report here using whole-cell patch-clamp recordings from visually identified HVC-RA neurons combined with optogenetic manipulations of Uva terminals and the calcium imaging results reported by Moll. Nonetheless, whole-cell patch-clamp recordings combined with optogenetic manipulations are likely to remain the most sensitive method for identifying synaptic connectivity.
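
      The sampling question raised by the reviewer can be made concrete with a binomial confidence interval on the counts reported above (43 of 51 neurons with oEPSCs; 6 of 6 tested neurons with monosynaptic responses). The sketch below uses the Wilson score interval, which is chosen here for illustration and is not an analysis reported in the manuscript; the function name is hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% interval.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = p + z ** 2 / (2 * n)
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return (centre - half_width) / denom, (centre + half_width) / denom


# Counts taken from the response above.
responsive = wilson_interval(43, 51)
monosynaptic = wilson_interval(6, 6)
```

      Under these assumptions, the interval for 6 of 6 has a lower bound of roughly 0.61, and the interval for 43 of 51 spans roughly 0.72 to 0.92. A small sample therefore leaves the monosynaptic rate loosely constrained, although both lower bounds remain well above the ~16% response rate discussed above.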

      (3) Figure 2G - the significance of white circles is not clear.

      The figure legend indicates that those highlight and mark the position of “retrogradely labeled HVC-projecting neurons in Uva (cyan, white circles)” to facilitate identification of colocalization with the in-situ markers.

      (4) Line 135 - Cardin et al. (J. Neurophys. 2004) is the first to show that song production does not require Nif.

      We thank the reviewer for pointing this out and we have cited this important study.

      (5) Line 183 - This is a confusing sentence because I initially thought that mMAN-mMANHVC PNs was a category!

      We switched the dash with a colon.

      (6) Figure 4d could use some arrows to identify what is shown. It is assumed that the box represents mMAN. Should it be assumed that Av is not in the plane of this section? If not, this should be stated in the legend. It is also unclear where the anterograde projections are. Is this the dark highway that goes from the box to the dorsal surface? If yes this should be indicated but it should also be made clear why the projections go both in the dorsal as well as the ventral directions.

      The inset, as indicated by the lines around it, is a magnification of the terminal fields in Av. We added an explanation of the inset.

      (7) Discussion. In the introduction, the authors mention projections from RA to HVC but never end up studying them in the current manuscript which seems like a missed opportunity and perhaps even a weakness of the study. In the discussion, it would certainly be good for the authors to at least discuss the possible significance of these projections and perhaps why they decided not to study them.

      We thank the reviewer for the comment. Unfortunately, we couldn’t reliably evoke interpretable currents from RA, and we elected to publish the current version of the paper with these 4 major inputs. Nonetheless, we have indicated in the Introduction and in the Discussion that more inputs (e.g. RA, A11, NCM) remain to be evaluated. 

      (8) Line 622 - Is this reference incomplete?

      We thank the reviewer. We have corrected the reference.

      • Ben-Tov, M., F. Duarte and R. Mooney (2023). "A neural hub for holistic courtship displays." Curr Biol 33(9): 1640-1653 e1645.

      • Bliss, T. V. and A. R. Gardner-Medwin (1973). "Long-lasting potentiation of synaptic transmission in the dentate area of the unanaestetized rabbit following stimulation of the perforant path." J Physiol 232(2): 357-374.

      • Bliss, T. V. and T. Lomo (1973). "Long-lasting potentiation of synaptic transmission in the dentate area of the anaesthetized rabbit following stimulation of the perforant path." J Physiol 232(2): 331-356.

      • Foster, E. F. and S. W. Bottjer (2001). "Lesions of a telencephalic nucleus in male zebra finches: Influences on vocal behavior in juveniles and adults." J Neurobiol 46(2): 142-165.

      • Koparkar, A., T. L. Warren, J. D. Charlesworth, S. Shin, M. S. Brainard and L. Veit (2024). "Lesions in a songbird vocal circuit increase variability in song syntax." Elife 13.

      • Linders, L. E., L. F. Supiot, W. Du, R. D'Angelo, R. A. H. Adan, D. Riga and F. J. Meye (2022). "Studying Synaptic Connectivity and Strength with Optogenetics and Patch-Clamp Electrophysiology." Int J Mol Sci 23(19).

      • Liu, H. N., T. Kurotani, M. Ren, K. Yamada, Y. Yoshimura and Y. Komatsu (2004). "Presynaptic activity and Ca2+ entry are required for the maintenance of NMDA receptor-independent LTP at visual cortical excitatory synapses." J Neurophysiol 92(2): 1077-1087.

      • Louder, M. I. M., M. Kuroda, D. Taniguchi, J. A. Komorowska-Muller, Y. Morohashi, M. Takahashi, M. Sanchez-Valpuesta, K. Wada, Y. Okada, H. Hioki and Y. Yazaki-Sugiyama (2024). "Transient sensorimotor projections in the developmental song learning period." Cell Rep 43(5): 114196.

      • Pastalkova, E., P. Serrano, D. Pinkhasova, E. Wallace, A. A. Fenton and T. C. Sacktor (2006). "Storage of spatial information by the maintenance mechanism of LTP." Science 313(5790): 1141-1144.

      • Petreanu, L., T. Mao, S. M. Sternson and K. Svoboda (2009). "The subcellular organization of neocortical excitatory connections." Nature 457(7233): 1142-1145.

      • Roberts, T. F., S. M. Gobes, M. Murugan, B. P. Olveczky and R. Mooney (2012). "Motor circuits are required to encode a sensory model for imitative learning." Nat Neurosci 15(10): 1454-1459.

      • Roberts, T. F., E. Hisey, M. Tanaka, M. G. Kearney, G. Chattree, C. F. Yang, N. M. Shah and R. Mooney (2017). "Identification of a motor-to-auditory pathway important for vocal learning." Nat Neurosci 20(7): 978-986.

      • Roberts, T. F., M. E. Klein, M. F. Kubke, J. M. Wild and R. Mooney (2008). "Telencephalic neurons monosynaptically link brainstem and forebrain premotor networks necessary for song." J Neurosci 28(13): 3479-3489.

      • Trusel, M., A. Cavaccini, M. Gritti, B. Greco, P. P. Saintot, C. Nazzaro, M. Cerovic, I. Morella, R. Brambilla and R. Tonini (2015). "Coordinated Regulation of Synaptic Plasticity at Striatopallidal and Striatonigral Neurons Orchestrates Motor Control." Cell Rep 13(7): 1353-1365.

      • Trusel, M., A. Nuno-Perez, S. Lecca, H. Harada, A. L. Lalive, M. Congiu, K. Takemoto, T. Takahashi, F. Ferraguti and M. Mameli (2019). "Punishment-Predictive Cues Guide Avoidance through Potentiation of Hypothalamus-to-Habenula Synapses." Neuron 102(1): 120-127.e124.

      • Vates, G. E., B. M. Broome, C. V. Mello and F. Nottebohm (1996). "Auditory pathways of caudal telencephalon and their relation to the song system of adult male zebra finches." Journal of Comparative Neurology 366(4): 613-642.

      • Xu, T., X. Yu, A. J. Perlik, W. F. Tobin, J. A. Zweig, K. Tennant, T. Jones and Y. Zuo (2009). "Rapid formation and selective stabilization of synapses for enduring motor memories." Nature 462(7275): 915-919.

      • Zhao, W., F. Garcia-Oscos, D. Dinh and T. F. Roberts (2019). "Inception of memories that guide vocal learning in the songbird." Science 366: 83-89.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Wang et al., recorded concurrent EEG-fMRI in 107 participants during nocturnal NREM sleep to investigate brain activity and connectivity related to slow oscillations (SO), sleep spindles, and in particular their co-occurrence. The authors found SO-spindle coupling to be correlated with increased thalamic and hippocampal activity, and with increased functional connectivity from the hippocampus to the thalamus and from the thalamus to the neocortex, especially the medial prefrontal cortex (mPFC). They concluded the brain-wide activation pattern to resemble episodic memory processing, but to be dissociated from task-related processing and suggest that the thalamus plays a crucial role in coordinating the hippocampal-cortical dialogue during sleep.

      The paper offers an impressively large and highly valuable dataset that provides the opportunity for gaining important new insights into the network substrate involved in SOs, spindles, and their coupling. However, the paper does unfortunately not exploit the full potential of this dataset with the analyses currently provided, and the interpretation of the results is often not backed up by the results presented. I have the following specific comments.

      Thank you for your thoughtful and constructive feedback. We greatly appreciate your recognition of the strengths of our dataset and findings. Below, we respond to each of your specific comments so that our methods and results are as transparent and comprehensible as possible. We hope these revisions address your concerns and further strengthen our manuscript.

      (1) The introduction is lacking sufficient review of the already existing literature on EEG-fMRI during sleep and the BOLD-correlates of slow oscillations and spindles in particular (Laufs et al., 2007; Schabus et al., 2007; Horovitz et al., 2008; Laufs, 2008; Czisch et al., 2009; Picchioni et al., 2010; Spoormaker et al., 2010; Caporro et al., 2011; Bergmann et al., 2012; Hale et al., 2016; Fogel et al., 2017; Moehlman et al., 2018; Ilhan-Bayrakci et al., 2022). The few studies mentioned are not discussed in terms of the methods used or insights gained.

      We acknowledge the need for a more comprehensive review of prior EEG-fMRI studies investigating BOLD correlates of slow oscillations and spindles. However, not all of the listed articles concern sleep SOs or spindles. Several of them (Hale et al., 2016; Horovitz et al., 2008; Laufs, 2008; Laufs, Walker, & Lund, 2007; Spoormaker et al., 2010) mainly focus on EEG-fMRI methodology, sleep stages, or brain networks, which are not the focus of our study. Thank you again for your attention to the comprehensiveness of our literature review. We will expand the Introduction to include a more detailed discussion of the existing literature, ensuring that the contributions of previous EEG-fMRI sleep studies are adequately acknowledged.

      Introduction, Page 4 Lines 62-76

      “Investigating these sleep-related neural processes in humans is challenging because it requires tracking transient sleep rhythms while simultaneously assessing their widespread brain activation. Recent advances in simultaneous EEG-fMRI techniques provide a unique opportunity to explore these processes. EEG allows for precise event-based detection of neural signals, while fMRI provides insight into the broader spatial patterns of brain activation and functional connectivity (Horovitz et al., 2008; Huang et al., 2024; Laufs, 2008; Laufs, Walker, & Lund, 2007; Schabus et al., 2007; Spoormaker et al., 2010). Previous EEG-fMRI studies on sleep have focused on classifying sleep stages or examining the neural correlates of specific waves (Bergmann et al., 2012; Caporro et al., 2012; Czisch et al., 2009; Fogel et al., 2017; Hale et al., 2016; Ilhan-Bayrakcı et al., 2022; Moehlman et al., 2019; Picchioni et al., 2011). These studies have generally reported that slow oscillations are associated with widespread cortical and subcortical BOLD changes, whereas spindles elicit activation in the thalamus, as well as in several cortical and paralimbic regions. Although these findings provide valuable insights into the BOLD correlates of sleep rhythms, they often do not employ sophisticated temporal modeling (Huang et al., 2024) to capture the dynamic interactions between different oscillatory events, e.g., the coupling between SOs and spindles.”

      (2) The paper falls short in discussing the specific insights gained into the neurobiological substrate of the investigated slow oscillations, spindles, and their interactions. The validity of the inverse inference approach ("Open ended cognitive state decoding"), assuming certain cognitive functions to be related to these oscillations because of the brain regions/networks activated in temporal association with these events, is debatable at best. It is also unclear why eventually only episodic memory processing-like brain-wide activation is discussed further, despite the activity of 16 of 50 feature terms from the NeuroSynth v3 dataset were significant (episodic memory, declarative memory, working memory, task representation, language, learning, faces, visuospatial processing, category recognition, cognitive control, reading, cued attention, inhibition, and action).

      Thank you for pointing this out, particularly regarding the use of inverse inference approaches such as “open-ended cognitive state decoding.” Given the concerns about the indirectness of this approach, we decided to remove its related content and results from Figure 3 in the main text and include it in Supplementary Figure 7. We will refocus the main text on direct neurobiological insights gained from our EEG-fMRI analyses, particularly emphasizing the hippocampal-thalamocortical network dynamics underlying SO-spindle coupling, and we will acknowledge the exploratory nature of these findings and highlight their limitations.

      Discussion, Page 17-18 Lines 323-332

      “To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      (3) Hippocampal activation during SO-spindles is stated as a main hypothesis of the paper - for good reasons - however, other regions (e.g., several cortical as well as thalamic) would be equally expected given the known origin of both oscillations and the existing sleep-EEG-fMRI literature. However, this focus on the hippocampus contrasts with the focus on investigating the key role of the thalamus instead in the Results section.

      We appreciate your insight regarding the relative emphasis on hippocampal and thalamic activation in our study. We recognize that the manuscript may currently present an inconsistency between our initial hypothesis and the main focus of the results. To address this concern, we will ensure that our Introduction and Discussion sections explicitly discuss both regions, highlighting the complementary roles of the hippocampus (memory processing and reactivation) and the thalamus (spindle generation and cortico-hippocampal coordination) in SO-spindle dynamics.

      Introduction, Page 5 Lines 87-103

      “To address this gap, our study investigates brain-wide activation and functional connectivity patterns associated with SO-spindle coupling, and employs a cognitive state decoding approach (Margulies et al., 2016; Yarkoni et al., 2011)—albeit indirectly—to infer potential cognitive functions. In the current study, we used simultaneous EEG-fMRI recordings during nocturnal naps (detailed sleep staging results are provided in the Methods and Table S1) in 107 participants. Although directly detecting hippocampal ripples using scalp EEG or fMRI is challenging, we expected that hippocampal activation in fMRI would coincide with SO-spindle coupling detected by EEG, given that SOs, spindles, and ripples frequently co-occur during NREM sleep. We also anticipated a critical role of the thalamus, particularly thalamic spindles, in coordinating hippocampal-cortical communication.

      We found significant coupling between SOs and spindles during NREM sleep (N2/3), with spindle peaks occurring slightly before the SO peak. This coupling was associated with increased activation in both the thalamus and hippocampus, with functional connectivity patterns suggesting thalamic coordination of hippocampal-cortical communication. These findings highlight the key role of the thalamus in coordinating hippocampal-cortical interactions during human sleep and provide new insights into the neural mechanisms underlying sleep-dependent brain communication. A deeper understanding of these mechanisms may contribute to future neuromodulation approaches aimed at enhancing sleep-dependent cognitive function and treating sleep-related disorders.”

      Discussion, Page 16-17 Lines 292-307

      “When modeling the timing of these sleep rhythms in the fMRI, we observed hippocampal activation selectively during SO-spindle events. This suggests the possibility of triple coupling (SOs–spindles–ripples), even though our scalp EEG was not sufficiently sensitive to detect hippocampal ripples—key markers of memory replay (Buzsáki, 2015). Recent iEEG evidence indicates that ripples often co-occur with both spindles (Ngo, Fell, & Staresina, 2020) and SOs (Staresina et al., 2015; Staresina et al., 2023). Therefore, the hippocampal involvement during SO-spindle events in our study may reflect memory replay from the hippocampus, propagated via thalamic spindles to distributed cortical regions.

      The thalamus, known to generate spindles (Halassa et al., 2011), plays a key role in producing and coordinating sleep rhythms (Coulon, Budde, & Pape, 2012; Crunelli et al., 2018), while the hippocampus is essential for memory consolidation (Buzsáki, 2015; Diba & Buzsáki, 2007; Singh, Norman, & Schapiro, 2022). The increased hippocampal and thalamic activity, along with strengthened connectivity between these regions and the mPFC during SO-spindle events, underscores a hippocampal-thalamic-neocortical information flow. This aligns with recent findings suggesting the thalamus orchestrates neocortical oscillations during sleep (Schreiner et al., 2022). The thalamus and hippocampus thus appear central to memory consolidation during sleep, guiding information transfer to the neocortex, e.g., mPFC.”

      (4) The study included an impressive number of 107 subjects. It is surprising though that only 31 subjects had to be excluded under these difficult recording conditions, especially since no adaptation night was performed. Since only subjects were excluded who slept less than 10 min (or had excessive head movements) there are likely several datasets included with comparably short durations and only a small number of SOs and spindles and even less combined SO-spindle events. A comprehensive table should be provided (supplement) including for each subject (included and excluded) the duration of included NREM sleep, number of SOs, spindles, and SO+spindle events. Also, some descriptive statistics (mean/SD/range) would be helpful.

      We appreciate your recognition of our sample size and the challenges associated with simultaneous EEG-fMRI sleep recordings. We acknowledge the importance of transparently reporting individual subject data, particularly regarding sleep duration and the number of detected SOs, spindles, and SO-spindle events. To address this, we will provide comprehensive tables in the supplementary materials. Table S1 contains descriptive information about sleep-related characteristics, and Tables S2-S4 give detailed information about sleep waves at each sleep stage for all 107 subjects, listing for each subject: (1) duration of each sleep stage; (2) number of detected SOs; (3) number of detected spindles; (4) number of detected SO-spindle coupling events; (5) density of detected SOs; (6) density of detected spindles; (7) density of detected SO-spindle coupling events.

      However, most of the excluded participants were unable to fall asleep or slept too briefly, and therefore had essentially no NREM sleep. For these participants, NREM sleep duration and the numbers of SOs, spindles, and coupling events could not be computed.

      Supplementary Materials, Page 42-54, Table S1-S4

      (5) Was the 20-channel head coil dedicated for EEG-fMRI measurements? How were the electrode cables guided through/out of the head coil? Usually, the 64-channel head coil is used for EEG-fMRI measurements in a Siemens PRISMA 3T scanner, which has a cable duct at the back that allows to guide the cables straight out of the head coil (to minimize MR-related artifacts). The choice for the 20-channel head coil should be motivated. Photos of the recording setup would also be helpful.

      Thank you for your comment regarding our choice of the 20-channel head coil for EEG-fMRI measurements. We acknowledge that the 64-channel head coil is commonly used in Siemens PRISMA 3T scanners; however, the 20-channel coil was selected due to specific practical and technical considerations in our study. In particular, the 20-channel head coil was compatible with our EEG system and ensured sufficient signal-to-noise ratio (SNR) for both EEG and fMRI acquisition. The EEG electrode cables were guided through the lateral and posterior openings of the head coil, secured with foam padding to reduce motion and minimize MR-related artifacts. Moreover, given the extended nature of nocturnal sleep recordings, the 20-channel coil allowed us to maintain participant comfort while still achieving high-quality simultaneous EEG-fMRI data.

      We have made this clearer in the revised manuscript. 

      Methods, Page 20 Lines 385-392

      “All MRI data were acquired using a 20-channel head coil on a research-dedicated 3-Tesla Siemens Magnetom Prisma MRI scanner. Earplugs and cushions were provided for noise protection and head motion restriction. We chose the 20-channel head coil because it was compatible with our EEG system and ensured sufficient signal-to-noise ratio (SNR) for both EEG and fMRI acquisition. The EEG electrode cables were guided through the lateral and posterior openings of the head coil, secured with foam padding to reduce motion and minimize MR-related artifacts. Moreover, given the extended nature of nocturnal sleep recordings, the 20-channel coil helped maintain participant comfort while still achieving high-quality simultaneous EEG-fMRI data.”

      (6) Was the EEG sampling synchronized to the MR scanner (gradient system) clock (the 10 MHz signal; not referring to the volume TTL triggers here)? This is a requirement for stable gradient artifact shape over time and thus accurate gradient noise removal.

      Thank you for raising this important point. We confirm that the EEG sampling was synchronized to the MR scanner’s 10 MHz gradient system clock, ensuring a stable gradient artifact shape over time and enabling accurate artifact removal. This synchronization was achieved using the standard clock synchronization interface of the EEG amplifier, minimizing timing jitter and drift. As a result, the gradient artifact waveform remained stable across volumes, allowing for more effective artifact correction during preprocessing. We appreciate your attention to this critical aspect of EEG-fMRI data acquisition.

      We have made this clearer in the revised manuscript. 

      Methods, Page 19-20 Lines 371-383

      “EEG was recorded simultaneously with fMRI data using an MR-compatible EEG amplifier system (BrainAmps MR-Plus, Brain Products, Germany), along with a specialized electrode cap. The recording was done using 64 channels in the international 10/20 system, with the reference channel positioned at FCz. In order to adhere to polysomnography (PSG) recording standards, six electrodes were removed from the EEG cap: one for electrocardiogram (ECG) recording, two for electrooculogram (EOG) recording, and three for electromyogram (EMG) recording. EEG data was recorded at a sample rate of 5000 Hz, the resistance of the reference and ground channels was kept below 10 kΩ, and the resistance of the other channels was kept below 20 kΩ. To synchronize the EEG and fMRI recordings, the BrainVision recording software (BrainProducts, Germany) was utilized to capture triggers from the MRI scanner. The EEG sampling was synchronized to the MR scanner’s 10 MHz gradient system clock, ensuring a stable gradient artifact shape over time and enabling accurate artifact removal. This was achieved via the standard clock synchronization interface of the EEG amplifier, minimizing timing jitter and drift.”
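
      The timing argument in this passage can be checked with simple arithmetic. The sketch below is an illustration only, not the authors' acquisition code: it uses the 5000 Hz sampling rate and 10 MHz clock quoted above together with the TR of 2000 ms reported in the Methods, and the 10 ppm clock error in the drift example is a purely hypothetical value chosen to show what an unsynchronized recording could look like.

```python
EEG_RATE_HZ = 5000               # EEG sampling rate quoted in the Methods text
SCANNER_CLOCK_HZ = 10_000_000    # 10 MHz gradient system clock
TR_MS = 2000                     # repetition time reported in the Methods

# With a shared clock, every EEG sample spans a whole number of clock ticks.
ticks_per_sample = SCANNER_CLOCK_HZ // EEG_RATE_HZ

# A TR that is a whole number of samples means each volume's gradient
# artifact is sampled at the same phase, so an average template fits well.
samples_per_tr = EEG_RATE_HZ * TR_MS // 1000
tr_is_whole_samples = (EEG_RATE_HZ * TR_MS) % 1000 == 0


def drift_in_samples(clock_error_ppm, duration_s, rate_hz=EEG_RATE_HZ):
    """Samples of accumulated misalignment for a free-running EEG clock."""
    return clock_error_ppm * 1e-6 * duration_s * rate_hz


# Hypothetical 10 ppm clock error over a one-hour recording.
hypothetical_drift = drift_in_samples(10, 3600)

print(ticks_per_sample, samples_per_tr, tr_is_whole_samples)
print(hypothetical_drift)
```

      Under these assumptions, each volume spans a whole number of EEG samples, whereas a free-running clock would slowly shift the artifact waveform relative to the sampling grid and degrade template-based correction.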

      (7) The TR is quite long and the voxel size is quite large in comparison to state-of-the-art EPI sequences. What was the rationale behind choosing a sequence with relatively low temporal and spatial resolution?

      We acknowledge that our chosen TR and voxel size are relatively long and large compared to state-of-the-art EPI sequences. This decision was made to optimize the signal-to-noise ratio (SNR) and reduce susceptibility-related distortions, which are particularly critical in EEG-fMRI sleep studies where head motion and physiological noise can be substantial. A longer TR allowed us to sample whole-brain activity with sufficient coverage, while a larger voxel size helped enhance BOLD sensitivity and minimize partial volume effects in deep brain structures such as the thalamus and hippocampus, which are key regions of interest in our study. We appreciate your concern and hope this clarification provides sufficient rationale for our sequence parameters.

      We have made this clearer in the revised manuscript. 

      Methods, Page 20-21 Lines 398-408

      “Then, the “sleep” session began after the participants were instructed to try and fall asleep. For the functional scans, whole-brain images were acquired using k-space and steady-state T2*-weighted gradient echo-planar imaging (EPI) sequence that is sensitive to the BOLD contrast. This measures local magnetic changes caused by changes in blood oxygenation that accompany neural activity (sequence specification: 33 slices in interleaved ascending order, TR = 2000 ms, TE = 30 ms, voxel size = 3.5 × 3.5 × 4.2 mm³, FA = 90°, matrix = 64 × 64, gap = 0.7 mm). A relatively long TR and larger voxel size were chosen to optimize SNR and reduce susceptibility-related distortions, which are critical in EEG-fMRI sleep studies where head motion and physiological noise can be substantial. The longer TR allowed whole-brain coverage with sufficient temporal resolution, while the larger voxel size helped enhance BOLD sensitivity and minimize partial volume effects in deep brain structures (e.g., the thalamus and hippocampus), which are key regions of interest in this study.”
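
      For orientation, the coverage implied by these sequence parameters can be worked out directly. The sketch below is a back-of-the-envelope calculation, not part of the authors' protocol. It assumes that the reported 4.2 mm through-plane voxel size is the slice pitch (slice thickness plus the 0.7 mm gap), which the text does not state explicitly.

```python
MATRIX = 64            # in-plane matrix size (64 x 64)
IN_PLANE_MM = 3.5      # in-plane voxel size
SLICE_PITCH_MM = 4.2   # reported through-plane voxel size
GAP_MM = 0.7           # reported inter-slice gap
N_SLICES = 33
TR_S = 2.0

# In-plane field of view follows from matrix size and voxel size.
fov_mm = MATRIX * IN_PLANE_MM

# ASSUMPTION: 4.2 mm = slice thickness + gap, giving 3.5 mm thick slices.
slice_thickness_mm = SLICE_PITCH_MM - GAP_MM

# Slab extent: N slices plus the (N - 1) gaps between them.
slab_mm = N_SLICES * slice_thickness_mm + (N_SLICES - 1) * GAP_MM

# Number of volumes in the 10-minute minimum sleep duration.
volumes_per_10_min = int(10 * 60 / TR_S)

print(fov_mm, round(slice_thickness_mm, 2), round(slab_mm, 1), volumes_per_10_min)
```

      Under that assumption, the slab extends roughly 138 mm in the slice direction, and a participant at the 10-minute inclusion threshold contributes 300 volumes.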

      (8) The anatomically defined ROIs are quite large. It should be elaborated on how this might reduce sensitivity to sleep rhythm-specific activity within sub-regions, especially for the thalamus, which has distinct nuclei involved in sleep functions.

      We appreciate your insight regarding the use of anatomically defined ROIs and their potential limitations in detecting sleep rhythm-specific activity within sub-regions, particularly in the thalamus. Given the distinct functional roles of thalamic nuclei in sleep processes, we acknowledge that using a single, large thalamic ROI may reduce sensitivity to localized activity patterns. To address this, we will discuss this limitation in the revised manuscript, acknowledging that our approach prioritizes whole-structure effects but may not fully capture nucleus-specific contributions.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (9) The study reports SO & spindle amplitudes & densities, as well as SO+spindle coupling, to be larger during N2/3 sleep compared to N1 and REM sleep, which is trivial but can be seen as a sanity check of the data. However, the amount of SOs and spindles reported for N1 and REM sleep is concerning, as per definition there should be hardly any (if SOs or spindles occur in N1 it becomes by definition N2, and the interval between spindles has to be considerably large in REM to still be scored as such). Thus, on the one hand, the report of these comparisons takes too much space in the main manuscript as it is trivial, but on the other hand, it raises concerns about the validity of the scoring.

      We appreciate your concern regarding the reported presence of SOs and spindles in N1 and REM sleep and the potential implications. Our methods for detecting SOs, spindles, and their coupling were originally designed only for N2 and N3 sleep data, based on the characteristics of those data, and they are widely recognized and used in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because the detection of SOs and spindles is based on percentiles, these methods will always detect a certain number of events when applied to data from other stages (N1 and REM), and how these events differ from those detected in N2/N3 remains unclear. We will acknowledge the reasons for these results in the Methods section and emphasize that they are used only as sanity checks.

      Methods, Page 25 Lines 515-524

      “We note that the above methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).”
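
      The point about percentile-based detection can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' detection pipeline: the 75th-percentile criterion, the amplitude distributions, and the function names are all hypothetical and serve only to show that a threshold defined relative to the data itself returns events regardless of the sleep stage.

```python
import random


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1 - frac) + ordered[upper] * frac


def detect_events(amplitudes, q=75.0):
    """Keep candidates whose amplitude exceeds the q-th percentile
    of the same recording (a relative, not absolute, threshold)."""
    threshold = percentile(amplitudes, q)
    return [a for a in amplitudes if a > threshold]


rng = random.Random(0)
# Hypothetical candidate amplitudes (arbitrary units):
# large deflections in N2/N3, small ones in REM.
n23_candidates = [abs(rng.gauss(75.0, 20.0)) for _ in range(1000)]
rem_candidates = [abs(rng.gauss(10.0, 3.0)) for _ in range(1000)]

n23_events = detect_events(n23_candidates)
rem_events = detect_events(rem_candidates)

print(len(n23_events), len(rem_events))
print(min(n23_events) > max(rem_events))
```

      Both simulated stages yield about the same number of detected events (roughly a quarter of the candidates), even though the REM "events" are far smaller than any N2/N3 event. This is why event counts from a relative threshold in N1 or REM are hard to interpret on their own.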

      (10) Why was electrode F3 used to quantify the occurrence of SOs and spindles? Why not a midline frontal electrode like Fz (or a number of frontal electrodes for SOs) and Cz (or a number of centroparietal electrodes) for spindles to be closer to their maximum topography?

      We appreciate your suggestion regarding electrode selection for SO and spindle quantification. Our choice of F3 was primarily based on previous studies (Massimini et al., 2004; Molle et al., 2011), where bilateral frontal electrodes are commonly used for detecting SOs and spindles. Additionally, we considered the impact of MRI-related noise and, after a comprehensive evaluation, determined that F3 provided an optimal balance between signal quality and artifact minimization. We also acknowledge that alternative electrode choices, such as Fz for SOs and Cz for spindles, could provide additional insights into their topographical distributions.

      (11) Functional connectivity (hippocampus -> thalamus -> cortex (mPFC)) is reported to be increased during SO-spindle coupling and interpreted as evidence for coordination of hippocampo-neocortical communication likely by thalamic spindles. However, functional connectivity was only analysed during coupled SO+spindle events, not during isolated SOs or isolated spindles. Without the direct comparison of the connectivity patterns between these three events, it remains unclear whether this is specific for coupled SO+spindle events or rather associated with one or both of the other isolated events. The PPIs need to be conducted for those isolated events as well and compared statistically to the coupled events.

      We appreciate your critical perspective on our functional connectivity analysis and the interpretation of hippocampus-thalamus-cortex (mPFC) interactions during SO-spindle coupling. We acknowledge that, in the original analysis, functional connectivity was only examined during coupled SO-spindle events, without direct comparison to isolated SOs or isolated spindles. To address this concern, we have conducted PPI analyses for all three ROIs (hippocampus, thalamus, mPFC) and all three event types (SO-spindle couplings, isolated SOs, and isolated spindles). Our results indicate that neither isolated SOs nor isolated spindles yielded significant connectivity changes in any of the three ROIs, as none survived correction for multiple comparisons. This suggests that the observed connectivity increase is specific to SO-spindle coupling, rather than being independently driven by either SOs or spindles alone.

      Results, Page 14 Lines 248-255

      “Crucially, the interaction between FC and SO-spindle coupling revealed that only the functional connectivity of hippocampus -> thalamus (ROI analysis, t(106) = 1.86, p = 0.0328) and thalamus -> mPFC (ROI analysis, t(106) = 1.98, p = 0.0251) significantly increased during SO-spindle coupling, with no significant changes in any other pathway (Fig. 4e). We also conducted PPI analyses for the other two events (SOs and spindles), and neither yielded significant connectivity changes in the three ROIs, as all failed to survive whole-brain FWE correction at the cluster level (p < 0.05). Together, these findings suggest that the thalamus, likely via spindles, coordinates hippocampal-cortical communication selectively during SO-spindle coupling, but not during isolated SOs or spindle events alone.”
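
      For readers unfamiliar with PPI, the core of the design can be sketched in a few lines. This is a deliberately simplified illustration, not the authors' implementation: it omits the deconvolution and HRF-convolution steps that standard packages apply, and the function names and toy data are hypothetical. A target region's time course would be regressed on all three regressors, with the coefficient on the interaction term indexing event-dependent connectivity.

```python
def center(values):
    """Subtract the mean so the interaction is less confounded with main effects."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def build_ppi_design(seed_bold, event_regressor):
    """Return (physiological, psychological, interaction) regressors.

    seed_bold       : seed-region time course, one value per volume
    event_regressor : event regressor on the same volume grid
                      (in practice convolved with an HRF beforehand)
    """
    if len(seed_bold) != len(event_regressor):
        raise ValueError("seed and event regressors must have equal length")
    physiological = center(seed_bold)
    psychological = center(event_regressor)
    # The PPI term is the element-wise product of the two centered regressors.
    interaction = [p * q for p, q in zip(physiological, psychological)]
    return physiological, psychological, interaction


# Toy data: 8 volumes, with hypothetical coupling events at indices 2, 3 and 6.
seed = [0.1, 0.3, 1.2, 1.0, 0.2, 0.1, 0.9, 0.2]
events = [0, 0, 1, 1, 0, 0, 1, 0]
phys, psy, ppi = build_ppi_design(seed, events)
print(len(ppi))
```

      Running the same construction separately for coupled events, isolated SOs, and isolated spindles gives one interaction regressor per event type, which is what allows connectivity changes to be compared across the three.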

      (12) The limited temporal resolution of fMRI does indeed not allow for easily distinguishing between fMRI activation patterns related to SO-up- vs. SO-down-states. For this, one could try to extract the amplitudes of SO-up- and SO-down-states separately for each SO event and model them as two separate parametric modulators (with the risk of collinearity as they are likely correlated).

      We appreciate your insightful comment regarding the challenge of distinguishing fMRI activation patterns related to SO-up vs. SO-down states due to the limited temporal resolution of fMRI. While our current analysis does not differentiate between these two phases, we acknowledge that separately modeling SO-up and SO-down states using parametric modulators could provide a more refined understanding of their distinct neural correlates. However, as you note, this approach carries the risk of collinearity, and there is indeed a high correlation between the two amplitudes across all subjects in our results (r = 0.98). Future studies could address this question with high-temporal-resolution techniques. While implementing this in the current study is beyond our scope, we will acknowledge this limitation in the Discussion section.
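
      The practical consequence of such a correlation can be quantified with the variance inflation factor (VIF), which for two predictors with correlation r equals 1 / (1 - r²). The sketch below is illustrative only: it assumes that the two parametric modulators in a GLM would be correlated to the same degree as the reported across-subject amplitude correlation, which the response does not state.

```python
import math


def variance_inflation_factor(r):
    """VIF for one of two predictors whose pairwise correlation is r."""
    return 1.0 / (1.0 - r * r)


r = 0.98  # correlation reported in the response (assumed here to apply to the regressors)
vif = variance_inflation_factor(r)
se_inflation = math.sqrt(vif)  # factor by which each coefficient's standard error grows

print(round(vif, 2))           # 25.25
print(round(se_inflation, 2))  # 5.03
```

      Under that assumption, the standard error of each modulator's estimate would be about five times larger than with uncorrelated regressors, which illustrates why separate UP- and DOWN-state effects would be difficult to estimate reliably.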

      Discussion, Page 17 Lines 308-322

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses, given that it occurs during deep sleep. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SO events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from the DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.”

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (13) L327: "It is likely that our findings of diminished DMN activity reflect brain activity during the SO DOWN-state, as this state consistently shows higher amplitude compared to the UP-state within subjects, which is why we modelled the SO trough as its onset in the fMRI analysis." This conclusion is not justified as the fact that SO down-states are larger in amplitude does not mean their impact on the BOLD response is larger.

      We appreciate your concern regarding our interpretation of diminished DMN activity as reflecting the SO DOWN-state. We acknowledge that the original wording was somewhat misleading. Our revised interpretation is that the effect could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SO events was obtained when modelled at the SO peak instead). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition of the DOWN-state. We will make this clear in the Discussion section.

      Discussion, Page 17 Lines 308-322

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.”

      (14) Line 77: "In the current study, while directly capturing hippocampal ripples with scalp EEG or fMRI is difficult, we expect to observe hippocampal activation in fMRI whenever SOs-spindles coupling is detected by EEG, if SOs- spindles-ripples triple coupling occurs during human NREM sleep". Not all SO-spindle events are associated with ripples (Staresina et al., 2015), but hippocampal activation may also be expected based on the occurrence of spindles alone (Bergmann et al., 2012).

      We appreciate your clarification regarding the relationship between SO-spindle coupling and hippocampal ripples. We acknowledge that not all SO-spindle events are necessarily accompanied by ripples (Staresina et al., 2015). However, previous research indicates that hippocampal ripples are significantly more likely to occur during SO-spindle coupling events. This suggests that while ripple occurrence is not guaranteed, SO-spindle coupling creates a favorable network state for ripple generation and potential hippocampal activation. To ensure accuracy, we will delete the misleading sentence from the Introduction and acknowledge in the Discussion that our results cannot conclusively demonstrate the triple coupling of SOs, spindles, and hippocampal ripples.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      Reviewer #2 (Public review):

      In this study, Wang and colleagues aimed to explore brain-wide activation patterns associated with NREM sleep oscillations, including slow oscillations (SOs), spindles, and SO-spindle coupling events. Their findings reveal that SO-spindle events corresponded with increased activation in both the thalamus and hippocampus. Additionally, they observed that SO-spindle coupling was linked to heightened functional connectivity from the hippocampus to the thalamus, and from the thalamus to the medial prefrontal cortex-three key regions involved in memory consolidation and episodic memory processes.

      This study's findings are timely and highly relevant to the field. The authors' extensive data collection, involving 107 participants sleeping in an fMRI while undergoing simultaneous EEG recording, deserves special recognition. If shared, this unique dataset could lead to further valuable insights. While the conclusions of the data seem overall well supported by the data, some aspects with regard to the detection of sleep oscillations need clarification.

      The authors report that coupled SO-spindle events were most frequent during NREM sleep (2.46 ± 0.06 events/min), but they also observed a surprisingly high occurrence of these events during N1 and REM sleep (2.23 ± 0.09 and 2.32 ± 0.09 events/min, respectively), where SO-spindle coupling would not typically be expected. Combined with the relatively modest SO amplitudes reported (~25 µV, whereas >75 µV would be expected when using mastoids as reference electrodes), this raises the possibility that the parameters used for event detection may not have been conservative enough - or that sleep staging was inaccurately performed. This issue could present a significant challenge, as the fMRI findings are largely dependent on the reliability of these detected events.

      Thank you very much for your thorough and encouraging review. We appreciate your recognition of the significance and relevance of our study and dataset, particularly in highlighting how simultaneous EEG-fMRI recordings can provide complementary insights into the temporal dynamics of neural oscillations and their associated spatial activation patterns during sleep. In the sections that follow, we address each of your comments in detail. We have revised the text and conducted additional analyses wherever possible to strengthen our argument and clarify our methodological choices. We believe these revisions improve the clarity and rigor of our work, and we thank you for helping us refine it.

      We appreciate your insightful comments regarding the detection of sleep oscillations. Our methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM. We will acknowledge the reasons for these results in the Methods section and emphasize that they are used only for sanity checks.

      Regarding the reported SO amplitudes (~25 µV), during preprocessing, we applied the Signal Space Projection (SSP) method to more effectively remove MRI gradient artifacts and cardiac pulse noise. While this approach enhances data quality, it also reduces overall signal power, leading to systematically lower reported amplitudes. Despite this, the SOs we detected in NREM sleep (especially N2/N3) remain physiologically meaningful and are consistent with previous fMRI studies using similar artifact removal techniques. We appreciate your careful evaluation and valuable suggestions.

      In addition, we will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics (Table S1), as well as detailed information about sleep waves at each sleep stage for all 107 subjects (Table S2-S4), listing for each subject: (1) Different sleep stage duration; (2) Number of detected SOs; (3) Number of detected spindles; (4) Number of detected SO-spindle coupling events; (5) Density of detected SOs; (6) Density of detected spindles; (7) Density of detected SO-spindle coupling events.

      Methods, Page 25 Lines 515-524

      “We note that the above methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).”

      Supplementary Materials, Page 42-54, Table S1-S4
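      The point about percentile-based detection can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes the amplitude threshold is recomputed from each stage's own distribution of candidate waves, and the 75th-percentile cutoff, the synthetic amplitudes, and the names `percentile` and `count_events` are all hypothetical.

```python
import random


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def count_events(amplitudes, q=75.0):
    """Count candidate waves whose amplitude exceeds the stage's own q-th percentile."""
    threshold = percentile(amplitudes, q)
    return sum(1 for a in amplitudes if a > threshold)


rng = random.Random(0)
# Hypothetical amplitudes (in microvolts) of 1000 candidate waves per stage.
n2n3 = [rng.gauss(60.0, 15.0) for _ in range(1000)]  # large slow waves
rem = [rng.gauss(15.0, 4.0) for _ in range(1000)]    # much smaller fluctuations

n2n3_events = count_events(n2n3)
rem_events = count_events(rem)
print(n2n3_events, rem_events)
```

      Under these assumptions both stages yield the same number of supra-threshold events even though the REM amplitudes are far smaller, because a relative threshold always retains the top fraction of whatever it is given. Event counts from such a detector therefore cannot, on their own, show that N1 or REM events are genuine SOs.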

      Reviewer #3 (Public review):

      Summary:

      Wang et al., examined the brain activity patterns during sleep, especially when locked to those canonical sleep rhythms such as SO, spindle, and their coupling. Analyzing data from a large sample, the authors found significant coupling between spindles and SOs, particularly during the upstate of the SO. Moreover, the authors examined the patterns of whole-brain activity locked to these sleep rhythms. To understand the functional significance of these brain activities, the authors further conducted open-ended cognitive state decoding and found a variety of cognitive processing may be involved during SO-spindle coupling and during other sleep events. The authors next investigated the functional connectivity analyses and found enhanced connectivity between the hippocampus, the thalamus, and the medial PFC. These results reinforced the theoretical model of sleep-dependent memory consolidation, such that SO-spindle coupling is conducive to systems-level memory reactivation and consolidation.

      Strengths:

      There are obvious strengths in this work, including the large sample size, state-of-the-art neuroimaging and neural oscillation analyses, and the richness of results.

      Weaknesses:

      Despite these strengths and the insights gained, there are weaknesses in the design, the analyses, and inferences.

      Thank you for your detailed and thoughtful review of our manuscript. We are delighted that you recognize our advanced analysis methods, the richness of our neuroimaging and neural oscillation results, and the large sample size. In the following sections, we provide detailed responses to each of your comments. We have revised the text and conducted additional analyses to strengthen our arguments and clarify our methodological choices. We believe these revisions enhance the clarity and rigor of our work, and we sincerely appreciate your thoughtful feedback in helping us refine the manuscript.

      (1) A repeating statement in the manuscript is that brain activity could indicate memory reactivation and thus consolidation. This is indeed a highly relevant question that could be informed by the current data/results. However, an inherent weakness of the design is that there is no memory task before and after sleep. Thus, it is difficult (if not impossible) to make a strong argument linking SO/spindle/coupling-locked brain activity with memory reactivation or consolidation.

      We appreciate your suggestion regarding the lack of a pre- and post-sleep memory task in our study design. We acknowledge that, in the absence of behavioral measures, it is hard to directly link SO-spindle coupling to memory consolidation in an outcome-driven manner. Our interpretation is instead based on the well-established role of these oscillations in memory processes, as demonstrated in previous studies. We sincerely appreciate this feedback and will adjust our Discussion accordingly to reflect a more precise interpretation of our findings.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (2) Relatedly, to understand the functional implications of the sleep rhythm-locked brain activity, the authors employed the "open-ended cognitive state decoding" method. While this method is interesting, it is rather indirect given that there were no behavioral indices in the manuscript. Thus, discussions based on these analyses are speculative at best. Please either tone down the language or find additional evidence to support these claims.

      Moreover, the results from this method are difficult to understand. Figure 3e showed that for all three types of sleep events (SO, spindle, SO-spindle), the same mental states (e.g., working memory, episodic memory, declarative memory) showed opposite directions of activation (left and right panels showed negative and positive activation, respectively). How to interpret these conflicting results? This ambiguity is also reflected by the term used: declarative memory and episodic memories are both indexed in the results. Yet these two processes can be largely overlapped. So which specific memory processes do these brain activity patterns reflect? The Discussion shall discuss these results and the limitations of this method.

      We appreciate your critical assessment of the open-ended cognitive state decoding method and its interpretational challenges. Given the concerns about the indirectness of this approach, we decided to remove its related content and results from Figure 3 in the main text and include it in Supplementary Figure 7. 

      Due to the complexity of memory-related processes, we acknowledge that distinguishing between episodic and declarative memory based solely on this approach is not straightforward. We will revise the Supplementary Materials to explicitly discuss these limitations and clarify that our findings do not isolate specific cognitive processes but rather suggest general associations with memory-related networks.

      Discussion, Page 17-18 Lines 323-332

      “To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      (3) The coupling strength is somehow inconsistent with prior results (Hahn et al., 2020, eLife, Helfrich et al., 2018, Neuron). Specifically, Helfrich et al. showed that among young adults, the spindle is coupled to the peak of the SO. Here, the authors reported that the spindles were coupled to down-to-up transitions of SO and before the SO peak. It is possible that participants' age may influence the coupling (see Helfrich et al., 2018). Please discuss the findings in the context of previous research on SO-spindle coupling.

      We appreciate your concern regarding the temporal characteristics of SO-spindle coupling. We acknowledge that the SO-spindle coupling phase results in our study are not identical to those reported by Hahn et al. (2020) and Helfrich et al. (2018). These differences may arise from slight variations in event detection parameters, which can influence the precise phase estimate of coupling. Notably, Hahn et al. (2020) also reported slight discrepancies in their group-level coupling phase results, highlighting that methodological differences can contribute to variability across studies. Furthermore, our findings are consistent with those of Schreiner et al. (2021), further supporting the robustness of our observations.

      That said, we acknowledge that our original description of SO-spindle coupling as occurring at the "transition from the lower state to the upper state" was not entirely precise. The -π/2 phase represents the true transition point, while our observed coupling phase is actually closer to the SO peak rather than strictly at the transition. We will revise this statement in the manuscript to ensure clarity and accuracy in describing the coupling phase.  

      Discussion, Page 16 Lines 283-291

      “Our data provide insights into the neurobiological underpinnings of these sleep rhythms. SOs, originating mainly in neocortical areas such as the mPFC, alternate between DOWN- and UP-states. The thalamus generates sleep spindles, which in turn couple with SOs. Our finding that spindle peaks consistently occurred slightly before the UP-state peak of SOs (in 83 out of 107 participants), concurs with prior studies, including Schreiner et al. (2021). Yet it differs from some results suggesting spindles might peak right at the SO UP-state (Hahn et al., 2020; Helfrich et al., 2018). Such discrepancies could arise from differences in detection algorithms, participant age (Helfrich et al., 2018), or subtle variations in cortical-thalamic timing. Nonetheless, these results underscore the importance of coordinated SO-spindle interplay in supporting sleep-dependent processes.”
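      The phase terminology above can be made concrete with a minimal sketch. It assumes a cosine phase convention in which 0 is the SO peak, ±π is the SO trough, and -π/2 is the down-to-up transition (consistent with the response's statement that -π/2 marks the transition). The example phases and the helper names `circular_mean` and `describe_phase` are hypothetical and are not taken from the manuscript.

```python
import cmath
import math


def circular_mean(phases):
    """Mean direction (radians, in [-pi, pi]) of a list of phase angles."""
    resultant = sum(cmath.exp(1j * p) for p in phases)
    return cmath.phase(resultant)


def describe_phase(phase):
    """Label a phase, assuming 0 = SO peak and +/-pi = SO trough."""
    if abs(phase) < math.pi / 4:
        return "near SO peak"
    if abs(abs(phase) - math.pi) < math.pi / 4:
        return "near SO trough"
    return "down-to-up transition" if phase < 0 else "up-to-down transition"


# Hypothetical SO phases (radians) at which one subject's spindle peaks occurred.
spindle_phases = [-0.5, -0.3, -0.4, -0.2, -0.6]
mean_phase = circular_mean(spindle_phases)
label = describe_phase(mean_phase)
print(round(mean_phase, 2), label)
```

      A small negative mean phase of this kind corresponds to spindles peaking slightly before the SO peak, which is a different statement from coupling at the -π/2 transition itself.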

      (4) The discussion is rather superficial with only two pages, without delving into many important arguments regarding the possible functional significance of these results. For example, the author wrote, "This internal processing contrasts with the brain patterns associated with external tasks, such as working memory." Without any references to working memory, and without delineating why WM is considered as an external task even working memory operations can be internal. Similarly, for the interesting results on SO and reduced DMN activity, the authors wrote "The DMN is typically active during wakeful rest and is associated with self-referential processes like mind-wandering, daydreaming, and task representation (Yeshurun, Nguyen, & Hasson, 2021). Its reduced activity during SOs may signal a shift towards endogenous processes such as memory consolidation." This argument is flawed. DMN is active during self-referential processing and mind-wandering, i.e., when the brain shifts from external stimuli processing to internal mental processing. During sleep, endogenous memory reactivation and consolidation are also part of the internal mental processing given the lack of external environmental stimulation. So why during SO or during memory consolidation, the DMN activity would be reduced? Were there differences in DMN activity between SO and SO-spindle coupling events?

      We appreciate your concerns regarding the brevity of the discussion and the need for clearer theoretical arguments. We will expand this section to provide more in-depth interpretations of our findings in the context of prior literature. Regarding working memory (WM), we acknowledge that our phrasing was ambiguous. We will modify this statement in the Discussion section.

      For the SO-related reduction in DMN activity, we recognize the need for a more precise explanation. This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state.

      To address your final question, we have conducted an additional post hoc comparison of DMN activity between isolated SOs and SO-spindle coupling events. Our results indicate that DMN activation during SOs was significantly lower than during SO-spindle coupling (t(106) = -4.17, p < 1e-4). This suggests that SO-spindle coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. We appreciate your constructive feedback and will integrate these expanded analyses and discussions into our revised manuscript.
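      For readers checking the reported statistic, the following is a minimal sketch of a within-subject (paired) t-test. The response does not state which test produced t(106), so the paired design is an assumption here, and the four-subject values and the function name `paired_t` are hypothetical. With 107 subjects the degrees of freedom would be 107 - 1 = 106, matching the reported t(106).

```python
import math


def paired_t(a, b):
    """Return (t, df) for a paired t-test on equal-length samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    t = mean / math.sqrt(var / n)
    return t, n - 1


# Hypothetical per-subject DMN estimates (arbitrary units) for 4 subjects.
dmn_so = [-0.30, -0.10, -0.25, -0.05]
dmn_coupling = [0.00, 0.10, 0.05, 0.05]
t_stat, df = paired_t(dmn_so, dmn_coupling)
print(round(t_stat, 2), df)
```

      A negative t value in this arrangement means the first condition (isolated SOs) has lower DMN estimates than the second (SO-spindle coupling), which is the direction reported above.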

      Results, Page 11 Lines 199-208

      “Spindles were correlated with positive activation in the thalamus (ROI analysis, t(106) = 15.39, p < 1e-4), the anterior cingulate cortex (ACC), and the putamen, alongside deactivation in the DMN (Fig. 3c). Notably, SO-spindle coupling was linked to significant activation in both the thalamus (ROI analysis, t(106) = 3.38, p = 0.0005) and the hippocampus (ROI analysis, t(106) = 2.50, p = 0.0070, Fig. 3d). However, no decrease in DMN activity was found during SO-spindle coupling, and DMN activity during SO was significantly lower than during coupling (ROI analysis, t(106) = -4.17, p < 1e-4). For more detailed activation patterns, see Table S5-S7. We also varied the threshold used to detect SO events to assess its effect on hippocampal activation during SO-spindle coupling and observed that hippocampal activation remained significant when the percentile thresholds for SO detection ranged between 71% and 80% (see Fig. S6).”

      Discussion, Page 17-18 Lines 308-332

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.

      To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      Recommendations for the authors:

      Reviewing Editor Comment:

      The reviewers think that you are working on a relevant and important topic. They are praising the large sample size used in the study. The reviewers are not all in line regarding the overall significance of the findings, but they all agree the paper would strongly benefit from some extra work, as all reviewers raise various critical points that need serious consideration.

      We appreciate your recognition of the relevance and importance of our study, as well as your acknowledgment of the large sample size as a strength of our work. We understand that there are differing perspectives regarding the overall significance of our findings, and we value the constructive critiques provided. We are committed to addressing the key concerns raised by all reviewers, including refining our analyses, clarifying our interpretations, and incorporating additional discussions to strengthen the manuscript. Below, we address your specific recommendations and provide responses to each point you raised to ensure our methods and results are as transparent and comprehensible as possible. We believe that these revisions will significantly enhance the rigor and impact of our study, and we sincerely appreciate your thoughtful feedback in helping us improve our work.

      Reviewer #1 (Recommendations for the authors):

      (1) The phrase "overnight sleep" suggests an entire night, while these were rather "nocturnal naps". Please rephrase.

      Response: Thank you for pointing this out. We have revised the phrasing in our manuscript to "nocturnal naps" instead of "overnight sleep" to more accurately reflect the duration of the sleep recordings.

      (2) Sleep staging results (macroscopic sleep architecture) should be provided in more detail (at least min and % of the different sleep stages, sleep onset latency, total sleep duration, total recording duration), at least mean/SD/range.

      Thank you for this suggestion. We will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics. This information will help provide a clearer overview of the macroscopic sleep architecture in our dataset.

      Reviewer #2 (Recommendations for the authors):

      In order to allow for a better estimation of the reliability of the detected sleep events, please:

      (1) Provide densities and absolute numbers of all detected SOs and spindles (N1, NREM, and REM sleep).

      Thank you for pointing this out. We will provide comprehensive tables in the supplementary materials containing detailed information about sleep waves at each sleep stage for all 107 subjects (Table S2-S4), listing for each subject: 1) Different sleep stage duration; 2) Number of detected SOs; 3) Number of detected spindles; 4) Number of detected SO-spindle coupling events; 5) Density of detected SOs; 6) Density of detected spindles; 7) Density of detected SO-spindle coupling events.

      Supplementary Materials, Page 43-54, Table S2-S4
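      The densities listed above are reported in events per minute, so each one follows from an event count and the time spent in the corresponding stage. A minimal sketch, assuming density is simply count divided by stage duration in minutes; the function name `event_density` and the example record are hypothetical.

```python
def event_density(n_events, stage_minutes):
    """Events per minute, or None when the stage was not observed."""
    if stage_minutes <= 0:
        return None
    return n_events / stage_minutes


# Hypothetical record for one subject: minutes in stage and detected couplings.
subject = {
    "N2N3": {"minutes": 40.0, "so_spindle": 100},
    "REM": {"minutes": 0.0, "so_spindle": 0},
}
densities = {
    stage: event_density(rec["so_spindle"], rec["minutes"])
    for stage, rec in subject.items()
}
print(densities)
```

      Returning `None` for a stage with zero duration keeps subjects who never entered that stage from contributing a spurious density of zero to group averages.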

      (2) Show ERPs for all detected SOs and spindles (per sleep stage).

      Thank you for the suggestion. We will provide ERPs for all detected SOs and spindles, separated by sleep stage (N1, N2&N3, and REM) in supplementary Fig. S2-S4. These ERP waveforms will help illustrate the characteristic temporal profiles of SOs and spindles across different sleep stages.

      Methods, Page 25, Line 525-532

      “Event-related potentials (ERP) analysis. After completing the detection of each sleep rhythm event, we performed ERP analyses for SOs, spindles, and coupling events in different sleep stages. Specifically, for SO events, we took the trough of the DOWN-state of each SO as the zero-time point, then extracted data in a [-2 s to 2 s] window from the broadband (0.1–30 Hz) EEG and used [-2 s to -0.5 s] for baseline correction; the results were then averaged across 107 subjects (see Fig. S2a). For spindle events, we used the peak of each spindle as the zero-time point and applied the same data extraction window and baseline correction before averaging across 107 subjects (see Fig. S2b). Finally, for SO-spindle coupling events, we followed the same procedure used for SO events (see Fig. 2a, Figs. S3–S4).”
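      The epoching and baseline-correction steps described in this passage can be sketched as follows for a single subject and channel. This is an illustration under stated assumptions rather than the authors' code: the 10 Hz sampling rate, the synthetic signal, and the function name `erp` are hypothetical, the baseline is taken as the samples from -2 s up to (not including) -0.5 s, and the final averaging across 107 subjects is not shown.

```python
def erp(signal, event_samples, sfreq, tmin=-2.0, tmax=2.0, base_end=-0.5):
    """Average baseline-corrected epochs around events.

    signal: list of floats; event_samples: sample indices of time zero;
    sfreq: sampling rate in Hz. Events whose window does not fit inside
    the signal are skipped. Returns None if no epoch fits.
    """
    pre = int(round(-tmin * sfreq))                  # samples before time zero
    post = int(round(tmax * sfreq))                  # samples after time zero
    n_base = int(round((base_end - tmin) * sfreq))   # baseline length in samples
    epochs = []
    for ev in event_samples:
        if ev - pre < 0 or ev + post >= len(signal):
            continue
        epoch = signal[ev - pre: ev + post + 1]
        baseline = sum(epoch[:n_base]) / n_base
        epochs.append([v - baseline for v in epoch])
    if not epochs:
        return None
    n = len(epochs)
    return [sum(col) / n for col in zip(*epochs)]


# Synthetic example: constant 5.0 offset with troughs of -15.0 at two events.
signal = [5.0] * 200
signal[50] = -15.0
signal[120] = -15.0
average = erp(signal, [5, 50, 120], sfreq=10.0)  # the event at sample 5 is skipped
print(len(average), average[20])
```

      In this toy signal the baseline removes the constant offset, so the averaged waveform is flat except at time zero, where it reaches the trough depth relative to baseline.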

      (3) Provide detailed info concerning sleep characteristics (time spent in each sleep stage etc.).

      Thank you for this suggestion. As in our response above, we will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics.

      Supplementary Materials, Page 42, Table S1 (same as above)

      (4) What would happen if more stringent parameters were used for event detection? Would the authors still observe a significant number of SO spindles during N1 and REM? Would this affect the fMRI-related results?

      Thank you for this suggestion. Our methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).

      Furthermore, in order to explore the impact of this on our fMRI results, we conducted an additional sensitivity analysis by applying different detection parameters for SOs. Specifically, we adjusted amplitude percentile thresholds for SO detection (the parameter that has the greatest impact on the results). We used the hippocampal activation value during N2&N3 stage SO-spindle coupling as an anchor value and found that when the parameters gradually became stricter, the results were similar to or even better than the current results. However, when we continued to increase the threshold, the results began to gradually decrease until the threshold was increased to 80%, and the results were no longer significant. This indicates that our results are robust within a specific range of parameters, but as the threshold increases, the number of trials decreases, ultimately weakening the statistical power of the fMRI analysis.
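
      The trade-off described above, where stricter amplitude thresholds leave fewer events and therefore less statistical power, can be illustrated with a small sketch. It assumes that each candidate SO has already been reduced to a single amplitude value and uses a nearest-rank percentile. The function names are hypothetical, and the other detection criteria (frequency band, duration) are not reproduced here.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile (integer pct, 0 < pct <= 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def detect_above_percentile(amplitudes, pct):
    """Indices of candidate events whose amplitude exceeds the percentile cutoff."""
    cutoff = percentile(amplitudes, pct)
    return [i for i, a in enumerate(amplitudes) if a > cutoff]

def sweep_thresholds(amplitudes, pcts):
    """Map each percentile threshold to the number of surviving events."""
    return {p: len(detect_above_percentile(amplitudes, p)) for p in pcts}
```

      Sweeping `pcts` over a range of thresholds shows how quickly the trial count shrinks, which is the quantity that limits the downstream fMRI analysis.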

      Thank you again for your suggestions on sleep rhythm event detection. We will add the results in Supplementary and revise our manuscript accordingly.

      Results, Page 11, Line 199-208

      “Spindles were correlated with positive activation in the thalamus (ROI analysis, t(106) = 15.39, p < 1e-4), the anterior cingulate cortex (ACC), and the putamen, alongside deactivation in the DMN (Fig. 3c). Notably, SO-spindle coupling was linked to significant activation in both the thalamus (ROI analysis, t(106) = 3.38, p = 0.0005) and the hippocampus (ROI analysis, t(106) = 2.50, p = 0.0070, Fig. 3d). However, no decrease in DMN activity was found during SO-spindle coupling, and DMN activity during SO was significantly lower than during coupling (ROI analysis, t(106) = -4.17, p < 1e-4). For more detailed activation patterns, see Table S5-S7. We also varied the threshold used to detect SO events to assess its effect on hippocampal activation during SO-spindle coupling and observed that hippocampal activation remained significant when the percentile thresholds for SO detection ranged between 71% and 80% (see Fig. S6).”

      Finally, we sincerely thank you all again for your thoughtful and constructive feedback. Your insights have been invaluable in refining our analyses, strengthening our interpretations, and improving the clarity and rigor of our manuscript. We appreciate the time and effort you have dedicated to reviewing our work, and we are grateful for the opportunity to enhance our study based on your recommendations.

      References:

      Bergmann, T. O., Mölle, M., Diedrichs, J., Born, J., & Siebner, H. R. (2012). Sleep spindle-related reactivation of category-specific cortical regions after learning face-scene associations. NeuroImage, 59(3), 2733-2742. 

      Buzsáki, G. (2015). Hippocampal sharp wave‐ripple: A cognitive biomarker for episodic memory and planning. Hippocampus, 25(10), 1073-1188. 

      Caporro, M., Haneef, Z., Yeh, H. J., Lenartowicz, A., Buttinelli, C., Parvizi, J., & Stern, J. M. (2012). Functional MRI of sleep spindles and K-complexes. Clinical neurophysiology, 123(2), 303-309. 

      Coulon, P., Budde, T., & Pape, H.-C. (2012). The sleep relay—the role of the thalamus in central and decentral sleep regulation. Pflügers Archiv-European Journal of Physiology, 463, 53-71. 

      Crunelli, V., Lőrincz, M. L., Connelly, W. M., David, F., Hughes, S. W., Lambert, R. C., Leresche, N., & Errington, A. C. (2018). Dual function of thalamic low-vigilance state oscillations: rhythm-regulation and plasticity. Nature Reviews Neuroscience, 19(2), 107-118. 

      Czisch, M., Wehrle, R., Stiegler, A., Peters, H., Andrade, K., Holsboer, F., & Sämann, P. G. (2009). Acoustic oddball during NREM sleep: a combined EEG/fMRI study. PloS one, 4(8), e6749. 

      Diba, K., & Buzsáki, G. (2007). Forward and reverse hippocampal place-cell sequences during ripples. Nature Neuroscience, 10(10), 1241. 

      Diekelmann, S., & Born, J. (2010). The memory function of sleep. Nature Reviews Neuroscience, 11(2), 114-126. 

      Fogel, S., Albouy, G., King, B. R., Lungu, O., Vien, C., Bore, A., Pinsard, B., Benali, H., Carrier, J., & Doyon, J. (2017). Reactivation or transformation? Motor memory consolidation associated with cerebral activation time-locked to sleep spindles. PloS one, 12(4), e0174755. 

      Hahn, M. A., Heib, D., Schabus, M., Hoedlmoser, K., & Helfrich, R. F. (2020). Slow oscillation-spindle coupling predicts enhanced memory formation from childhood to adolescence. Elife, 9, e53730. 

      Halassa, M. M., Siegle, J. H., Ritt, J. T., Ting, J. T., Feng, G., & Moore, C. I. (2011). Selective optical drive of thalamic reticular nucleus generates thalamic bursts and cortical spindles. Nature Neuroscience, 14(9), 1118-1120. 

      Hale, J. R., White, T. P., Mayhew, S. D., Wilson, R. S., Rollings, D. T., Khalsa, S., Arvanitis, T. N., & Bagshaw, A. P. (2016). Altered thalamocortical and intra-thalamic functional connectivity during light sleep compared with wake. NeuroImage, 125, 657-667. 

      Helfrich, R. F., Lendner, J. D., Mander, B. A., Guillen, H., Paff, M., Mnatsakanyan, L., Vadera, S., Walker, M. P., Lin, J. J., & Knight, R. T. (2019). Bidirectional prefrontal-hippocampal dynamics organize information transfer during sleep in humans. Nature Communications, 10(1), 3572. 

      Helfrich, R. F., Mander, B. A., Jagust, W. J., Knight, R. T., & Walker, M. P. (2018). Old brains come uncoupled in sleep: slow wave-spindle synchrony, brain atrophy, and forgetting. Neuron, 97(1), 221-230. e224. 

      Horovitz, S. G., Fukunaga, M., de Zwart, J. A., van Gelderen, P., Fulton, S. C., Balkin, T. J., & Duyn, J. H. (2008). Low frequency BOLD fluctuations during resting wakefulness and light sleep: A simultaneous EEG‐fMRI study. Human brain mapping, 29(6), 671-682. 

      Huang, Q., Xiao, Z., Yu, Q., Luo, Y., Xu, J., Qu, Y., Dolan, R., Behrens, T., & Liu, Y. (2024). Replay-triggered brain-wide activation in humans. Nature Communications, 15(1), 7185. 

      Ilhan-Bayrakcı, M., Cabral-Calderin, Y., Bergmann, T. O., Tüscher, O., & Stroh, A. (2022). Individual slow wave events give rise to macroscopic fMRI signatures and drive the strength of the BOLD signal in human resting-state EEG-fMRI recordings. Cerebral Cortex, 32(21), 4782-4796. 

      Laufs, H. (2008). Endogenous brain oscillations and related networks detected by surface EEG‐combined fMRI. Human brain mapping, 29(7), 762-769. 

      Laufs, H., Walker, M. C., & Lund, T. E. (2007). ‘Brain activation and hypothalamic functional connectivity during human non-rapid eye movement sleep: an EEG/fMRI study’—its limitations and an alternative approach. Brain, 130(7), e75. 

      Margulies, D. S., Ghosh, S. S., Goulas, A., Falkiewicz, M., Huntenburg, J. M., Langs, G., Bezgin, G., Eickhoff, S. B., Castellanos, F. X., & Petrides, M. (2016). Situating the default-mode network along a principal gradient of macroscale cortical organization. Proceedings of the National Academy of Sciences, 113(44), 12574-12579. 

      Massimini, M., Huber, R., Ferrarelli, F., Hill, S., & Tononi, G. (2004). The sleep slow oscillation as a traveling wave. Journal of Neuroscience, 24(31), 6862-6870. 

      Moehlman, T. M., de Zwart, J. A., Chappel-Farley, M. G., Liu, X., McClain, I. B., Chang, C., Mandelkow, H., Özbay, P. S., Johnson, N. L., & Bieber, R. E. (2019). All-night functional magnetic resonance imaging sleep studies. Journal of neuroscience methods, 316, 83-98. 

      Mölle, M., Bergmann, T. O., Marshall, L., & Born, J. (2011). Fast and slow spindles during the sleep slow oscillation: disparate coalescence and engagement in memory processing. Sleep, 34(10), 1411-1421. 

      Ngo, H.-V., Fell, J., & Staresina, B. (2020). Sleep spindles mediate hippocampal-neocortical coupling during long-duration ripples. Elife, 9, e57011. 

      Picchioni, D., Horovitz, S. G., Fukunaga, M., Carr, W. S., Meltzer, J. A., Balkin, T. J., Duyn, J. H., & Braun, A. R. (2011). Infraslow EEG oscillations organize large-scale cortical– subcortical interactions during sleep: a combined EEG/fMRI study. Brain research, 1374, 63-72. 

      Schabus, M., Dang-Vu, T. T., Albouy, G., Balteau, E., Boly, M., Carrier, J., Darsaud, A., Degueldre, C., Desseilles, M., & Gais, S. (2007). Hemodynamic cerebral correlates of sleep spindles during human non-rapid eye movement sleep. Proceedings of the National Academy of Sciences, 104(32), 13164-13169. 

      Schreiner, T., Kaufmann, E., Noachtar, S., Mehrkens, J.-H., & Staudigl, T. (2022). The human thalamus orchestrates neocortical oscillations during NREM sleep. Nature communications, 13(1), 5231. 

      Schreiner, T., Petzka, M., Staudigl, T., & Staresina, B. P. (2021). Endogenous memory reactivation during sleep in humans is clocked by slow oscillation-spindle complexes. Nature Communications, 12(1), 3112. 

      Singh, D., Norman, K. A., & Schapiro, A. C. (2022). A model of autonomous interactions between hippocampus and neocortex driving sleep-dependent memory consolidation. Proceedings of the National Academy of Sciences, 119(44), e2123432119. 

      Spoormaker, V. I., Schröter, M. S., Gleiser, P. M., Andrade, K. C., Dresler, M., Wehrle, R., Sämann, P. G., & Czisch, M. (2010). Development of a large-scale functional brain network during human non-rapid eye movement sleep. Journal of Neuroscience, 30(34), 11379-11387. 

      Staresina, B. P., Bergmann, T. O., Bonnefond, M., van der Meij, R., Jensen, O., Deuker, L., Elger, C. E., Axmacher, N., & Fell, J. (2015). Hierarchical nesting of slow oscillations, spindles and ripples in the human hippocampus during sleep. Nature Neuroscience, 18(11), 1679-1686. 

      Staresina, B. P., Niediek, J., Borger, V., Surges, R., & Mormann, F. (2023). How coupled slow oscillations, spindles and ripples coordinate neuronal processing and communication during human sleep. Nature Neuroscience, 1-9. 

      Yarkoni, T., Poldrack, R. A., Nichols, T. E., Van Essen, D. C., & Wager, T. D. (2011). Large-scale automated synthesis of human functional neuroimaging data. Nature methods, 8(8), 665-670. 

      Yeshurun, Y., Nguyen, M., & Hasson, U. (2021). The default mode network: where the idiosyncratic self meets the shared social world. Nature Reviews Neuroscience, 1-12.

    1. Author response:

      The following is the authors’ response to the original reviews

      Main revision made to the manuscript

      The main revision made to the manuscript is to reconcile our findings with the line attractor model. The revision is based on Reviewer 1’s comment on reinterpreting our results as a superposition of an attractor model with fast timescale dynamics. We expanded our analysis regime to the start of a trial and characterized the overall within-trial dynamics to reinterpret our findings.

      We first acknowledge that our results are not in contradiction with evidence integration on a line attractor. As pointed out by the reviewers, our finding that the integration of reward outcome explains the reversal probability activity x_rev (Figure 3) is compatible with the line attractor model. However, the reward integration equation is an algebraic relation and does not characterize the dynamics of reversal probability activity. A closer analysis of the neural dynamics is therefore needed to assess the feasibility of the line attractor.

      In the revised manuscript, we show that x_rev exhibits two different activity modes (Figure 4). First, x_rev has substantial non-stationary dynamics during a trial, and this non-stationary activity is incompatible with the line attractor model, as claimed in the original manuscript. Second, we present new results showing that x_rev is stationary (i.e., constant in time) and stable (i.e., contracting) at the start of a trial. These two properties of x_rev support that it is a point attractor at the start of a trial and is compatible with the line attractor model. 
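
      As a rough illustration of the stationarity criterion (activity that is constant in time has a time derivative near zero), the check could be sketched as below. This is not the analysis used in the manuscript: the forward finite difference, the tolerance `tol`, and the function names are assumptions, and the contraction (stability) test is not covered.

```python
def time_derivative(x, dt):
    """Forward finite-difference estimate of dx/dt for a sampled 1-D trajectory."""
    return [(b - a) / dt for a, b in zip(x, x[1:])]

def is_stationary(x, dt, tol):
    """True if |dx/dt| stays within tol over the whole window."""
    return all(abs(d) <= tol for d in time_derivative(x, dt))
```

      Applied to x_rev, such a check would be run separately on a window at the start of a trial and on a window during the trial to contrast the two activity modes.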

      We further analyze how the two activity modes are linked (Figure 4, Support vector regression). We show that the non-stationary activity is predictable from the stationary activity if the underlying dynamics can be inferred. In other words, the non-stationary activity during a trial is generated by an underlying dynamics with the initial condition provided by the stationary state at the start of a trial.

      These results suggest an extension of the line attractor model where an attractor state at the start of a trial provides an initial condition from which non-stationary activity is generated during a trial by an underlying dynamics associated with task-related behavior (Figure 4, Augmented model). 

      The separability of non-stationary trajectories (Figure 5 and 6) is a property of the non-stationary dynamics that allows separable points in the initial stationary state to remain separable during a trial, thus making it possible to represent distinct probabilistic values in non-stationary activity.

      This revised interpretation of our results (1) retains our original claim that the non-stationary dynamics during a trial is incompatible with the line attractor model and (2) introduces an attractor state at the start of a trial, which is compatible with the line attractor model. Our analysis shows that the two activity modes are linked by an underlying dynamics, and the attractor state serves as the initial state that launches the non-stationary activity.

      Responses to the Public Reviews:

      Reviewer # 1:

      (1) To better explain the reversal learning task and the network training method, we added a detailed description of the RNN and the monkey task structure (Result Section 1), included a schematic of target outputs (Figure 1B), explained the rationale behind using an inhibitory network model (Method Section 1), and explained the supervised RNN training scheme (Result Section 1). This information can also be found in the Methods.

      (2) Our understanding is that the augmented model discussed in the previous page is aligned with the model suggested by Reviewer 1: “a curved line attractor, with faster timescale dynamics superimposed on this structure”. It is likely that the “fast” non-stationary activity observed during the trial is driven by task-related behavior, thus is transient. For instance, we do not observe such non-stationary activity in the inter-trial-interval when the task-related behavior is absent. For this reason, the non-stationary trajectories were not considered to be part of the attractor. Instead, they are transient activity generated by the underlying neural dynamics associated with task-related behavior. We believe such characterization of faster timescale dynamics is consistent with Reviewer 1’s view and wanted to clarify that there are two different activity modes.

      (3) We appreciate the comment from Reviewer 1 and Reviewer 2 that TDR may be limited in isolating the neural subspace of interest. Our study presents what could be learned from TDR, but it is by no means the only way to interpret the neural data. Applying other methods for isolating task-related neural activity is left for future work.

      We would appreciate it if the reviewers could share thoughts on what other alternative methods could better isolate the reversal probability activity.

      Reviewer # 2:

      (1) (i) We respectfully disagree with Reviewer 2’s comment that “no action is required to be performed by neurons in the RNN”. In our network setup, the output of RNN learns to choose a sign (+ or -), as Reviewer 2 pointed out, to make a choice. This is how the RNN takes an action. It is unclear to us what Reviewer 2 has intended by “action” and how reaching a target value (not just taking a sign) would make a significant difference in how the network performs the task. 

      (ii) From Reviewer 2’s comment that “no intervening behavior is thus performed by neurons”, we noticed that the term “intervening behavior” has caused confusion. It refers to task-related behavior, such as making choices or receiving reward, that the subject must perform across trials before reversing its preferred choice. These are the behaviors that intervene before the reversal of preferred choice. To clarify its meaning, in the revised manuscript, we changed the term to “task-related behavior” and put it in context. For example, in the Introduction we state that “However, during a trial, task-related behavior, such as making decisions or receiving feedback, produced …”

      (iii) As pointed out by Reviewer 2, the lack of fixation period in the RNN could make differences in the neural dynamics of RNN and PFC, especially at the start of a trial. We demonstrate this issue in Result Section 4 where we analyze the stationary activity at the start of a trial. We find that fixating the choice output to zero before making a choice promotes stationary activity and makes the RNN activity more similar to the PFC activity.

      Reviewer #3:

      (1) (i) In the previous study (Figure 1 in [Bartolo and Averbeck ‘20]), it was shown that neural activity can predict the behavioral reversal trial. This is the reason we examined the neural activity in the trials centered at the behavioral reversal trial. We explained in Result Section 2 that we followed this line of analysis in our study.

      (ii) We would like to emphasize that the main point of Figures 4 and 5 is to show the separability of neural trajectories: the entire trajectory shifts without overlapping. It is not obvious that high-dimensional neural population activity from two trials should remain separated when their activities are compressed into a one-dimensional subspace. The one-dimensional activities can easily collide since they are compressed into a low-dimensional space. We revised the manuscript to bring out these points. We added an opening paragraph that discusses separability of trajectories and revised the main text to bring out the findings on separability.

      (iii) We agree with Reviewer 3 that it would be interesting to look at what happens in other subspaces of neural activity that are not related to reversal probability and to characterize how different neural subspaces interact with each other. However, the focus of this paper was the reversal probability activity, and we consider these questions outside the scope of the current paper. We point out that, using the same dataset, neural activity related to other experimental variables was analyzed in other papers [Bartolo and Averbeck ’20; Tang, Bartolo and Averbeck ‘21].

      (2) (i) In the revised manuscript, we added an explanation of the rationale behind choosing the inhibitory network as a simplified model for the balanced state. In brief, a network with strong inhibitory recurrent connections and strong excitatory external input operates in the balanced state, as in the standard excitatory-inhibitory network. We included references that studied this inhibitory network. We also explained the technical reason (GPU memory) for choosing the inhibitory model.

      (ii) We thank the reviewer for pointing out that the original manuscript did not mention how the feedback and cue were initialized. They were random vectors sampled from a Gaussian distribution. We added this information in the revised manuscript. In our opinion, it is common to use random external inputs for training RNNs, as it is a priori unclear how to choose them. In fact, it is possible to analyze the effects of random feedback on the one-dimensional x_rev dynamics by projecting the random feedback vector onto the reversal probability vector. This is shown in Figure 4F.
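
      The projection mentioned above amounts to taking the component of the feedback vector along the reversal probability axis. A minimal sketch, with hypothetical function names and with the vector dimension, mean, and standard deviation as placeholder assumptions:

```python
import math
import random

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def scalar_projection(vec, axis):
    """Component of vec along axis (axis need not be unit length)."""
    norm = math.sqrt(dot(axis, axis))
    if norm == 0:
        raise ValueError("axis must be non-zero")
    return dot(vec, axis) / norm

def random_gaussian_vector(n, rng, mean=0.0, std=1.0):
    """Random input vector with i.i.d. Gaussian entries (mean and std are assumptions)."""
    return [rng.gauss(mean, std) for _ in range(n)]
```

      Here `axis` stands in for the reversal probability vector obtained from the regression, and `vec` for a feedback input drawn with `random_gaussian_vector`.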

      (iii) We agree that it would be more natural to train the RNN to solve the task without using the Bayesian model. We point out this issue in the Discussion in the revised manuscript.

      Recommendations for the authors:

      Reviewer #1:

      (1) My understanding of network training was that a Bayesian ideal observer signaled target output based on previous reward outcomes. However, the authors never mention that networks are trained by supervised learning in the main text until the last paragraph of the discussion. There is no mention that there was an offset in the target based on the behavior of the monkeys in the main text. These are really important things to consider in the context of the network solution after training. I couldn't actually find any figure that presents the target output for the network. Did I miss something key here?

      In Result Section 1, we added a paragraph that describes in detail how the RNN is trained. We explained that the network is first simulated and then the choice outputs and reward outcomes are fed into the Bayesian model to infer the scheduled reversal trial. A few trials are added to the inferred reversal trial to obtain the behavioral reversal trial, as found in a previous study [Bartolo and Averbeck ‘20]. Then the network weights are updated by backpropagation-through-time via supervised learning. 

      In the original manuscript, the target output for the network was described in Methods Section 2.5, Step 4. To make this information readily accessible, we added a schematic in Figure 1B that shows the scheduled, inferred and behavioral reversal trials. It also shows how the target choice outputs are defined. They switch abruptly at the behavioral reversal trial.
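
      The relation between the inferred reversal trial, the behavioral reversal trial, and the target outputs can be sketched as follows. This is a simplified illustration under stated assumptions, not the Bayesian model used in the manuscript: it assumes two options, a known initially better option, reward probabilities of 0.7 and 0.3, and a uniform prior over the reversal trial. All function names are hypothetical.

```python
import math

def map_reversal_trial(choices, rewards, initial_best=0, p_high=0.7, p_low=0.3):
    """MAP estimate of the trial at which the reward schedule reversed.

    choices[t] is 0 or 1, rewards[t] is 0 or 1. With a uniform prior the MAP
    estimate is the candidate trial k that maximizes the log-likelihood.
    """
    best_k, best_ll = None, -math.inf
    for k in range(len(choices) + 1):  # k == len(choices): no reversal seen yet
        ll = 0.0
        for t, (c, r) in enumerate(zip(choices, rewards)):
            best = initial_best if t < k else 1 - initial_best
            p = p_high if c == best else p_low  # reward probability of the chosen option
            ll += math.log(p if r else 1.0 - p)
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k

def target_signs(n_trials, switch_trial, initial_sign=1):
    """Target choice outputs that switch sign abruptly at switch_trial."""
    return [initial_sign if t < switch_trial else -initial_sign for t in range(n_trials)]
```

      In this sketch the behavioral reversal trial would be `map_reversal_trial(...)` plus a few trials, and `target_signs` would then be called with that value; the size of the delay is a free parameter here.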

      (2) The role of block structure in the task is an important consideration. What are the statistics of block switches? The authors say on average the reversals are every 36 trials, but also say there are random block switches. The reviewer's notes suggest that both the networks and monkeys may be learning about the typical duration of blocks, which could influence their expectations of reversals. This aspect of the task design should be explained more thoroughly and considered in the context of Figure 1E and 5 results.

      We provided a more detailed description of the reversal learning task in Result Section 1. We clarified that (1) a task is completed by executing a block of a fixed number of trials and (2) the reversal of the reward schedule occurs at a random trial around the mid-trial of a block. The difference in the number of trials per block performed by the RNNs (36) and the monkeys (80) is also explained. We also pointed out the differences in how the reversal trial is randomly sampled.

      However, it is unclear what Reviewer 1 meant by random block switches. Our reversal learning task is completed when a block of a fixed number of trials is executed. The reversal of the reward schedule occurs only once, on a randomly selected trial in the block, and the reversed reward schedule is maintained until the end of the block. This differs from other versions of reversal learning in which the reward schedule switches multiple times across trials. We clarified this point in Result Section 1.
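
      The block structure described above (a fixed number of trials with a single reversal near the middle) can be sketched as below. The uniform sampling window of `jitter` trials around the midpoint is an assumption made for illustration; the actual sampling rules for the RNNs and the monkeys differ and are given in the manuscript.

```python
import random

def make_block(n_trials, rng, jitter=5, initial_best=0):
    """Return (reversal_trial, schedule) for one block with a single reversal.

    schedule[t] is the index (0 or 1) of the better option on trial t.
    """
    mid = n_trials // 2
    reversal = rng.randint(mid - jitter, mid + jitter)  # inclusive bounds
    schedule = [initial_best if t < reversal else 1 - initial_best
                for t in range(n_trials)]
    return reversal, schedule
```

      Because the schedule changes only once and stays reversed until the end of the block, there are no repeated switches for the network or the animal to learn about within a block.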

      (3) The relationship between the supervised learning approach used in the RNNs and reinforcement learning was confused in the discussion. "Although RNNs in our study were trained via supervised learning, animals learn a reversal-learning task from reward feedback, making it into a reinforcement learning (RL) problem." This is fundamentally not true. In the case of this work, the outcome of the previous trial updates the target output, rather than the trial and error type learning as is typical in reinforcement learning. Networks are not learning by reinforcement learning and this statement is confusing.

      We agree with Reviewer 1’s comment that the statement in the original manuscript is confusing. Our intention was to point out that our study used supervised learning, which differs from how animals learn by reinforcement learning in real life. We revised the sentence in Discussion as follows:

      “The RNNs in our study were trained via supervised learning. However, in real life, animals learn a reversal learning task via reinforcement learning (RL), i.e., learn the task from reward outcomes.”

      (4) The distinction between line attractors and the dynamic trajectories described by the authors deserves further investigation. A significant concern arises from the authors' use of targeted dimensionality reduction (TDR), a form of regression, to identify the axis determining reversal probability. While this approach can reveal interesting patterns in the data, it may not necessarily isolate the dimension along which the RNN computes reversal probability. This limitation could lead to misinterpretation of the underlying neural dynamics.

      a) This manuscript cites work described in "Prefrontal cortex as a meta-reinforcement learning system," which examined a similar task. In that study, the authors identified a v-shaped curve in the principal component space of network states, representing the probability of choosing left or right.

      Importantly, this curve is topologically equivalent to a line and likely represents a line attractor. However, regressing against reversal probability in such a case would show that a single principal component (PC2) directly correlates with reversal probability.

      b) The dynamics observed in the current study bear a striking resemblance to this structure, with the addition of intervening loops in the network state corresponding to within-trial state evolution. Crucially, these observations do not preclude the existence of a line attractor. Instead, they may reflect the network's need to produce fast timescale dynamics within each trial, superimposed on the slower dynamics of the line attractor.

      c) This alternative interpretation suggests that reward signals could function as inputs that shift the network state along the line attractor, with information being maintained across trials. The fast "intervening behaviors" observed by the authors could represent faster timescale dynamics occurring on top of the underlying line attractor dynamics, without erasing the accumulated evidence for reversals.

      d) Given these considerations, the authors' conclusion that their results are better described by separable dynamic trajectories rather than fixed points on a line attractor may be premature. The observed dynamics could potentially be reconciled with a more nuanced understanding of line attractor models, where the attractor itself may be curved and coexist with faster timescale dynamics.

      We appreciate the insightful comments on (1) the similarity of the work by Wang et al ’18 with our findings and (2) an alternative interpretation that augments the line attractor with fast timescale dynamics. 

      (1) We added a discussion of the work by Wang et al ’18 in Result Section 2 to point out the similarity of their findings in the principal component space with ours in the x_rev and x_choice space. We commented that such network dynamics could emerge when learning to perform the reversal learning task, regardless of the training scheme.

      We also mention that the RL approach in Wang et al ’18 does not consider within-trial dynamics, therefore lacks the non-stationary activity observed during the trial in the PFC of monkeys and our trained RNNs.

      (2) We revised our original manuscript substantially to reconcile the line attractor model with the non-stationary activity observed during a trial.

      Here are the highlights of the revised interpretation of the PFC and the RNN network activity:

      - The dynamics of x_rev consists of two activity modes, i.e., stationary activity at the start of a trial and non-stationary activity during the trial. A schematic of the augmented model that reconciles the two activity modes is shown in Figure 4A. Analyses of the time derivative (dx_rev / dt) and the contractivity of the stationary state are shown in Figure 4B,C to demonstrate the two activity modes.

      - We discuss in Result Section 4 main text that the stationary activity is consistent with the line attractor model, but the non-stationary activity deviates from the model. 

      - The two activity modes are linked dynamically. There is an underlying dynamics that can map the stationary state to the non-stationary trajectory. This is shown by predicting the non-stationary trajectory with the stationary state using a support vector regression model. The prediction results are shown in Figure 4D,E,F.

      - We discuss in Result Section 4 an extension of the standard line attractor model: points on the line attractor can serve as initial states that launch non-stationary activity associated with task-related behavior.

      - The separability of neural trajectories presented in Result Section 5 is framed as a property of the non-stationary dynamics associated with task-related behavior.

      To strengthen their claims, the authors should:

      (1) Provide a more detailed description of their RNN training paradigm and task structure, including clear illustrations of target outputs.

      (2) Discuss how their findings relate to and potentially extend previous work on similar tasks, particularly addressing the similarities and differences with the v-shaped state organization observed in reinforcement learning contexts. (https://www.nature.com/articles/s41593-018-0147-8 Figure1).

      (3) Explore whether their results could be consistent with a curved line attractor model, rather than treating line attractors and dynamic trajectories as mutually exclusive alternatives.

      Our response to these three comments is described above.

      Addressing these points would significantly enhance the impact of the study and provide a more nuanced understanding of how reversal probabilities are represented in neural circuits.

      In conclusion, while this study provides interesting insights into the neural representation of reversal probability, there are several areas where the methodology and interpretations could be refined.

      Additional Minor Concerns:

      (1) Network Training and Reversal Timing: The authors mention that the network was trained to switch after a reversal to match animal behavior, stating "Maximum a Posterior (MAP) of the reversal probability converges a few trials past the MAP estimate." More explanation of how this training strategy relates to actual animal behavior would enhance the reader's understanding of the meaning of the model's similarity to animal behavior in Figure 1.

      In Method Section 2.5, we described how our observation that the running estimate of MAP converges a few trials after the actual MAP is analogous to the animal’s reversal behavior.

      “This observation can be interpreted as follows. If a subject performing the reversal learning task employs the ideal observer model to detect the trial at which reward schedule is reversed, the subject can infer the reversal of reward schedule a few trials past the actual reversal and then switch its preferred choice. This delay in behavioral reversal, relative to the reversal of reward schedule, is analogous to the monkeys switching their preferred choice a few trials after the reversal of reward schedule.”

      In Step 4, we also mentioned that the target choice outputs are defined based on our observation in Step 3.

      “We used the observation from Step 3 to define target choice outputs that switch abruptly a few trials after the reversal of reward schedule, denoted as $t^*$ in the following. An example of target outputs are shown in Fig.\,\ref{fig_behavior}B.”
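
      The ideal-observer idea in Steps 3 and 4 can be sketched as follows. This is an illustration only, not the code used in the study: the function names, the reward probability `p_high = 0.7`, the coding of the options as 0 and 1, and the toy block of choices and rewards are all assumptions introduced here. With a uniform prior over the reversal trial, the posterior is proportional to the likelihood, so the MAP estimate is the candidate reversal trial with the largest log-likelihood.

```python
import math

def reversal_log_likelihood(choices, rewards, n_trials, p_high=0.7):
    """Log-likelihood of the observed trials for each candidate reversal trial r.

    Assumption: option 0 is the high-value option before trial r and option 1
    from trial r onward. With a uniform prior over r, this equals the
    log-posterior up to an additive constant.
    """
    scores = []
    for r in range(n_trials + 1):  # r == n_trials means "no reversal in block"
        total = 0.0
        for t, (choice, reward) in enumerate(zip(choices, rewards)):
            good_option = 0 if t < r else 1
            p_reward = p_high if choice == good_option else 1.0 - p_high
            total += math.log(p_reward if reward == 1 else 1.0 - p_reward)
        scores.append(total)
    return scores

def map_reversal(choices, rewards, n_trials, p_high=0.7):
    """MAP estimate of the reversal trial (first maximum if there are ties)."""
    scores = reversal_log_likelihood(choices, rewards, n_trials, p_high)
    return max(range(len(scores)), key=scores.__getitem__)

def running_map(choices, rewards, p_high=0.7):
    """MAP estimate recomputed after each trial, using only the data so far."""
    n_trials = len(choices)
    return [
        map_reversal(choices[:k], rewards[:k], n_trials, p_high)
        for k in range(1, n_trials + 1)
    ]

# Toy block (made up): the agent always picks option 0; the schedule reverses
# at trial 10, after which option 0 is rarely rewarded.
toy_choices = [0] * 20
toy_rewards = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1] + [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

final_map = map_reversal(toy_choices, toy_rewards, n_trials=20)
estimates = running_map(toy_choices, toy_rewards)
print("MAP reversal trial:", final_map)
print("running MAP estimates:", estimates)
```

      Inspecting `estimates` trial by trial shows when the running estimate settles on its final value; target choice outputs of the kind described in Step 4 would then switch at a trial derived from that point.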

      (2) How is the network simulated in step 1 of training? Is it just randomly initialized? What defines this network structure?

      The initial state at the start of a block was random. We think the initial state is less relevant as the external inputs (i.e., cue and feedback) are strong and drive the network dynamics. We described this setup and observation in Step 1 of training.

      “Step 1. Simulate the network starting from a random initial state, apply the external inputs, i.e., cue and feedback inputs, at each trial and store the network choices and reward outcomes at all the trials in a block. The network dynamics is driven by the external inputs applied periodically over the trials.”

      (3) Clarification on Learning Approach: More description of the approach in the main text would be beneficial. The statement "Here, we trained RNNs that learned from a Bayesian inference model to mimic the behavioral strategies of monkeys performing the reversal learning task [2, 4]" is somewhat confusing, as the model isn't directly fit to monkey data. A more detailed explanation of how the Bayesian inference model relates to monkey behavior and how it's used in RNN training would improve clarity.

      We described the learning approach in more detail, but also tried to be concise without going into technical details.

      We revised the sentence in Introduction as follows:

      “We sought to train RNNs to mimic the behavioral strategies of monkeys performing the reversal learning task. Previous studies \cite{costa2015reversal, bartolo2020prefrontal} have shown that a Bayesian inference model can capture a key aspect of the monkey's behavioral strategy, i.e., adhere to the preferred choice until the reversal of reward is detected and then switch abruptly. We trained the RNNs to replicate this behavioral strategy by training them on target behaviors generated from the Bayesian model.”

      We also added a paragraph in Result Section 1 that explains in detail how the training approach works.

      (4) In Figure 1B, it would be helpful to show the target output.

      We added a figure in Fig1B that shows a schematic of how the target output is generated.

      (5) An important point to consider is that a line attractor can be curved while still being topologically equivalent to a line. This nuance makes Figure 4A somewhat difficult to interpret. It might be helpful to discuss how the observed dynamics relate to potentially curved line attractors, which could provide a more nuanced understanding of the neural representations.

      As discussed above, we interpret the “curved” activity during the trial as non-stationary activity. We do not think this non-stationary activity would be characterized as an attractor. An attractor is (1) a minimal set of states that is (2) invariant under the dynamics and (3) attracting when perturbed into its neighborhood [Strogatz, Nonlinear dynamics and chaos]. If we consider the autonomous system without the behavior-related external input as the base system, then the non-stationary states could satisfy (2) and (3) but not (1), so they are not part of the attractor. If we include the behavior-related external input in the autonomous dynamics, then it may be possible that the non-stationary trajectories are part of the attractor. We adopted the former interpretation as the behavior-related inputs are external and transient.

      (6) The results of the perturbation experiments seem to follow necessarily from the way x_rev was defined. It would be valuable to clarify if there's more to these results than what appears to be a direct consequence of the definition, or if there are subtleties in the experimental design or analysis that aren't immediately apparent.

      The neural activity x_rev is correlated to the reversal probability, but it is unclear if the activity in this neural subspace is causally linked to behavioral variables, such as choice output. We added this explanation at the beginning of Results Section 7 to clarify the reason for performing the perturbation experiments.

      “The neural activity $x_{rev}$ is obtained by identifying a neural subspace correlated to reversal probability. However, it remains to be shown if activity within this neural subspace is causally linked to behavioral variables, such as choice output.”

      Reviewer #2:

      Below is a list of things I have found difficult to understand, and been puzzled/concerned about while reading the manuscript:

      (1) It would be nice to say a bit more about the dataset that has been used for PFC analysis, e.g. number of neurons used and in what conditions is Figure 2A obtained (one has to go to supplementary to get the reference).

      We added information about the PFC dataset in the opening paragraph of Result Section 2 to provide an overview of what type of neural data we’ve analyzed. It includes information about the number of recorded neurons, recording method and spike binning process.

      (2) It would be nice to give more detail about the monkey task and better explain its trial structure.

      In Result Section 1 we added a description of the overall task structure (and how it differs from other versions of the reversal learning task), the RNN and monkey trial structures, and the differences between the RNN and monkey tasks.

      (3) In the introduction it is mentioned that during the hold period, the probability of reversal is represented. Where does this statement come from?

      The fact that neural activity during a hold period, i.e., fixation period before presenting the target images, encodes the probability of reversal was demonstrated in a previous study (Bartolo and Averbeck ’20). 

      We realize that our intention was to state that, during the hold period, the reversal probability activity is stationary as in the line attractor model, rather than to emphasize that the probability of reversal is represented during this period. We revised the sentence to convey this message. In addition, we revised the entire paragraph to reinterpret our findings: there are two activity modes, where the stationary activity is consistent with the line attractor model but the non-stationary activity deviates from it.

      (4) "Around the behavioral reversal trial, reversal probabilities were represented by a family of rank-ordered trajectories that shifted monotonically". This sentence is confusing and hard to understand.

      Thank you for pointing this out. We rewrote the paragraph to reflect our revised interpretation. This sentence was removed, as it can be considered part of the result on separable trajectories.

      (5) For clarity, in the first section, when it is written that "The reversal behavior of trained RNNs was similar to the monkey's behavior on the same task" it would be nice to be more precise, that this is to be expected given the strategy used to train the network.

      We removed this sentence as it makes a blanket statement. Instead, we compared the behavioral outputs of the RNNs and the monkeys one by one.

      We added a sentence in Result Section 1 that the RNN’s abrupt behavioral reversal is expected as they are trained to mimic the target choice outputs of the Bayesian model.

      “Such abrupt reversal behavior was expected as the RNNs were trained to mimic the target outputs of the Bayesian inference model.”

      (6) What is the value of tau used in eq (1), and how does it compare to trial duration?

      We now state the value of the time constant tau in Eq (1) and discuss in Result Section 1 that tau = 20 ms is much shorter than the trial duration of 500 ms; thus the persistent behavior seen in trained RNNs is due to learning.
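
      As a back-of-the-envelope illustration of this comparison, the sketch below assumes a generic leaky unit whose activity, in the absence of recurrent and external input, decays as exp(-t/tau). The exact form of Eq (1) is not reproduced in this response, so the decay law and the function name are assumptions made for illustration.

```python
import math

TAU_MS = 20.0     # time constant quoted above
TRIAL_MS = 500.0  # trial duration quoted above

def remaining_fraction(elapsed_ms, tau_ms=TAU_MS):
    """Fraction of the initial activity left after passive exponential decay."""
    return math.exp(-elapsed_ms / tau_ms)

fraction_after_trial = remaining_fraction(TRIAL_MS)
print(f"time constants per trial: {TRIAL_MS / TAU_MS:.0f}")
print(f"fraction remaining after one trial: {fraction_after_trial:.2e}")
```

      One trial spans 25 time constants, so under this assumption passive decay leaves a fraction on the order of 1e-11 of the initial activity, which is why intrinsic unit memory alone would not carry information across trials.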

      (7) It would be nice to expand around the notion of « temporally flexible representation » to help readers grasp what this means.

      Instead of stating that the separable dynamic trajectories have a “temporally flexible representation”, we spell out in what sense they are temporally flexible: separable dynamic trajectories can accommodate the effects that task-related behavior has on generating non-stationary neural dynamics.

      “In sum, our results show that, in a probabilistic reversal learning task, recurrent neural networks encode reversal probability by adopting, not only stationary states as in a line attractor, but also separable dynamic trajectories that can represent distinct probabilistic values while accommodating non-stationary dynamics associated with task-related behavior.”

      Reviewer #3:

      (1) Data:

      It would be useful to describe the experimental task, recording setup, and analyses in much more detail - both in the text and in the methods. What part of PFC are the recordings from? How many neurons were recorded over how many sessions? Which other papers have they been used in? All of these things are important for the reader to know, but are not listed anywhere. There are also some inconsistencies, with the main text e.g. listing the 'typical block length' as 36 trials, and the methods listing the block length as 24 trials (if this is a difference between the biological data and RNN, that should be more explicit and motivated).

      We provided a more detailed description of the monkey experimental task and PFC recordings in Result Section 1. We also added a new section in Methods 2.1 to describe the monkey experiment.

      The experimental analyses should be explained in more detail in the methods. There is e.g. no detailed description of the analysis in Figure 6F.

      We added a new section in Methods 6 to describe how the residual PFC activity is computed. It also describes the RNN perturbation experiments.

      Finally, it would be useful for more analyses of monkey behaviour and performance, either in the main text or supplementary figures.

      We did not pursue this comment as it is unclear how additional behavioral analyses would improve the manuscript.

      (2) Model:

      When fitting the network, 'step 1' of training in 2.3 seems superfluous. The posterior update from getting a reward at A is the same as that from not getting a reward at B (and vice versa), and it is therefore completely independent of the network choice. The reversal trial can therefore be inferred without ever simulating the network, simply by generating a sample of which trials have the 'good' option being rewarded and which trials have the 'bad' option being rewarded.

      We respectfully disagree with Reviewer 3’s comment that the reversal trial can be inferred without ever simulating the network. The only way for the network to know about the underlying reward schedule is to perform the task by itself. By simulating the network, it can sample the options and the reward outcomes. 

      Our understanding is that Reviewer 3 described a strategy that a human would use to perform this task. Our goal was to train the RNN to perform the task.

      Do the blocks always start with choice A being optimal? Is everything similar if the network is trained with a variable initial rewarded option? E.g. in Fig 6, would you see the appropriate swap in the effect of the perturbation on choice probability if choice B was initially optimal?

      Thank you for pointing out that the initial high-value option can be random. When setting up the reward schedule, the initial high-value option was chosen randomly from two choice outputs and, at the scheduled reversal, it was switched to the other option. We did not describe this in the original manuscript.

      We added a description in Training Scheme Step 4 that the initial high-value option is selected randomly. This is also explained in Result Section 1 when we give an overview of the RNN training procedure.

      (3) Content:

      It is rarely explained what the error bars represent (e.g. Figures 3B, 4C, ...) - this should be clear in all figures.

      We added that the error bars represent the standard error of the mean.

      Figure 2A: this colour scheme is not great. There are abrupt colour changes both before and after the 'reversal' trial, and both of the extremes are hard to see.

      We changed the color scheme to contrast pre- and post-reversal trials without the abrupt color change.

      Figure 3E/F: how is prediction accuracy defined?

      We added that the prediction accuracy is based on Pearson correlation.
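
      For readers unfamiliar with the measure, a minimal sketch of a Pearson-correlation accuracy score is given below. How the predictions in Figure 3E/F are generated is not specified in this response, so the function name and the example values are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values: actual vs. predicted reversal probability on five trials.
actual = [0.1, 0.4, 0.35, 0.8, 0.9]
predicted = [0.2, 0.35, 0.5, 0.7, 0.95]
accuracy = pearson_r(actual, predicted)
print(f"prediction accuracy (Pearson r): {accuracy:.3f}")
```

      A score near 1 means the predictions track the actual values up to a linear rescaling; the measure does not penalize a constant offset or gain error.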

      Figure 4B: why focus on the derivative of the dynamics? The subsequent plots looking at the actual trajectories are much easier to understand. Also - what is 'relative trial' relative to?

      The derivative was analyzed to demonstrate stationarity or non-stationarity of the neural activity. We think it will be clearer in the revised manuscript that the derivative allows us to characterize those two activity modes.

      Relative trial number indicates the trial position relative to the behavioral reversal trial. We added this description to the figures where “relative trial” is used.

      Figure 4C: what do these analyses look like if you match the trial numbers for the shift in trajectories? As it is now, there will presumably be more rewarded trials early and late in each block, and more unrewarded trials around the reversal point. Does this introduce biases in the analysis? A related question is (i) why the black lines are different in the top and bottom plots, and (ii) why the ends of the black lines are discontinuous with the beginnings of the red/blue lines.

      We could not understand what Reviewer 3 was asking in this comment. It’d help if Reviewer 3 could clarify the following question:

      “Figure 4C: what do these analyses look like if you match the trial numbers for the shift in trajectories?”

      Question (i): We wanted to look at how the trajectory shifts in the subsequent trial if a reward is or is not received in the current trial. The top panel analyzed all the trials in which the subsequent trial did not receive a reward. The bottom panel analyzed all the trials in which the subsequent trial received a reward. So, the trials analyzed in the top and bottom panels are different, and the black lines (x_rev of “current” trial) in the top and bottom panels are different.

      Question (ii): The black line is from the trial preceding the red/blue lines, so if trials were designed to be continuous across the inter-trial interval, then the black and red/blue lines should be continuous. However, in the monkey experiment, the inter-trial intervals were variable, so the end of the current trial does not match the start of the next trial. The neural trajectories presented in the manuscript did not include the activity in this inter-trial interval.

      Figure 6C: are the individual dots different RNNs? Claiming that there is a decrease in Delta x_choice for a v_+ stimulation is very misleading.

      Yes, individual dots are different RNN perturbations. We added an explanation of the dots to the Figure 7C caption.

      We agree with the comment that \Delta x_choice did not decrease. This sentence was removed. Instead, we revised the manuscript to state that x_choice for v_+ stimulation was smaller than x_choice for v_- stimulation. We performed a KS test to confirm statistical significance.

      Discussion: "...exhibited behaviour consistent with an ideal Bayesian observer, as found in our study". The RNN was explicitly trained to reproduce an ideal Bayesian observer, so this can only really be considered an assumption (not a result) in the present study.

      We agree that the statement in the original manuscript is inaccurate. It was revised to reflect that, in the other study, behavior outputs similar to a Bayesian observer emerged by simply learning to do the task, instead of directly mimicking the outputs of a Bayesian observer as done in our study.

      “Authors showed that trained RNNs exhibited behavior outputs consistent with an ideal Bayesian observer without explicitly learning from the Bayesian observer. This finding shows that the behavioral strategies of monkeys could emerge by simply learning to do the task, instead of directly mimicking the outputs of Bayesian observer as done in our study.”

      Methods: Would the results differ if your Bayesian observer model used the true prior (i.e. the reversal happens in the middle 10 trials) rather than a uniform prior? Given the extensive literature on prior effects on animal behaviour, it is reasonable to expect that monkeys incorporate some non-uniform prior over the reversal point.

      Thank you for pointing out the non-uniform prior. We haven’t conducted this analysis, but would guess that the convergence to the posterior distribution would be faster. We’d have to perform further analysis, which is out of the scope of this paper, to investigate whether the posterior distribution would be different from what we obtained with a uniform prior.

      Making the code available would make the work more transparent and useful to the community.

      The code is available in the following Github repository: https://github.com/chrismkkim/LearnToReverse

    1. Author response:

      Reviewer #1 (Public review):

      This study investigates the sex determination mechanism in the clonal ant Ooceraea biroi, focusing on a candidate complementary sex determination (CSD) locus, one of the key mechanisms supporting haplodiploid sex determination in hymenopteran insects. Using whole genome sequencing, the authors analyze diploid females and the rarely occurring diploid males of O. biroi, identifying a 46 kb candidate region that is consistently heterozygous in females and predominantly homozygous in diploid males. This region shows elevated genetic diversity, as expected under balancing selection. The study also reports the presence of an lncRNA near this heterozygous region, which, though only distantly related in sequence, resembles the ANTSR lncRNA involved in female development in the Argentine ant, Linepithema humile (Pan et al. 2024). Together, these findings suggest a potentially conserved sex determination mechanism across ant species. However, while the analyses are well conducted and the paper is clearly written, the insights are largely incremental. The central conclusion, that the sex determination locus is conserved in ants, was already proposed and experimentally supported by Pan et al. (2024), who included O. biroi among the studied species and validated the locus's functional role in the Argentine ant. The present study thus largely reiterates existing findings without providing novel conceptual or experimental advances.

      Although it is true that Pan et al., 2024 demonstrated (in Figure 4 of their paper) that the synteny of the region flanking ANTSR is conserved across aculeate Hymenoptera (including O. biroi), Reviewer 1’s claim that that paper provides experimental support for the hypothesis that the sex determination locus is conserved in ants is inaccurate. Pan et al., 2024 only performed experimental work in a single ant species (Linepithema humile) and merely compared reference genomes of multiple species to show synteny of the region, rather than functionally mapping or characterizing these regions.

      Other comments:

      The mapping is based on a very small sample size: 19 females and 16 diploid males, and these all derive from a single clonal line. This implies a rather high probability for false-positive inference. In combination with the fact that only 11 out of the 16 genotyped males are actually homozygous at the candidate locus, I think a more careful interpretation regarding the role of the mapped region in sex determination would be appropriate. The main argument supporting the role of the candidate region in sex determination is based on the putative homology with the lncRNA involved in sex determination in the Argentine ant, but this argument was made in a previous study (as mentioned above).

      Our main argument supporting the role of the candidate region in sex determination is not based on putative homology with the lncRNA in L. humile. Instead, our main argument comes from our genetic mapping (in Fig. 2), and the elevated nucleotide diversity within the identified region (Fig. 4). Additionally, we highlight that multiple genes within our mapped region are homologous to those in mapped sex determining regions in both L. humile and Vollenhovia emeryi, possibly including the lncRNA.

      In response to the Reviewer’s assertion that the mapping is based on a small sample size from a single clonal line, we want to highlight that we used all diploid males available to us. Although the primary shortcoming of a small sample size is to increase the probability of a false negative, small sample sizes can also produce false positives. We used two approaches to explore the statistical robustness of our conclusions. First, we generated a null distribution by randomly shuffling sex labels within colonies and calculating the probability of observing our CSD index values by chance (shown in Fig. 2). Second, we directly tested the association between homozygosity and sex using Fisher’s Exact Test (shown in Supplementary Fig. S2). In both cases, the association of the candidate locus with sex was statistically significant after multiple-testing correction using the Benjamini-Hochberg False Discovery Rate. These approaches are clearly described in the “CSD Index Mapping” section of the Methods.
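
      To illustrate the second approach, the sketch below applies a two-sided Fisher's Exact Test to the single-locus counts quoted in the reviews (11 of 16 diploid males homozygous, 0 of 19 females homozygous) and shows a Benjamini-Hochberg adjustment on a set of p-values. It is not the analysis pipeline used in the study: the function names are hypothetical, the genome-wide set of tested windows is not available here, and the p-values passed to the adjustment are made up.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(total - row1, col1 - k) / denom

    k_min = max(0, col1 - (total - row1))
    k_max = min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables that are at least as unlikely as the observed one.
    return sum(
        p for p in (prob(k) for k in range(k_min, k_max + 1))
        if p <= p_obs * (1 + 1e-9)
    )

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Rows: diploid males, females. Columns: homozygous, heterozygous.
p_locus = fisher_exact_two_sided(11, 5, 0, 19)
print(f"single-locus Fisher p-value: {p_locus:.3g}")

# Made-up p-values standing in for several tested genomic windows.
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print("BH-adjusted:", q_values)
```

      In a genome-wide scan the adjustment would be applied across all tested windows, so whether a locus remains significant depends on how many windows are tested and on their p-values.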

      We also note that, because complementary sex determination loci are expected to evolve under balancing selection, our finding that the mapped region exhibits a peak of nucleotide diversity lends orthogonal support to the notion that the mapped locus is indeed a complementary sex determination locus.

      The fourth paragraph of the results and the sixth paragraph of the discussion are devoted to explaining the possible reasons why only 11/16 genotyped males are homozygous in the mapped region. The revised manuscript will include an additional sentence (in what will be lines 384-388) in this paragraph that includes the possible explanation that this locus is, in fact, a false positive, while also emphasizing that we find this possibility to be unlikely given our multiple lines of evidence.

      In response to Reviewer 1’s suggestion that we carefully interpret the role of the mapped region in sex determination, we highlight our careful wording choices, nearly always referring to the mapped locus as a “candidate sex determination locus” in the title and throughout the manuscript. For consistency, the revised manuscript version will change the second results subheading from “The O. biroi CSD locus is homologous to another ant sex determination locus but not to honeybee csd” to “O. biroi’s candidate CSD locus is homologous to another ant sex determination locus but not to honeybee csd,” and will add the word “candidate” in what will be line 320 at the beginning of the Discussion, and will change “putative” to “candidate” in what will be line 426 at the end of the Discussion.

      In the abstract, it is stated that CSD loci have been mapped in honeybees and two ant species, but we know little about their evolutionary history. But CSD candidate loci were also mapped in a wasp with multi-locus CSD (study cited in the introduction). This wasp is also parthenogenetic via central fusion automixis and produces diploid males. This is a very similar situation to the present study and should be referenced and discussed accordingly, particularly since the authors make the interesting suggestion that their ant also has multi-locus CSD and neither the wasp nor the ant has tra homologs in the CSD candidate regions. Also, is there any homology to the CSD candidate regions in the wasp species and the studied ant?

      In response to Reviewer 1’s suggestion that we reference the (Matthey-Doret et al. 2019) study in the context of diploid males being produced via losses of heterozygosity during asexual reproduction, the revised manuscript will include the following sentence: “Therefore, if O. biroi uses CSD, diploid males might result from losses of heterozygosity at sex determination loci (Fig. 1C), similar to what is thought to occur in other asexual Hymenoptera that produce diploid males (Rabeling and Kronauer 2012; Matthey-Doret et al. 2019).”

      We note, however, that in their 2019 study, Matthey-Doret et al. did not directly test the hypothesis that diploid males result from losses of heterozygosity at CSD loci during asexual reproduction, because the diploid males they used for their mapping study came from inbred crosses in a sexual population of that species.

      We address this further below, but we want to emphasize that we do not intend to argue that O. biroi has multiple CSD loci. Instead, we suggest that additional, undetected CSD loci are one possible explanation for the absence of diploid males from any clonal line other than clonal line A. In response to Reviewer 1’s suggestion that we reference the (Matthey-Doret et al. 2019) study in the context of multilocus CSD, the revised manuscript version will include the following additional sentence in the fifth paragraph of the discussion: “Multi-locus CSD has been suggested to limit the extent of diploid male production in asexual species under some circumstances (Vorburger 2013; Matthey-Doret et al. 2019).”

      Regarding Reviewer 1’s question about homology between the putative CSD loci from the (Matthey-Doret et al. 2019) study and O. biroi, we note that there is no homology. The revised manuscript version will have an additional Supplementary Table (which will be the new Supplementary Table S3) that will report the results of this homology search. The revised manuscript will also include the following additional sentence in the Results: “We found no homology between the genes within the O. biroi CSD index peak and any of the genes within the putative L. fabarum CSD loci (Supplementary Table S3).”

      The authors used different clonal lines of O. biroi to investigate whether heterozygosity at the mapped CSD locus is required for female development in all clonal lines of O. biroi (L187-196). However, given the described parthenogenesis mechanism in this species conserves heterozygosity, additional females that are heterozygous are not very informative here. Indeed, one would need diploid males in these other clonal lines as well (but such males have not yet been found) to make any inference regarding this locus in other lines.

      We agree that a full mapping study including diploid males from all clonal lines would be preferable, but as stated earlier in that same paragraph, we have only found diploid males from clonal line A. We stand behind our modest claim that “Females from all six clonal lines were heterozygous at the CSD index peak, consistent with its putative role as a CSD locus in all O. biroi.” In the revised manuscript version, this sentence (in what will be lines 199-201) will be changed slightly in response to a reviewer comment below: “All females from all six clonal lines (including 26 diploid females from clonal line B) were heterozygous at the CSD index peak, consistent with its putative role as a CSD locus in all O. biroi.”

      Reviewer #2 (Public review):

      The manuscript by Lacy et al. is well written, with a clear and compelling introduction that effectively conveys the significance of the study. The methods are appropriate and well-executed, and the results, both in the main text and supplementary materials, are presented in a clear and detailed manner. The authors interpret their findings with appropriate caution.

      This work makes a valuable contribution to our understanding of the evolution of complementary sex determination (CSD) in ants. In particular, it provides important evidence for the ancient origin of a non-coding locus implicated in sex determination, and shows that, remarkably, this sex locus is conserved even in an ant species with a non-canonical reproductive system that typically does not produce males. I found this to be an excellent and well-rounded study, carefully analyzed and well contextualized.

      That said, I do have a few minor comments, primarily concerning the discussion of the potential 'ghost' CSD locus. While the authors acknowledge (line 367) that they currently have no data to distinguish among the alternative hypotheses, I found the evidence for an additional CSD locus presented in the results (lines 261-302) somewhat limited and at times a bit difficult to follow. I wonder whether further clarification or supporting evidence could already be extracted from the existing data. Specifically:

      We agree with Reviewer 2 that the evidence for a second CSD locus is limited. In fact, we do not intend to advocate for there being a second locus, but we suggest that a second CSD locus is one possible explanation for the absence of diploid males outside of clonal line A. In our initial version, we intentionally conveyed this ambiguity by titling this section “O. biroi may have one or multiple sex determination loci.” However, we now see that this leads to undue emphasis on the possibility of a second locus. In the revised manuscript, we will split this into two separate sections: “Diploid male production differs across O. biroi clonal lines” and “O. biroi lacks a tra-containing CSD locus.”

      (1) Line 268: I doubt the relevance of comparing the proportion of diploid males among all males between lines A and B to infer the presence of additional CSD loci. Since the mechanisms producing these two types of males differ, it might be more appropriate to compare the proportion of diploid males among all diploid offspring. This ratio has been used in previous studies on CSD in Hymenoptera to estimate the number of sex loci (see, for example, Cook 1993, de Boer et al. 2008, 2012, Ma et al. 2013, and Chen et al., 2021). The exact method might not be applicable to clonal raider ants, but I think comparing the percentage of diploid males among the total number of (diploid) offspring produced between the two lineages might be a better argument for a difference in CSD loci number.

      We want to re-emphasize here that we do not wish to advocate for there being two CSD loci in O. biroi. Rather, we want to explain that this is one possible explanation for the apparent absence of diploid males outside of clonal line A. We hope that the modifications to the manuscript described in the previous response help to clarify this.

      Reviewer 2 is correct that comparing the number of diploid males to diploid females does not apply to clonal raider ants. This is because males are vanishingly rare among the vast numbers of females produced. We do not count how many females are produced in laboratory stock colonies, and males are sampled opportunistically. Therefore, we cannot report exact numbers. However, we will add the following sentence to the revised manuscript: “Despite the fact that we maintain more colonies of clonal line B than of clonal line A in the lab, all the diploid males we detected came from clonal line A.”

      (2) If line B indeed carries an additional CSD locus, one would expect that some females could be homozygous at the ANTSR locus but still viable, being heterozygous only at the other locus. Do the authors detect any females in line B that are homozygous at the ANTSR locus? If so, this would support the existence of an additional, functionally independent CSD locus.

      We thank the reviewer for this suggestion, and again we emphasize that we do not want to argue in favor of multiple CSD loci. We just want to introduce it as one possible explanation for the absence of diploid males outside of clonal line A.

      The 26 sequenced diploid females from clonal line B are all heterozygous at the mapped locus, and the revised manuscript will clarify this in what will be lines 199-201. Previously, only six of those diploid females were included in Supplementary Table S2, and that will be modified accordingly.
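
      The logic behind point (2) can be sketched in a few lines of code. This is a minimal illustration, assuming the standard multi-locus CSD model in which a diploid individual develops as female if it is heterozygous at one or more CSD loci and as a diploid male only if it is homozygous at all of them. The locus names and allele labels are hypothetical placeholders, not data from the study.

```python
from typing import Mapping, Tuple

# Hypothetical representation: locus name -> the two alleles carried by a diploid.
Genotype = Mapping[str, Tuple[str, str]]


def diploid_sex_under_csd(genotype: Genotype) -> str:
    """Predict the sex of a diploid under an assumed multi-locus CSD model.

    Assumption: heterozygosity at any one CSD locus is sufficient for
    female development; homozygosity at every locus yields a diploid male.
    """
    if not genotype:
        raise ValueError("genotype must include at least one locus")
    heterozygous_somewhere = any(a != b for a, b in genotype.values())
    return "female" if heterozygous_somewhere else "diploid male"


# One locus: homozygosity at the mapped locus gives a diploid male.
single_locus = {"mapped_locus": ("A1", "A1")}

# Two loci: homozygous at the mapped locus but heterozygous at a
# hypothetical second locus, which the model predicts to be a viable female.
two_loci = {"mapped_locus": ("A1", "A1"), "second_locus": ("B1", "B2")}

print(diploid_sex_under_csd(single_locus))
print(diploid_sex_under_csd(two_loci))
```

      Under this model, a female that is homozygous at the mapped locus would point to a second, functionally independent locus. The 26 clonal line B females reported above are all heterozygous at the mapped locus, so they are compatible with either a one-locus or a multi-locus scenario.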

      (3) Line 281: The description of the two tra-containing CSD loci as "conserved" between Vollenhovia and the honey bee may be misleading. It suggests shared ancestry, whereas the honey bee csd gene is known to have arisen via a relatively recent gene duplication from fem/tra (10.1038/nature07052). It would be more accurate to refer to this similarity as a case of convergent evolution rather than conservation.

      In the sentence that Reviewer 2 refers to, we are representing the assertion made in the (Miyakawa and Mikheyev 2015) paper in which, regarding their mapping of a candidate CSD locus that contains two linked tra homologs, they write in the abstract: “these data support the prediction that the same CSD mechanism has indeed been conserved for over 100 million years.” In that same paper, Miyakawa and Mikheyev write in the discussion section: “As ants and bees diverged more than 100 million years ago, sex determination in honey bees and V. emeryi is probably homologous and has been conserved for at least this long.”

      As noted by Reviewer 2, this appears to conflict with a previously advanced hypothesis: because fem and csd were found in Apis mellifera, Apis cerana, and Apis dorsata, but only fem was found in Melipona compressipes, Bombus terrestris, and Nasonia vitripennis, the csd gene evolved after the honeybee (Apis) lineage diverged from other bees (Hasselmann et al. 2008). However, it remains possible that the csd gene evolved after ants and bees diverged from N. vitripennis, but before the divergence of ants and bees, and then was subsequently lost in B. terrestris and M. compressipes. This view was previously put forward based on bioinformatic identification of putative orthologs of csd and fem in bumblebees and in ants [(Schmieder et al. 2012), see also (Privman et al. 2013)]. However, subsequent work disagreed and argued that the duplications of tra found in ants and in bumblebees represented convergent evolution rather than homology (Koch et al. 2014). Distinguishing between these possibilities will be aided by additional sex determination locus mapping studies and functional dissection of the underlying molecular mechanisms in diverse Aculeata.

      Distinguishing between these competing hypotheses is beyond the scope of our paper, but the revised manuscript will include additional text to incorporate some of this nuance. We will include these modified lines below:

      “A second QTL region identified in V. emeryi (V.emeryiCsdQTL1) contains two closely linked tra homologs, similar to the closely linked honeybee tra homologs, csd and fem (Miyakawa and Mikheyev 2015). This, along with the discovery of duplicated tra homologs that undergo concerted evolution in bumblebees and ants (Schmieder et al. 2012; Privman et al. 2013) has led to the hypothesis that the function of tra homologs as CSD loci is conserved with the csd-containing region of honeybees (Schmieder et al. 2012; Miyakawa and Mikheyev 2015). However, other work has suggested that tra duplications occurred independently in honeybees, bumblebees, and ants (Hasselmann et al. 2008; Koch et al. 2014), and it remains to be demonstrated that either of these tra homologs acts as a primary CSD signal in V. emeryi.”

      (4) Finally, since the authors successfully identified multiple alleles of the first CSD locus using previously sequenced haploid males, I wonder whether they also observed comparable allelic diversity at the candidate second CSD locus. This would provide useful supporting evidence for its functional relevance.

      As is already addressed in the final paragraph of the results and in Supplementary Fig. S4, there is no peak of nucleotide diversity in any of the regions homologous to V.emeryiQTL1, which is the tra-containing candidate sex determination locus (Miyakawa and Mikheyev 2015). In the revised manuscript, the relevant lines will be 307-310. We want to restate that we do not propose that there is a second candidate CSD locus in O. biroi, but we simply raise the possibility that multi-locus CSD *might* explain the absence of diploid males from clonal lines other than clonal line A (as one of several alternative possibilities).

      Overall, these are relatively minor points in the context of a strong manuscript, but I believe addressing them would improve the clarity and robustness of the authors' conclusions.

      Reviewer #3 (Public review):

      Summary:

      The sex determination mechanism governed by the complementary sex determination (CSD) locus is one of the mechanisms that support the haplodiploid sex determination system evolved in hymenopteran insects. While many ant species are believed to possess a CSD locus, it has only been specifically identified in two species. The authors analyzed diploid females and the rarely occurring diploid males of the clonal ant Ooceraea biroi and identified a 46 kb CSD candidate region that is consistently heterozygous in females and predominantly homozygous in males. This region was found to be homologous to the CSD locus reported in distantly related ants. In the Argentine ant, Linepithema humile, the CSD locus overlaps with an lncRNA (ANTSR) that is essential for female development and is associated with the heterozygous region (Pan et al. 2024). Similarly, an lncRNA is encoded near the heterozygous region within the CSD candidate region of O. biroi. Although this lncRNA shares low sequence similarity with ANTSR, its potential functional involvement in sex determination is suggested. Based on these findings, the authors propose that the heterozygous region and the adjacent lncRNA in O. biroi may trigger female development via a mechanism similar to that of L. humile. They further suggest that the molecular mechanisms of sex determination involving the CSD locus in ants have been highly conserved for approximately 112 million years. This study is one of the few to identify a CSD candidate region in ants and is particularly noteworthy as the first to do so in a parthenogenetic species.

      Strengths:

      (1) The CSD candidate region was found to be homologous to the CSD locus reported in distantly related ant species, enhancing the significance of the findings.

      (2) Identifying the CSD candidate region in a parthenogenetic species like O. biroi is a notable achievement and adds novelty to the research.

      Weaknesses:

      (1) Functional validation of the lncRNA's role is lacking, and further investigation through knockout or knockdown experiments is necessary to confirm its involvement in sex determination.

      See response below.

      (2) The claim that the lncRNA is essential for female development appears to reiterate findings already proposed by Pan et al. (2024), which may reduce the novelty of the study.

      We do not claim that the lncRNA is essential for female development in O. biroi, but simply mention the possibility that, as in L. humile, it is somehow involved in sex determination. We do not have any functional evidence for this, so this is purely based on its genomic position immediately adjacent to our mapped candidate region. We agree with the reviewer that the study by Pan et al. (2024) decreases the novelty of our findings. Another way of looking at this is that our study supports and bolsters previous findings by partially replicating the results in a different species.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors have used full-length single-cell sequencing on a sorted population of human fetal retina to delineate expression patterns associated with the progression of progenitors to rod and cone photoreceptors. They find that rod and cone precursors contain a mix of rod/cone determinants, with a bias in both amounts and isoform balance likely deciding the ultimate cell fate. Markers of early rod/cone hybrids are clarified, and a gradient of lncRNAs is uncovered in maturing cones. Comparison of early rods and cones exposes an enriched MYCN regulon, as well as expression of SYK, which may contribute to tumor initiation in RB1 deficient cone precursors.

      Strengths:

      (1) The insight into how cone and rod transcripts are mixed together at first is important and clarifies a long-standing notion in the field.

      (2) The discovery of distinct active vs inactive mRNA isoforms for rod and cone determinants is crucial to understanding how cells make the decision to form one or the other cell type. This is only really possible with full-length scRNAseq analysis.

      (3) New markers of subpopulations are also uncovered, such as CHRNA1 in rod/cone hybrids that seem to give rise to either rods or cones.

      (4) Regulon analyses provide insight into key transcription factor programs linked to rod or cone fates.

      (5) The gradient of lncRNAs in maturing cones is novel, and while the functional significance is unclear, it opens up a new line of questioning around photoreceptor maturation.

      (6) The finding that SYK mRNA is naturally expressed in cone precursors is novel, as previously it was assumed that SYK expression required epigenetic rewiring in tumors.

      We thank the reviewer for describing the study’s strengths, reflecting the major conclusions of the initially submitted manuscript. However, based on new analyses, including the requested analyses of other scRNA-seq datasets, our revision clarifies that:

      - related to point (1), cone and rod transcripts do not appear to be mixed together at first (i.e., in immediately post-mitotic immature cone and rod precursors) but appear to be co-expressed in subsequent cone and rod precursor stages; and

      - related to point (3), CHRNA1 appears to mark immature cone precursors that are distinct from the maturing cone and rod precursors that co-express cone- and rod-related RNAs (despite the similar UMAP positions of the two populations in our dataset).

      Weaknesses:

      (1) The writing is very difficult to follow. The nomenclature is confusing and there are contradictory statements that need to be clarified.

      (2) The drug data is not enough to conclude that SYK inhibition is sufficient to prevent the division of RB1 null cone precursors. Drugs are never completely specific so validation is critical to make the conclusion drawn in the paper.

      We thank the reviewer for noting these important issues. Accordingly, in the revised manuscript:

      (1) We improve the writing and clarify the nomenclature and contradictory statements, particularly those noted in the Reviewer’s Recommendations for Authors. 

      (2) We scale back claims related to the role of SYK in the cone precursor response to RB1 loss, with wording changes in the Abstract, Results, and Discussion, which now recognize that the inhibitor studies only support the possibility that cone-intrinsic SYK expression contributes to retinoblastoma initiation, as detailed in our responses to Reviewer’s Recommendations for Authors. We agree and now mention that genetic perturbation of SYK is required to prove its role.  

      Reviewer #2 (Public review):

      Summary:

      The authors used deep full-length single-cell sequencing to study human photoreceptor development, with a particular emphasis on the characteristics of photoreceptors that may contribute to retinoblastoma.

      Strengths:

      This single-cell study captures gene regulation in photoreceptors across different developmental stages, defining post-mitotic cone and rod populations by highlighting their unique gene expression profiles through analyses such as RNA velocity and SCENIC. By leveraging full-length sequencing data, the study identifies differentially expressed isoforms of NRL and THRB in L/M cone and rod precursors, illustrating the dynamic gene regulation involved in photoreceptor fate commitment. Additionally, the authors performed high-resolution clustering to explore markers defining developing photoreceptors across the fovea and peripheral retina, particularly characterizing SYK's role in the proliferative response of cones in the RB loss background. The study provides an in-depth analysis of developing human photoreceptors, with the authors conducting thorough analyses using full-length single-cell RNA sequencing. The strength of the study lies in its design, which integrates single-cell full-length RNA-seq, long-read RNA-seq, and follow-up histological and functional experiments to provide compelling evidence supporting their conclusions. The model of cell type-dependent splicing for NRL and THRB is particularly intriguing. Moreover, the potential involvement of the SYK and MYC pathways with RB in cone progenitor cells aligns with previous literature, offering additional insights into RB development.

      We thank the reviewer for summarizing the main findings and noting the compelling support for the conclusions, the intriguing cell type-dependent splicing of rod and cone lineage factors, and the insights into retinoblastoma development.  

      Weaknesses:

      The manuscript feels somewhat unfocused, with a lack of a strong connection between the analysis of developing photoreceptors, which constitutes the bulk of the manuscript, and the discussion on retinoblastoma. Additionally, given the recent publication of several single-cell studies on the developing human retina, it is important for the authors to cross-validate their findings and adjust their statements where appropriate.

      We agree that the manuscript covers a range of topics resulting from the full-length scRNA-seq analyses and concur that some studies of developing photoreceptors were not well connected to retinoblastoma. However, we also note that the connection to retinoblastoma is emphasized in several places in the Introduction and throughout the manuscript and was a significant motivation for pursuing the analyses. We suggest that it was valuable to highlight how deep, full-length scRNA-seq of developing retina provides insights into retinoblastoma, including i) the similar biased expression of NRL transcript isoforms in cone precursors and RB tumors, ii) the cone precursors’ co-expression of rod- and cone-related genes such as NR2E3 and GNAT2, which may explain similar co-expression in RB cells, and iii) the expression of SYK in early cones and RB cells. While the earlier version had mainly highlighted point (iii), the revised Discussion further refers to points (i) and (ii) as described further in the response to the Reviewer’s Recommendations for Authors.

      We address the Reviewer’s request to cross-validate our findings with those of other single-cell studies of developing human retina by relating the different photoreceptor-related cell populations identified in our study to those characterized by Zuo et al (PMID 39117640), which was specifically highlighted by the reviewer and is especially useful for such cross-validation given the extraordinarily large ~ 220,000 cell dataset covering a wide range of retinal ages (pcw 8–23) and spatiotemporally stratified by macular or peripheral retina location. Relevant analyses of the Zuo et al dataset are shown in Supplementary Figures S3G-H, S10B, S11A-F, and S13A,B. 

      Reviewer #3 (Public review):

      Summary:

      The authors use high-depth, full-length scRNA-Seq analysis of fetal human retina to identify novel regulators of photoreceptor specification and retinoblastoma progression.

      Strengths:

      The use of high-depth, full-length scRNA-Seq to identify functionally important alternatively spliced variants of transcription factors controlling photoreceptor subtype specification, and identification of SYK as a potential mediator of RB1-dependent cell cycle reentry in immature cone photoreceptors.

      Developing human fetal retinal tissue samples were collected between 13 and 19 gestational weeks. The approach provides substantially higher sequencing depth, identifying both rare transcripts and alternative splice forms, and thereby represents an important advance over previous droplet-based scRNA-Seq studies of human retinal development.

      Weaknesses:

      The weaknesses identified are relatively minor. This is a technically strong and thorough study, that is broadly useful to investigators studying retinal development and retinoblastoma.

      We thank the reviewer for describing the strengths of the study. Our revision addresses the concerns raised separately in the Reviewer’s Recommendations for Authors, as detailed in the responses below.  

      Recommendations for the authors:

      Reviewing Editor Comments:

      The reviewers have completed their reviews. Generally, they note that your work is important and that the evidence is generally convincing. The reviewers are in general agreement that the paper adds to the field. The findings of rod/cone fate determination at a very early stage are intriguing. Generally, the paper would benefit from clarifications in the writing and figures. Experimentally, the paper would benefit from validation of the drug data, for example using RNAi or another assay. Alternatively, the authors could note the caveats of the drug experiments and describe how they could be improved. In terms of analysis, the paper would be improved by additional comparisons of the authors' data to previously published datasets.

      We thank the reviewing editor for this summary. As described in the individual reviewer responses, we clarify the writing and figures and provide comparisons to previously published datasets (in particular, the large snRNA-seq dataset of Zuo et al., 2024; PMID 39117640). With regard to the drug (i.e., SYK inhibitor) studies, we opted to provide caveats and describe the need for genetic approaches to validate the role of SYK, owing to the infeasibility of completing genetic perturbation experiments in the appropriate timeframe. We are grateful for the opportunity to present our findings with appropriate caveats.

      Reviewer #1 (Recommendations for the authors):

      Shayler et al. cell-sort human progenitor/rod/cone populations and then apply full-length single-cell RNA-seq to expose features that distinguish paths towards rods or cones. They initially distinguish progenitors (RPCs), immature photoreceptor precursors (iPRPs), long/medium wavelength (LM) cones, late-LM cones, short wavelength (S) cones, early rods (ER) and late rods (LR), which exhibit distinct transcription factor regulons (Figures 1, 2). These data expose expected and novel enriched genes, and support the notion that S cones are a default state lacking expression of rod (NRL) or cone (THRB) determinants but retaining expression of generic photoreceptor drivers (CRX/OTX2/NEUROD1 regulons). They identify changes in regulon activity, such as increasing NRL activity from iPRP to ER to LR, but decreasing from iPRP to cones, or increasing RAX/ISL2/THRB regulon activity from iPRP to LM cones, but decreasing from iPRP to S cones or rods.

      They report co-expression of rod/cone determinants in LM and ER clusters, and the ratios are in the expected directions (e.g., NRL > THRB or RXRG in ER). A novel insight from the full-length sequencing is that differing variants are generated in each cell population. Full-length NRL (FL-NRL) predominates in the rod path, whereas truncated NRL (Tr-NRL) does so in the cone path, then similar (but opposite) findings are presented for THRB (Fig 3, 4), whereas isoforms are not a feature of RXRG expression, just the higher expression in cones.

      The authors then further subcluster and perform RNA velocity to uncover decision points in the tree (Figure 5). They identify two photoreceptor precursor streams, the Transitional Rods (TRs) that provide one source for rod maturation and (reusing the name from the initial clustering) iPRPs that form cones, but also provide a second route to rods. TR cells closest to RPCs (immediately post-mitotic) have higher levels of the rod determinant NR2E3 and NRL, whereas the higher resolution iPRPs near RPCs lack NR2E3 and have higher levels of ONECUT1, THRB, and GNAT2, a cone bias. These distinct rod-biased TR and cone-biased high-resolution iPRPs were not evident in published scRNAseq with 3′ end-counting (i.e. not FL seq). Regulon analysis confirmed higher NRL activity in TR cells, with higher THRB activity in high-resolution iPRP cells.

      Many of the more mature high-resolution iPRPs show combinations of rod (GNAT1, NR2E3) and cone (GNAT2, THRB) paths as well as both NRL and THRB regulons, but with a bias towards cone-ness (Figure 6). Combined FISH/immunofluorescence in fetal retina uncovers cone-biased RXRG-protein-high/NR2E3-protein-absent cone-fated cells that nevertheless expressed NR2E3 mRNA. Thus early cone-biased iPRP cells express rod gene mRNA, implying a rod-cone hybrid in early photoreceptor development. The authors refer to these as "bridge region iPRP cells".

      In Figure 7, they identify CHRNA1 as the most specific marker of these bridge cells (overlapping with ATOH7 and DLL3, previously linked to cone-biased precursors), and FISH shows it is expressed in rod-biased NRL protein-positive and cone-biased RXRG protein-positive cones at fetal week 12.

      Figure 8 outlines the graded expression of various lncRNAs during cone maturation, a novel pattern.

      Finally (Figure 9), the authors identify differential genes expressed in early rods (ER cluster from Figure 1) vs early cones (LM cluster, excluding the most mature opsin+ cells), revealing high levels of MYCN targets in cones. They also find SYK expression in cones. SYK was previously linked to retinoblastoma, so intrinsic expression may predispose cone precursors to transformation upon RB loss. They finish by showing that a SYK inhibitor blocks the proliferation of dividing RB1 knockdown cone precursors in the human fetal retina.

      Overall, the authors have uncovered interesting patterns of biased expression in cone/rod developmental paths, especially relating to the isoform differences for NRL and THRB which add a new layer to our understanding of this fate choice. The analyses also imply that very soon after RPCs exit the cell cycle, they generate post-mitotic precursors biased towards a rod or cone fate, that carry varying proportions of mixed rod/cone determinants and other rod/cone marker genes. They also introduce new markers that may tag key populations of cells that precede the final rod/cone choice (e.g. CHRNA1), catalogue a new lncRNA gradient in cone maturation, and provide insight into potential genes that may contribute to retinoblastoma initiation, like SYK, due to intrinsic expression in cone precursors. However, as detailed below, the text needs to be improved considerably, and overinterpretations need to be moderated, removed, or tested more rigorously with extra data.

      Major Comments

      The manuscript is very difficult to follow. The nomenclature is at times torturous, and the description of hybrid rod/cone hybrid cells is confusing in many aspects.

      (1) A single term, iPRP, is used to refer to an initial low-resolution cluster, and then to a subset of that cluster later in the paper.

      We agree that using immature photoreceptor precursor (iPRP) for both high-resolution and low-resolution clusters was confusing. We kept this name for the low-resolution cluster (which includes both immature cone and immature rod precursors), renamed the high-resolution iPRP cluster immature cone precursors (iCPs), and renamed their transitional rod (TR) counterparts immature rod precursors (iRPs). These designations are based on

      - the biased expression of THRB, ONECUT1, and the THRB regulon in iCPs (Fig. 5D,E);

      - the biased expression of NRL, NR2E3, and the NRL regulon in iRPs (Fig. 5D,E);

      - the partially distinct iCP and iRP UMAP positions (Figure 5C); and 

      - the evidence of similar immature cone versus rod precursor populations in the Zuo et al 3’ snRNA-seq dataset, as noted below and described in two new paragraphs starting at the bottom of p. 12.

      (2) To complicate matters further, the reader needs to understand the subset within the iPRP referred to as bridge cells, and we are told at one point that the earliest iPRPs lack NR2E3, then that they later co-express NR2E3, and while the authors may be referring to protein and RNA, it serves to further confuse an already difficult to follow distinction. I had to read and re-read the iPRP data many times, but it never really became totally clear.

      We agree that the description of the high-resolution iPRP (now “iCP”) subsets was unclear, although our further analyses of a large 3’ snRNA-seq dataset in Figure S11 support the impression given in the original manuscript that the earliest iCPs lack NR2E3 and then later co-express NR2E3 while the earliest iRPs lack THRB and then later express THRB. As described in new text in the Two post-mitotic immature photoreceptor precursor populations section (starting on line 7 of p. 13):

      When considering only the main cone and rod precursor UMAP regions, early (pcw 8 – 13) cone precursors expressed THRB and lacked NR2E3 (Figure S11D,E, blue arrows), while early (pcw 10 – 15) rod precursors expressed NR2E3 and lacked THRB (Figure S11D,E, red arrows), similar to RPC-localized iCPs and iRPs in our study (Figure 5D).

      Next, as summarized in new text in the Early cone and rod precursors with rod- and cone-related RNA co-expression section (new paragraph at top of p. 16):

      Thus, a 3’ snRNA-seq analysis confirmed the initial production of immature photoreceptor precursors with either L/M cone-precursor-specific THRB or rod-precursor-specific NR2E3 expression, followed by lower-level co-expression of their counterparts, NR2E3 in cone precursors and THRB in rod precursors. However, in the Zuo et al. analyses, the co-expression was first observed in well-separated UMAP regions, as opposed to a region that bridges the early cone and early rod populations in our UMAP plots. These findings are consistent with the notion that cone- and rod-related RNA co-expression begins in already fate-determined cone and rod precursors, and that such precursors aberrantly intermixed in our UMAP bridge region due to their insufficient representation in our dataset.  

      Importantly, and as noted in our ‘Public response’ to Reviewer 1, “CHRNA1 appears to mark immature cone precursors that are distinct from the maturing cone and rod precursors that co-express cone- and rod-related RNAs (despite the similar UMAP positions of the two populations in our dataset).” In support of this notion, the immature cone precursors expressing CHRNA1 and other populations did not overlap in UMAP space in the Zuo et al dataset. We hope the new text cited above along with other changes will significantly clarify the observations.

      (3) The term "cone/rod precursor" shows up late in the paper (page 12), but it was clear (was it not?) much earlier in this manuscript that cone and rod genes are co-expressed because of the co-expressed NRL and THRB isoforms in Figures 3/4.

      We thank the reviewer for noting that the differential NRL and THRB isoform expression already implies that cone and rod genes are co-expressed. However, as we now state, the co-expression of RNAs encoding an additional cone marker (GNAT2) and rod markers (GNAT1, NR2E3) was 

      “suggestive of a proposed hybrid cone/rod precursor state more extensive than implied by the co-expression of different THRB and NRL isoforms” (first paragraph of “Early cone and rod …” section on p. 14; new text underlined).

      (4) The (incorrect) impression given later in the manuscript is that the rod/cone transcript mixture applies to just a subset of the iPRP cells, or maybe just the bridge cells (writing is not clear), but actually, neither of those is correct as the more abundant and more mature LM and ER populations analyzed earlier co-express NRL and THRB mRNAs (Figures 2, 3). Overall, the authors need to vastly improve the writing, simplify/clarify the nomenclature, and better label figures to match the text and help the reader follow more easily and clearly. As it stands, it is, at best, obtuse, and at worst, totally confusing.

      We thank the reviewer for bringing the extent of the confusing terminology and wording to our attention. We revised the terminology (as in our response to point 1) and extensively revised the text.  We also performed similar analyses of the Zuo et al. data (as described in more detail in our response to Reviewer 2), which clarifies the distinct status of cells with the “rod/cone transcript mixture” and cells co-expressing early cone and rod precursor markers.  

      To more clearly describe data related to cells with rod- and cone-related RNA co-expression, we divided the former Figure 6 into two figures, with Figure 6 now showing the cone- and rod-related RNA co-expression inferred from scRNA-seq and Figure 7 showing GNAT2 and NR2E3 co-expression in FISH analyses of human retina plus a new schematic in the new panel 7E.

      To separate the conceptually distinct analyses of cone- and rod-related RNA co-expression and the expression of early photoreceptor precursor markers (which were both found in the so-called bridge region, yet are now recognized to be different subpopulations), we separated the analyses of the early photoreceptor precursor markers to form a new section, “Developmental expression of photoreceptor precursor markers and fate determinants,” starting on p. 16.

      Additionally, we further review the findings and their implications in four revised Discussion paragraphs starting at the bottom of p. 23.

      (5) The data showing that overexpressing Tr-NRL in murine NIH3T3 fibroblasts blocks FL-NRL function is presented at the end of page 7 and in Figure 3G. Subsequent analysis two paragraphs and two figures later (end page 8, Figure 5C + supp figs) reveals that Tr-NRL protein is not detectable in retinoblastoma cells, which derive from cone precursor cells and express Tr-NRL mRNA, and the protein is also not detected upon lentiviral expression of Tr-NRL in human fetal retinal explants, suggesting it is unstable or not translated. It would be preferable to have the 3T3 data and retinoblastoma/explant data juxtaposed. E.g. they could present the latter, then show the 3T3 that even if it were expressed (e.g. briefly) it would interfere with FL-NRL. The current order and spacing are somewhat confusing.

      We thank the reviewer for this suggestion and moved the description of the luciferase assays to follow the retinoblastoma and explant data and switched the order of Figure panels 3G and 3H.  

      (6) On page 15, regarding early rod vs early cone gene expression, the authors state: "although MYCN mRNA was not detected....", yet on the volcano plot in Figure S14A MYCN is one of the marked genes that is higher in cones than rods, meaning it was detected, and a couple of sentences later: "Concordantly, the LM cluster had increased MYCN RNA". The text is thus confusing.

      With respect, we note that the original text read, “although MYC RNA was not detected,” which related to a statement in the previous sentence that the gene ontology analysis identified “MYC targets.” However, given that this distinction is subtle and may be difficult for readers to recognize, we revised the text (now on p. 19) to more clearly describe expression of MYCN (but not MYC) as follows:

      “The upregulation of MYC target genes was of interest given that many MYC target genes are also targets of MYCN, that MYCN protein is highly expressed in maturing (ARR3+) cone precursors but not in NRL+ rods (Figure 10A), and that MYCN is critical to the cone precursor proliferative response to pRB loss[8–10]. Indeed, whereas MYC RNA was not detected, the LM cone cluster had increased MYCN RNA …”

      (7) The authors state that the SYK drug is "highly specific". They provide no evidence, but no drug is 100% specific, and it is possible that off-target hits are important for the drug phenotype. This data should be removed or validated by co-targeting the SYK gene along with RB1.

      We agree that our data only show the potential for SYK to contribute to the cone proliferative response; however, we believe the inhibitor study retains value in that a negative result (no effect of the SYK inhibitor) would disprove its potential involvement. To reflect this, we changed wording related to this experiment as follows:

      In the Abstract, we changed:

      (1) “SYK, which contributed to the early cone precursors’ proliferative response to RB1 loss” To: “SYK, which was implicated in the early cone precursors’ proliferative response to RB1 loss.”  

      (2) “These findings reveal … and a role for early cone-precursor-intrinsic SYK expression.” To:  “These findings reveal … and suggest a role for early cone-precursor-intrinsic SYK expression.”

      In the last paragraph of the Results, we changed:

      (1) “To determine if SYK contributes…” To:  “To determine if SYK might contribute…”

      (2) “the highly specific SYK inhibitor” To:  “the selective SYK inhibitor”  

      (3)  “indicating that cone precursor intrinsic SYK activity is critical to the proliferative response” To: “consistent with the notion that cone precursor intrinsic SYK activity contributes to the proliferative response.”

      In the Results, we added a final sentence: 

      “However, given potential SYK inhibitor off-target effects, validation of the role of SYK in retinoblastoma initiation will require genetic ablation studies.”

      In the Discussion (2nd-to-last paragraph), we changed: 

      “SYK inhibition impaired pRB-depleted cone precursor cell cycle entry, implying that native SYK expression rather than de novo induction contributes to the cone precursors’ initial proliferation.” To: “…the pRB-depleted cone precursors’ sensitivity to a SYK inhibitor suggests that native SYK expression rather than de novo induction contributes to the cone precursors’ initial proliferation, although genetic ablation of SYK is needed to confirm this notion.”

      In the last sentence of the Discussion, we changed:

      “enabled the identification of developmental stage-specific cone precursor features that underlie retinoblastoma predisposition.” To: “enabled the identification of developmental stage-specific cone precursor features that are associated with the cone precursors’ predisposition to form retinoblastoma tumors.”

      Minor/Typos

      Figure 7 legend, H should be D.

      We corrected the figure legend (now related to Figure 8).

      Reviewer #2 (Recommendations for the authors):

      (1) The author should take advantage of recently published human fetal retina data, such as PMID:39117640, which includes a larger dataset of cells that could help validate the findings. Consequently, statements like "To our knowledge, this is the first indication of two immediately post-mitotic photoreceptor precursor populations with cone versus rod-biased gene expression" may need to be revised.

      We thank the reviewer for noting the evidence of distinct immediately post-mitotic rod and cone populations published by others after we submitted our manuscript. In response, we omitted the sentence mentioned and extensively cross-checked our results including:

      - comparison of our early versus late cone and rod maturation states to the cone and rod precursor versus cone and rod states identified by Zuo et al (new paragraph on the top half of p. 6 and new figure panels S3G,H);

      - detection of distinct immediately post-mitotic versus later cone and rod precursor populations (two new paragraphs on pp. 12-13 and new Figures S10B and S11A-E); 

      - identification of cone and rod precursor populations that co-express cone and rod marker genes (two new paragraphs starting at the bottom of p. 15 and new Figures S11D-F);

      - comparison of expression patterns of immature cone precursor (iCP) marker genes in our and the Zuo et al dataset (new paragraph on top half of p. 17 and new Figure S13).

      We also compare the cell states discerned in our study and the Zuo et al. study in a new Discussion paragraph (bottom of p. 23) and new Figure S17.

      (2) The data generated comes from dissociated cells, which inherently lack spatial context. Additionally, it is unclear whether the dataset represents a pool of retinas from multiple developmental stages, and if so, whether the developmental stage is known for each cell profiled. If this information is available, the authors should examine the distribution of developmental stages on the UMAP and trajectory analysis as part of the quality control process. 

      We thank the reviewer for highlighting the importance of spatial context and developmental stage. 

      Related to whether the dataset represents a pool of retinae from multiple developmental stages, the different cell numbers examined at each time point are indicated in Figure S1A. To draw the readers’ attention to this detail, Figure S1A is now cited in the first sentence of the Results. 

      Related to the age-related cell distributions in UMAP plots, the distribution of cells from each retina and age was (and is) shown in Fig. S1F. In addition, we now highlight the age distributions by segregating the FW13, FW15-17, and FW17-18-19 UMAP positions in the new Figure 1C. We describe the rod temporal changes in a new sentence at the top of  p. 5:

      “Few rods were detected at FW13, whereas both early and late rods were detected from FW15-19 (Figure 1C), corroborating prior reports [15,20].”  

      We describe the cone temporal changes and note the likely greater discrimination of cell state changes that would be afforded by separately analyzing macula versus peripheral retina at each age in a new sentence at the bottom of p. 5:

      “L/M cone precursors from different age retinae occupied different UMAP regions, suggesting age-related differences in L/M cone precursor maturation (Figure 1C).”

      Moreover, they should assess whether different developmental stages impact gene expression and isoform ratios. It is well established that cone and rod progenitors typically emerge at different developmental times and in distinct regions of the retina, with minimal physical overlap. Grouping progenitor cells based solely on their UMAP positioning may lead to an oversimplified interpretation of the data.

      (2a) We agree that different developmental stages may impact gene expression and isoform ratios, and evaluated stages primarily based on established Louvain clustering rather than UMAP position. However, we also used UMAP position to segregate so-called RPC-localized and nonRPC-localized iCPs and iRPs, as well as to characterize the bridge region iCP sub-populations. In the revision, we examine whether cell groups defined by UMAP positions helped to identify transcriptomically distinct populations and further examine the spatiotemporal gene expression patterns of the same genes in the Zuo et al. 3’ snRNA-seq dataset. 

      (2b) Related to analyses of immediately post-mitotic iRPs and iCPs, the new Figure S10A expanded the violin plots first shown in Figure 5D to compare gene expression in RPC-localized versus non-RPC-localized iCPs and iRPs and subsequent cone and rod precursor clusters (also presented in response to Reviewer 3). The new Figure S10C shows a similar analysis of UMAP region-specific regulon activities. These figures support the idea that there are only subtle UMAP region-related differences in the expression of the selected genes and regulons. 

      To further evaluate early cone and rod precursors, we compared expression patterns in our cluster- and UMAP-defined cell groups to those of the spatiotemporally defined cell groups in the Zuo et al. 3’ snRNA-seq study. The results revealed similar expression timing of the genes examined, although the cluster assignments of a subset of cells were brought into question, especially the assigned rod precursors at pcw 10 and 13, as shown in new Figures S10B (grey columns) and S11, and as described in two new paragraphs starting near the bottom of p.12. 

      (2c) Related to analyses of iCPs in the so-called bridge region, our analyses of the Zuo et al dataset helped distinguish early cone and rod precursor populations (expressing early markers such as ATOH7 and CHRNA1) from the later stages exhibiting rod- and cone-related gene coexpression, which had intermixed in the UMAP bridge region in our dataset. Further parsing of early cone precursor marker spatiotemporal expression revealed intriguing differences as now described in the second half of a new paragraph at the top of p. 17, as follows:

      “Also, different iCP markers had different spatiotemporal expression: CHRNA1 and ATOH7 were most prominent in peripheral retina with ATOH7 strongest at pcw 10 and CHRNA1 strongest at pcw 13; CTC-378H22.2 was prominently expressed from pcw 10-13 in both the macula and the periphery; and DLL3 and ONECUT1 showed the earliest, strongest, and broadest expression (Figure S13B). The distinct patterns suggest spatiotemporally distinct roles for these factors in cone precursor differentiation.”

      (3) I would commend the authors for performing a validation experiment via RNA in situ to validate some of the findings. However, drawing conclusions from analyzing a small number of cells can still be dangerous. Furthermore, it is not entirely clear how the subclustering is done. Some cells change cell type identities in the high-resolution plot. For example, some iPRP cells from the low-resolution plots in Figure 1 are assigned as TR in high-resolution plots in Figure 5.

      The authors should provide justification on the identities of RPC localized iPRP and TR.

      Comparison of their data with other publicly available data should strengthen their annotation.

      We agree that drawing conclusions from scRNA-seq or in situ hybridization analysis of a small number of cells can be dangerous and have followed the reviewer’s suggestion to compare our data with other publicly available data, focusing on the 3’ snRNA-seq of Zuo et al. given its large size and extensive annotation. Our analysis of  the Zuo et al. dataset helped clarify cell identities by segregating cone and rod precursors with similar gene expression properties in distinct UMAP regions. However, we noted that the clustering of early cone and rod precursors likely gave numerous mis-assigned cells (as noted in response 2b above and shown in the new Figure S11). It would appear that insights may be derived from the combination of relatively shallow sequencing of a high number of cells and deep sequencing of substantially fewer cells. 

      Related to how subclustering was done, the Methods state, “A nearest-neighbors graph was constructed from the PCA embedding and clusters were identified using a Louvain algorithm at low and high resolutions (0.4 and 1.6)[70],” citing the Blondel et al. reference for the Louvain clustering algorithm used in the Seurat package. To clarify this, the Results text was revised such that it now indicates the resolutions used to cluster at low resolution (0.4, p. 4, 2nd paragraph) and at high resolution (1.6, top of p. 11).

      Related to the assignment of some iPRP cells from the low-resolution plots in Figure 1 to the TR cluster (now called the ‘iRP’ cluster) in the high-resolution plots in Figure 5, we suggest that this is consistent with Louvain clustering, which does not follow a single dendrogram hierarchy. 
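
      The non-nested behavior described above can be checked by cross-tabulating the labels that each cell receives at the two resolutions. The following is a minimal sketch, not the analysis used in the study: the cell labels are hypothetical, and the helper names `crosstab` and `split_clusters` are introduced here for illustration only.

```python
from collections import Counter, defaultdict


def crosstab(low_labels, high_labels):
    """Count cells for each (high-resolution, low-resolution) label pair."""
    table = defaultdict(Counter)
    for low, high in zip(low_labels, high_labels):
        table[high][low] += 1
    return table


def split_clusters(table):
    """Return high-resolution clusters drawing cells from more than one low-resolution cluster."""
    return {high: dict(parents) for high, parents in table.items() if len(parents) > 1}


# Hypothetical labels for eight cells (illustrative only, not the study's data).
low_res = ["iPRP", "iPRP", "iPRP", "ER", "ER", "RPC", "RPC", "iPRP"]
high_res = ["iCP", "iCP", "iRP", "iRP", "ER", "RPC", "RPC", "iCP"]

table = crosstab(low_res, high_res)
non_nested = split_clusters(table)
print(non_nested)
```

      In this toy example, the high-resolution iRP group draws cells from two different low-resolution clusters, which is the pattern expected when each resolution is clustered independently rather than by subdividing the coarser clusters.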

      The justification for referring to these groups as RPC-localized iCPs and iRPs relates to their biased gene and regulon expression in Fig. 5D and 5E, as stated on p. 12: 

      “In the RPC-localized region, iCPs had higher ONECUT1, THRB, and GNAT2, whereas iRPs trended towards higher NRL and NR2E3 (p= 0.19, p=0.054, respectively).”

      (4) Late-stage LM5 cluster Figure 9 is not defined anywhere in previous figures, in which LM clusters only range from 1 to 4. The inconsistency in cluster identification should be addressed.

      We revised the text related to this as follows: 

      “Indeed, our scRNA-seq analyses revealed that SYK RNA expression increased from the iCP stage through cluster LM4, in contrast to its minimal expression in rods (Figure 10E).  Moreover, SYK expression was abolished in the five-cell group with properties of late maturing cones (characterized in Figure 1E), here displayed separately from the other LM4 cells and designated LM5 (Figure 10E).”  (p. 19-20)

      (5) Syk inhibitor has been shown to be involved in RB cell survival in previous studies. The manuscript seems to abruptly make the connection between the single-cell data to RB in the last figure. The title and abstract should not distract from the bulk of the manuscript focusing on the rod and cone development, or the manuscript should make more connection to retinoblastoma.

      We appreciate the reviewer’s concern that the title may seem to over-emphasize the connection to retinoblastoma based solely on the SYK inhibitor studies. However, we suggest the title also emphasizes the identification and characterization of early human photoreceptor states, per se, and that there are a number of important connections beyond the SYK studies that could warrant the mention of cell-state-specific retinoblastoma-related features in the title.

      Most importantly, a prior concern with the cone cell-of-origin theory was that retinoblastoma cells express RNAs thought to mark retinal cell types other than cones, especially rods. The evidence presented here, that cone precursors also express rod-related genes, helps resolve this issue. The issue is noted numerous times in the manuscript, as follows:  

      In the Introduction, we write:

      “However, retinoblastoma cells also express rod lineage factor NRL RNAs, which – along with other evidence – suggested a heretofore unexplained connection between rod gene expression and retinoblastoma development[12,13]. Improved discrimination of early photoreceptor states is needed to determine if co-expression of rod- and cone-related genes is adopted during tumorigenesis or reflects the co-expression of such genes in the retinoblastoma cell of origin.” (bottom, p. 2)

      And: 

      “In this study, we sought to further define the transcriptomic underpinnings of human  photoreceptor development and their relationship to retinoblastoma tumorigenesis.” (last paragraph, p. 3)

      The Discussion also alluded to this issue and in the revised Discussion, we aimed to make the connection clearer.  We previously ended the 3rd-to-last paragraph with,  

      “iPRP [now iCP] and early LM cone precursors’ expression of NR2E3 and NRL RNAs suggest that their presence in retinoblastomas[12,13] reflects their normal expression in the L/M cone precursor cells of origin.” 

      We now separate and elaborate on this point in a new paragraph as follows: 

      “Our characterization of cone and rod-related RNA co-expression may help resolve questions about the retinoblastoma cell of origin. Past studies suggested that retinoblastoma cells co-express RNAs associated with rods, cones, or other retinal cells due to a loss of lineage fidelity[12]. However, the early L/M cone precursors’ expression of NR2E3 and NRL RNAs suggest that their presence in retinoblastomas[12,13] reflects their normal expression in the L/M cone precursor cells of origin. This idea is further supported by the retinoblastoma cells’ preferential expression of cone-enriched NRL transcript isoforms (Figure S5B).” (middle of p. 24)

      Based on the above, we elected to retain the title.  

      Minor comments:

      (1) It is difficult to see the orange and magenta colors in the Fig 3E RNA-FISH image. The colors should be changed, or the contrast threshold needs to be adjusted to make the puncta stand out more.

      We re-assigned colors, with red for FL-NRL puncta and green for Tr-NRL puncta. 

      (2) Figure 5C on page 8 should be corrected to Supplementary Figure 5C.

      We thank the reviewer for noting this error and changed the figure citation.

      Reviewer #3 (Recommendations for the authors):

      (1) Minor concerns

      a. Abbreviation of some words needs to be included, example: FW. 

      We now provide abbreviation definitions for FW and others throughout the manuscript.  

      b. Cat # does not match the 'key resource table' for many reagents/kits. Some examples are: CD133-PE mentioned on Page # 22 on # 71, SMART-Seq V4 Ultra Low Input RNA Kit and SMARTer Ultra Low RNA Kit for the Fluidigm C1 System on Page # 22 on # 77, Nextera XT DNA Library preparation kit on Page # 23 on # 77.

      We thank the reviewer for noting these discrepancies. We have now checked all catalog numbers and made corrections as needed.

      c. Cat # and brand name of a few reagents & kits are missing and not mentioned either in methods or in key resource table or both. Eg: FBS, Insulin, Glutamine, Penicillin, Streptomycin, HBSS, Quant-iT PicoGreen dsDNA assay, Nextera XT DNA Library Preparation Kit, 5' PCR Primer II A with CloneAmp HiFi PCR Premix. 

      Catalog numbers and brand names are now provided for the tissue culture and related reagents within the methods text and for kits in the Key Resources Table. Additional descriptions of the primers used for re-amplification and RACE were added to the Methods (p. 28-29).

      d. Spell and grammar check is needed throughout the manuscript. Example: In Page # 46 RXRγlo is misspelled as RXRlo.

      Spelling and grammar were checked throughout the manuscript.

      (2) Methods & Key Resource table.

      a. In Page # 21, IRB# needs to be stated.      

      The IRB protocols have been added, now at top of p. 26.

      b. In Page # 21, Did the authors dissociate retinae in ice-cold phosphate-buffered saline or papain?   

      The relevant sentence was corrected to “dissected while submerged in ice-cold phosphate-buffered saline (PBS) and dissociated as described[10].” (p. 26)

      c. In Page # 21, How did the authors count or enumerate the cell count? Provide the details.

      We now state, “… a 10 µl volume was combined with 10 µl trypan blue and counted using a hemocytometer” (top of p. 27)
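
      For readers unfamiliar with the calculation, the standard hemocytometer arithmetic for a 1:1 trypan blue dilution can be sketched as follows. This is an illustrative example only: the per-square counts are hypothetical, the function name `cells_per_ml` is introduced here, and the sketch assumes a standard chamber in which each large square holds 0.1 µl (so 10,000 squares correspond to 1 ml) and that only unstained (viable) cells are tallied.

```python
# Each large hemocytometer square holds 0.1 µl, so 10,000 squares correspond to 1 ml
# (assumes a standard 0.1 mm deep chamber).
SQUARES_PER_ML = 10_000


def cells_per_ml(square_counts, sample_ul=10.0, dye_ul=10.0):
    """Estimate cells/ml of the original suspension from per-square counts."""
    mean_count = sum(square_counts) / len(square_counts)
    # Mixing 10 µl of cells with 10 µl of trypan blue dilutes the sample two-fold.
    dilution_factor = (sample_ul + dye_ul) / sample_ul
    return mean_count * dilution_factor * SQUARES_PER_ML


# Hypothetical counts of unstained cells in four large squares.
counts = [52, 48, 50, 50]
concentration = cells_per_ml(counts)
print(f"{concentration:,.0f} cells/ml")
```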

      d. Why did the authors choose to specifically use only 8 cells for cDNA preparation in Page # 22? State the reason and provide the details.

      The reasons for using 8 cells (to prevent evaporation and to manually transfer one slide-worth of droplets to one strip of PCR tubes) and additional single cell collection details are now provided as follows (new text underlined): 

      “Single cells were sorted on a BD FACSAria I at 4°C using 100 µm nozzle in single-cell mode into each of eight 1.2 µl lysis buffer droplets on parafilm-covered glass slides, with droplets positioned over pre-defined marks … .  Upon collection of eight cells per slide, droplets were transferred to individual low-retention PCR tubes (eight tubes per strip) (Bioplastics K69901, B57801) pre-cooled on ice to minimize evaporation. The process was repeated with a fresh piece of parafilm for up to 12 rounds to collect 96 cells.” (p. 27, new text underlined)

      e. Key resource table does not include several resources used in this study. Example - NR2E3 antibody.

      We added the NR2E3 antibody and checked for other omissions.

      (3) Results & Figures & Figure Legends

      a. Regulon-defined RPC and photoreceptor precursor states

      i. On page # 4, 1st paragraph - Clarify the sentence 'Exclusion of all cells with <100,000 cells read and 18 cells.........Ensembl transcripts inferred'. Did the authors use 18 cells or 18FW retinae? 

      The sentence was changed to:

      “After sequencing, we excluded all cells with <100,000 read counts and 18 cells expressing one or more markers of retinal ganglion, amacrine, and/or horizontal cells (POU4F1, POU4F2, POU4F3, TFAP2A, TFAP2B, ISL1) and concurrently lacking photoreceptor lineage marker OTX2. This yielded 794 single cells with averages of 3,750,417 uniquely aligned reads, 8,278 genes detected, and 20,343 Ensembl transcripts inferred (Figure S1A-C).” (p. 4, new words underlined)
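
      The exclusion rule in the revised sentence can be expressed as a simple filter. The sketch below is illustrative rather than the code used in the study: the cell records are hypothetical, the helper name `keep_cell` is introduced here, and treating any value above zero as “expressing” a gene is an assumption, since the expression threshold is not specified in this response.

```python
MIN_READS = 100_000
# Markers of retinal ganglion, amacrine, and/or horizontal cells listed in the revised text.
OFF_TARGET_MARKERS = ("POU4F1", "POU4F2", "POU4F3", "TFAP2A", "TFAP2B", "ISL1")


def keep_cell(cell):
    """Return True if a cell passes both exclusion criteria."""
    if cell["reads"] < MIN_READS:
        return False
    expr = cell["expr"]
    # Assumption: a gene counts as "expressed" when its value is above zero.
    has_off_target = any(expr.get(gene, 0) > 0 for gene in OFF_TARGET_MARKERS)
    lacks_otx2 = expr.get("OTX2", 0) == 0
    # Exclude only cells that express an off-target marker AND lack OTX2.
    return not (has_off_target and lacks_otx2)


# Hypothetical cells (illustrative only, not the study's data).
cells = [
    {"id": "c1", "reads": 3_500_000, "expr": {"OTX2": 12.0}},
    {"id": "c2", "reads": 80_000, "expr": {"OTX2": 9.0}},
    {"id": "c3", "reads": 2_000_000, "expr": {"POU4F2": 5.0}},
    {"id": "c4", "reads": 2_500_000, "expr": {"ISL1": 3.0, "OTX2": 7.0}},
]

kept = [cell["id"] for cell in cells if keep_cell(cell)]
print(kept)
```

      Under this reading, a cell that expresses an off-target marker but also expresses OTX2 (c4 above) is retained, whereas a low-read cell (c2) or an OTX2-negative cell with an off-target marker (c3) is removed.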

      To clarify that 18 retinae were used, the first sentence of the Results was revised as follows:

      “To interrogate transcriptomic changes during human photoreceptor development, dissociated RPCs and photoreceptor precursors were FACS-enriched from 18 retinae, ages FW13-19 …” (p. 4).

      Why did the authors 'exclude cells lacking photoreceptor lineage marker OTX2' from analysis especially when the purpose here was to choose photoreceptor precursor states & further results in the next paragraph clearly state that 5 clusters were comprised of cells with OTX2 and CRX expression. This is confusing.

      We apologize for the imprecise diction. We divided the evidently confusing sentence into two sentences to more clearly indicate that we removed cells that did not express OTX2, as in the first response to the previous question.

      ii. In Page # 5, the authors reported the number of cell populations (363 large and 5 distal) identified in the THRB+ L/M-cone cluster. What were the # of cell populations identified in the remaining 5 clusters of the UMAP space?

      We added the cell numbers in each group to Fig. 1B. We corrected the large LM group to 366 cells (p. 5) and note 371 LM cells, which includes the five distal cells, in Figure 1B.

      b. Differential expression of NRL and THRB isoforms in rod and cone precursors

      i. In Figure 3B, the authors compare and show the presence of 5 different NRL isoforms for all the 6 clusters that were defined in 3A. However, in the results, the ENST# of just 2 highly assigned transcript isoforms is given. What are the annotated names of the three other isoforms which are shown in 3B? Please explain in the Results.

      As requested, we now annotate the remaining isoforms as encoding full-length or truncated NRL in Fig. 3B and show isoform structures in new Supplementary Figure S4B.  We also refer to each transcript isoform in the Results (p. 7, last paragraph) and similarly evaluate all isoforms in RB31 cells (Fig. S5B).

      ii. What does the Mean FPM in the y-axis of Fig 3C refer to?

      Mean FPM represents mean read counts (fragments per million, FPM) for each position across Ensembl NRL exons for each cluster, as now stated in the 6th line of the Fig. 3 legend.
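
      As an illustration of the quantity plotted, a mean FPM profile can be computed by scaling each cell's per-position counts to fragments per million and then averaging across the cells of a cluster. This is a sketch under stated assumptions, not the study's pipeline: the counts and totals are hypothetical, the function name `mean_fpm` is introduced here, and normalizing per cell before averaging is an assumption about the order of operations.

```python
def mean_fpm(position_counts, total_fragments):
    """Average fragments-per-million at each position across the cells of one cluster.

    position_counts: one list of per-position fragment counts per cell.
    total_fragments: total aligned fragments for each cell, in the same order.
    """
    n_cells = len(position_counts)
    n_positions = len(position_counts[0])
    sums = [0.0] * n_positions
    for counts, total in zip(position_counts, total_fragments):
        scale = 1_000_000 / total  # converts raw counts to fragments per million
        for i, count in enumerate(counts):
            sums[i] += count * scale
    return [s / n_cells for s in sums]


# Hypothetical counts at three exon positions for two cells in one cluster.
counts = [[10, 20, 0], [30, 0, 10]]
totals = [2_000_000, 1_000_000]

profile = mean_fpm(counts, totals)
print(profile)
```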

      iii. A clear explanation of the results for Figures 3E-3F is missing.

      We revised the text to more clearly describe the experiment as follows:

      “The cone cells’ higher proportional expression of Tr-NRL first exon sequences was validated by RNA fluorescence in situ hybridization (FISH) of FW16 fetal retina in which NRL immunofluorescence was used to identify rod precursors, RXRg immunofluorescence was used to identify cone precursors, and FISH probes specific to truncated Tr-NRL exon 1T or FL-NRL exons 1 and 2 were used to assess Tr-NRL and FL-NRL expression (Figure 3E,F).” (p. 8, new text underlined).

      c. Two post-mitotic photoreceptor precursor populations

      i. Although deep-sequencing and SCENIC analysis clarified the identities of four RPC-localized clusters as MG, RPC, and iPRP indicative of cone-bias and TR indicative of rod-bias. It would be interesting to see the discriminating determinant between the TR and ER by SCENIC and deep-sequencing gene expression violin/box plots.

      We agree it is of interest to see the discriminating determinant between the TR [now termed iRP] and ER clusters by SCENIC and deep-sequencing gene expression violin/box plots. We now provide this information for selected genes and regulons of interest in the new Supplementary Figures S10A and S10C, along with a similar comparison between the prior high-resolution iPRP (now termed iCP) cluster and the first high-resolution LM cluster, LM1, as described for gene expression on p. 12:

      “Notably, THRB and GNAT2 expression did not significantly change while ONECUT1 declined in the subsequent non-RPC-localized iCP and LM1 stages, whereas NR2E3 and NRL dramatically increased on transitioning to the ER state (Figure S10A).”

      And as described for regulon activities on pp. 13-14:

      “Finally, activities of the cone-specific THRB and ISL2 regulons, the rod-specific NRL regulon, and the pan-photoreceptor LHX3, OTX2, CRX, and NEUROD1 regulons increased to varying extents on transitioning from the immature iCP or iRP states to the early-maturing LM1 or ER states (Figure S10C).”

      We also show expression of the same genes for spatiotemporally grouped cells from the Zuo et al. dataset in the new Figure S10B, which displays a similar pattern (apart from the possibly mixed pcw 10 and pcw 13 designated rod precursors).

      d. Early cone precursors with cone- and rod-related RNA expression

      i. On page #12, the last paragraph where the authors explain the multiplex RNA FISH results of RXRγ and NR2E3 by citing Figure S8E. However, in Fig S8E, the authors used NRL to identify the rods. Please clarify which one of the rod markers was used to perform RNA FISH?

      Figure S8E (where NRL was used as a rod marker) was cited to remind readers that RXRg has low expression in rods and high expression in cones, rather than to describe the results of this multiplex FISH section. To avoid confusion on this point, Figure S8E is now cited using “(as earlier shown in Figure S8E).” With this issue clarified, we expect the markers used in the FISH + IF analysis will be clear from the revised explanation:

      “… we examined GNAT2 and NR2E3 RNA co-expression in RXRg+ cone precursors in the outermost NBL and in RXRg+ rod precursors in the middle NBL … .” (p. 14-15).

      To provide further clarity, we provide a diagram of the FISH probes, protein markers, and expression patterns in the new Figure 7E.

      ii. The Y-axis of Fig 6G-6H needs to be labelled.

      The axes have been re-labeled from “Nb of cells” to “Number of RXRg+ outermost NBL cells in each region” (original Fig. 6G, now Fig. 7C) and “Number of RXRg+ middle NBL cells in each region” (original Fig. 6H, now Fig. 7D).

      iii. The legends of Figures 6G and 6H are unclear. In the Figure 6G legend, the authors indicate 'all cells are NR2E3 protein-'. Does that imply the yellow and green bars alone? Similarly, clarify the Figure 6H legend, what does the dark and light magenta refer to? What does the light magenta color referring to NR2E3+/ NR2E3- and the dark magenta color referring to NR2E3+/ NR2E3+ indicate? 

      We regret the insufficient clarity. We revised the Fig. 6G (now Fig. 7C) key, which now reads:

      “All outermost NBL cells are NR2E3 protein-negative.”  We added to the figure legend for panels 7C,D “(n.b., italics are used for RNAs, non-italics for proteins).”  The new scheme in Figure 7E shows the RNAs in italics and proteins in non-italics. We hope these changes will clarify when RNA or protein is represented in each histogram category.

      Overall, the results (on page # 13) reflecting Figures 6E-6H & Figure S11 are confusing and difficult to understand. Clear descriptions and explanations are needed.

      We revised this results section described in the paragraph now spanning p. 14:

      -  We now refer to the bar colors in Figures 7C and 7D that support each statement. 

      -  We provide an illustration of the findings in Figure 7E.

      iv. Previously published literature has shown that cells of the inner NBL are RXRγ+ ganglion cells. So, how were these RXRγ+ ganglion cells in the inner NBL discriminated during multiplex RNA FISH (in Fig 6E-6H and in Fig S11)?

      We thank the reviewer for requesting this clarification. We agree that “inner NBL” is the incorrect term for the region in which we examined RXRg+ photoreceptor precursors, as this could include RXRγ+ nascent RGCs. We now clarify that 

      “we examined GNAT2 and NR2E3 RNA co-expression in RXRg+ cone precursors in the outermost NBL and in RXRg+ rod precursors in the middle NBL … .”  (p. 14-15)

      We further state:

      “Limiting our analysis to the outer and middle NBL allowed us to disregard RXRγ+ retinal ganglion cells in the retinal ganglion cell layer or inner NBL.” (top of p. 15)

      Figure 7E is provided to further aid the reader in understanding the positions examined, and the legend states, “RXRg+ retinal ganglion cells in the inner NBL and ganglion cell layer not shown.”

      v. In Figure 6E, what marker does each color cell correspond to?

      In this figure (now panel 7A), we declined to provide the color key since the image is not sufficiently enlarged to visualize the IF and FISH signals. The figure is provided solely to document the regions analyzed and readers are now referred to “see Figure S12 for IF + FISH images” (2nd line, p. 15), where the marker colors are indicated.

      vi. In Figure S11 & 6E, Protein and RNA transcript color of NR2E3, GNAT2 are hard to distinguish. Usage of other colors is recommended.  

      We appreciate the reviewer’s concern related to the colors (in the now redesignated Figure S12 and 7A); however, we feel this issue is largely mitigated by our use of arrows to point to the cells needed to illustrate the proposed concepts in Figure S12B. All quantitation was performed by examining each color channel separately to ensure correct attribution, which is now mentioned in the Methods (2nd-to-last line of Quantitation of FISH section, p. 35).

      vii. 

      With due respect, we suggest that labeling each box (now in Figure 8B) would make the figure rather busy and obscure the main point, which is that boxed regions were examined at various distances from the center (denoted by the “C” and “0 mm”) with distances periodically indicated. We suggest the addition of such markers would not improve and might worsen the figure for most readers.    

      e. An early L/M cone trajectory marked by successive lncRNA expression

      i. In Figure 8C - color-coded labelling of LM1-4 clusters is recommended.

      We note Fig. 8C (now 9C) is intended to use color to display the pseudotemporal positions of each cell. We recognize that an additional plot with the pseudotime line imposed on LM subcluster colors could provide some insights, yet we are unaware of available software for this and are unable to develop such software at present. To enable readers to obtain a visual impression of the pseudotime vs subcluster positions, we now refer the reader to Figure 5A in the revised figure legend, as follows:  (“The pseudotime trajectory may be related to LM1-LM4 subcluster distributions in Figure 5A.”).

      ii. In Figure 8G - what does the horizontal color-coded bar below the lncRNAs name refer to? These bars are similar in all four graphs of the 8G figure.

      As stated in the Fig. 8G (now 9G) legend, “Colored bars mark lncRNA expression regions as described in the text.”  We revised the text to more clearly identify the color code. (p. 18-19)   

      f. Cone intrinsic SYK contributions to the proliferative response to pRB loss

      i. In Fig 9F - The expression of ARR3+ cells (indicated by the green arrow in FW18) is poorly or rarely seen in the peripheral retina.

      We thank the reviewer for finding this oversight. In panel 9F (now 10F), we removed the green arrows from the cells in the periphery, which are ARR3- due to the immaturity of cones in this region. 

      ii. In Figure 9F - Did the authors stain the FW16 retina with ARR3?

      Unfortunately, we did not stain the FW16 retina for ARR3 in this instance.

      iii. Inclusion of DAPI staining for Fig 9F is recommended to justify the ONL & INL in the images.

      We regret that we are unable to merge the DAPI in this instance due to the way in which the original staining was imaged.  A more detailed analysis corroborating and extending the current results is in progress. 

      iv. Immunostaining images for Figure 9G are missing & are required to be included. What does shSCR in Fig 9G refer to?

      We now provide representative immunostaining images below the panel (now 10G). The legend was updated: “Bottom: Example of Ki67, YFP, and RXRg co-immunostaining with DAPI+ nuclei (yellow outlines). Arrows: Ki67+, YFP+, RXRg+ nuclei.”  The revised legend now notes that shSCR refers to the scrambled control shRNA.

      v. For Figure 9H - Is the presence and loss of SYK activity consistent with all the subpopulations (S & LM) of early maturing and matured cones?

      We appreciate the reviewer’s question and interest (relating to the redesignated Figure 10H); however, we have not yet completed a comprehensive evaluation of SYK expression in all the subpopulations (S & LM) of early maturing and matured cones and will reserve such data for a subsequent study. We suggest that this information is not critical to the study’s major conclusions.

      vi. Figure 9A is not explained in the results. Why were MYCN proteins assessed along with ARR3 and NRL? What does this imply?

      We thank the reviewer for noting that this figure (now Figure 10A) was not clearly described. 

      As per the response to Reviewer 1, point 6, the text now states:

      “The upregulation of MYC target genes was of interest given that many MYC target genes are also MYCN targets, that MYCN protein is highly expressed in maturing (ARR3+) cone precursors but not in NRL+ rods (Figure 10A), and that MYCN is critical to the cone precursor proliferative response to pRB loss [8–10].” (middle, p. 19, new text underlined).

      Hence, the figure demonstrates the cone cell specificity of high MYCN protein. This is further noted in the Fig. 10A legend: “A. Immunofluorescent staining shows high MYCN in ARR3+ cones but not in NRL+ rods in FW18 retina.”

    1. Author response:

      Reviewer #1 (Public review):

      Functional lateralization between the right and left hemispheres is reported widely in animal taxa, including humans. However, it remains largely speculative as to whether the lateralized brains have a cognitive gain or a sort of fitness advantage. In the present study, by making use of the advantages of domestic chicks as a model, the authors are successful in revealing that the lateralized brain is advantageous in the number sense, in which numerosity is associated with spatial arrangements of items. Behavioral evidence is strong enough to support their arguments. Brain lateralization was manipulated by light exposure during the terminal phase of incubation, and the left-to-right numerical representation appeared when the distance between items gave a reliable spatial cue. The light-exposure induced lateralization, though quite unique in avian species, together with the lack of intense inter-hemispheric direct connections (such as the corpus callosum in the mammalian cerebrum), was critical for the successful analysis in this study. Specification of the responsible neural substrates in the presumed right hemisphere is expected in future research. The development of comparable experimental manipulations in the mammalian brain to address this general question (the functional significance of brain laterality) is also expected.

      We sincerely appreciate the Reviewer's insightful feedback and his/her recognition of the key contributions of our study.

      Reviewer #2 (Public review):

      Summary:

      This is the first study to show how an L-R bias in the relationship between numerical magnitude and space depends on brain lateralisation and, moreover, how it is modulated by in ovo conditions.

      Strengths:

      Novel methodology for investigating the innateness and neural basis of an L-R bias in the relationship between number and space.

      We would like to thank the Reviewer for their valuable feedback and for highlighting the key contributions of our study.

      Weaknesses:

      I would query the way the experiment was contextualised. They ask whether culture or innate pre-wiring determines the 'left-to-right orientation of the MNL [mental number line]'.

      We thank the Reviewer for raising this point, which has allowed us to provide a more detailed explanation of this aspect. Rather than framing the left-to-right orientation of the mental number line (MNL) as exclusively determined by either cultural influences or innate pre-wiring, our study highlights the role of environmental stimulation. Specifically, prenatal light exposure can shape hemispheric specialization, which in turn contributes to spatial biases in numerical processing. Please see lines 115-118.

      The term, 'Mental Number Line' is an inference from experimental tasks. One of the first experimental demonstrations of a preference or bias for small numbers in the left of space and larger numbers in the right of space, was more carefully described as the spatial-numerical association of response codes - the SNARC effect (Dehaene, S., Bossini, S., & Giraux, P. (1993). The mental representation of parity and numerical magnitude. Journal of Experimental Psychology: General, 122, 371-396).

      We have refined our description of the MNL and SNARC effect to ensure conceptual accuracy in the revised manuscript; please see lines 53-59.

      This has meant that the background to the study is confusing. First, the authors note, correctly, that many other creatures, including insects, can show this bias, though in none of these has neural lateralisation been shown to be a cause. Second, their clever experiment shows that an experimental manipulation creates the bias. If it were innate and common to other species, the experimental manipulation shouldn't matter. There would always be an L-R bias. Third, they seem to be asserting that humans have a left-to-right (L-R) MNL. This is highly contentious, and in some studies, reading direction affects it, as the original study by Dehaene et al showed; and in others, task affects direction (e.g. Bachtold, D., Baumüller, M., & Brugger, P. (1998). Stimulus-response compatibility in representational space. Neuropsychologia, 36, 731-735, not cited). Moreover, a very careful study of adult humans, found no L-R bias (Karolis, V., Iuculano, T., & Butterworth, B. (2011), not cited, Mapping numerical magnitudes along the right lines: Differentiating between scale and bias. Journal of Experimental Psychology: General, 140(4), 693-706). Indeed, Rugani et al claim, incorrectly, that the L-R bias was first reported by Galton in 1880. There are two errors here: first, Galton was reporting what he called 'visualised numerals', which are typically referred to now as 'number forms' - spontaneous and habitual conscious visual representations - not an inference from a number line task. Second, Galton reported right-to-left, circular, and vertical visualised numerals, and no simple left-to-right examples (Galton, F. (1880). Visualised numerals. Nature, 21, 252-256.). So in fact did Bertillon, J. (1880). De la vision des nombres. La Nature, 378, 196-198, and more recently Seron, X., Pesenti, M., Noël, M.-P., Deloche, G., & Cornet, J.-A. (1992). Images of numbers, or "When 98 is upper left and 6 sky blue". Cognition, 44, 159-196, and Tang, J., Ward, J., & Butterworth, B. (2008). Number forms in the brain. Journal of Cognitive Neuroscience, 20(9), 1547-1556.

      We sincerely appreciate the opportunity to discuss numerical spatialization in greater detail. We have clarified that an innate predisposition to spatialize numerosity does not necessarily exclude the influence of environmental stimulation and experience. We have proposed an integrative perspective, incorporating both cultural and innate factors, suggesting that numerical spatialization originates from neural foundations while remaining flexible and modifiable by experience and contextual influences. Please see lines 69–75.

      We have incorporated the Reviewer’s suggestions and cited all the recommended papers; please see lines 47–75.

      If the authors are committed to chicks' MN Line they should test a series of numbers showing that the bias to the left is greater for 2 and 3 than for 4, etc. 

      What does all this mean? I think that the paper should be shorn of its misleading contextualisation, including the term 'Mental Number Line'. The authors also speculate, usefully, on why chicks and other species might have a L-R bias. I don't think the speculations are convincing, but at least if there is an evolutionary basis for the bias, it should at least be discussed.

      In the revised version of the manuscript, we have adopted the term Spatial Numerical Association (SNA) instead. We thank the Reviewer for this valuable comment.

      We appreciated the Reviewer’s suggestion regarding the evolutionary basis of lateralization and have included considerations of its relevance in chicks and other species; please see lines 143-151 and 381-386.

      This paper is very interesting with its focus on why the L-R bias exists, and where and why it does not.

      We wish to thank the Reviewer again for his/her work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      (1) In several instances the paper does not address apparent inconsistencies between the prior literature and the findings. For example, the first main finding is that recalled items have more differentiated lateral temporal cortex representations within lists than not recalled items. This seems to be the opposite of the prediction from temporal context models that are used to motivate the paper: context models would predict that greater contextual similarity within a list should lead to greater memory through enhanced temporal clustering in recall. This is what El-Kalliny et al (2019) found, using a highly similar design (free recall, intracranial recordings from the lateral temporal lobe). The authors never address this contradiction in any depth to reconcile it with the previous literature and with the motivating theoretical model.

      Figure 2 supports the findings from El-Kalliny and colleagues because it shows the relationship of each list item relative to the first item (El-Kalliny et al. 2019). Items encoded adjacent to SP1 show the highest spectral similarity supporting the idea of overlapping context predicted by the Temporal Context Model. However, our figure characterizes how increasing inter-item distance affects spectral similarity. It shows that two items successfully recalled from temporally distant serial positions show reduced spectral similarity. These findings align with the predictions of the temporal context model because two temporally distant items would lack significant contextual overlap and therefore would have more distinct spectral representations.

      El-Kalliny and colleagues do use a similar experimental set-up; however, they define drift differently. They identified patients with a tendency to temporally cluster and observed that those patients tend to drift less between temporally clustered items; however, they do not specify drift relative to a constant serial position as we do in our analysis. They define drift as the spectral change between two adjacent items, which is a relative measure between any two items rather than a measure in relation to a fixed point like SP1. Finally, our analysis focuses only on gamma activity, while El-Kalliny and colleagues identified drift across a much broader set of frequency bands.

      (2) The way that the authors conduct the analysis of medial parietal neural similarity at boundaries leads to results that cannot be conclusively interpreted. The authors report enhanced similarity across lists for the first item in each list, which they interpret as reflecting a qualitatively distinct boundary signal. However, this finding can readily be explained by contextual drift if one assumes that whatever happens at the start of each list is similar or identical across lists (for example, a get ready prompt or reminder of instructions). The authors do not include analyses to rule this out, which undermines one of the main findings. 

      Extensions of the temporal context model (Lohnas et al. 2015) predict context at the beginning of a list will be most similar to the end of the prior list. The theory assumes a single-context state, consisting of a recency-weighted average of prior items, that is updated, even across different encoding periods.

      However, our results show a boundary item representation is most similar to the prior list's first item rather than its last item. Our results conflict with the extension of TCM because the shared similarity of boundary items suggests the context state for the first item in the list is not a recency-weighted average of the items presented immediately prior. The same boundary-sensitive signal is not present in other regions, namely the hippocampus and lateral temporal cortex. Those regions do not show similarity between items at the beginning of each list.

      Our main conclusion from these data was that the medial parietal lobe activity seems to be specifically sensitive to task boundaries, defined by the first event or the get ready prompt, while other regions are not.
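
      The single-context-state prediction referred to above can be illustrated with a minimal sketch. It assumes, for illustration only, that each item contributes an input orthogonal to all earlier inputs, that context is updated as c_i = rho * c_(i-1) + beta * c_i^IN with rho = sqrt(1 - beta^2) so that context keeps unit length, and that nothing other than the items (no distractor or recall period) updates context between lists. The function names and beta = 0.4 are hypothetical choices, not values taken from the analysis.

```python
import math


def drift_context(n_items, beta=0.4):
    """Return the context state after each of n_items successive items.

    Assumption: every item input is orthogonal to everything seen before,
    so rho = sqrt(1 - beta**2) keeps the context vector at unit length.
    """
    rho = math.sqrt(1.0 - beta ** 2)
    context = [0.0] * (n_items + 1)
    context[0] = 1.0  # pre-experimental context
    states = []
    for i in range(n_items):
        context = [rho * c for c in context]  # older information decays
        context[i + 1] += beta                # the current item is blended in
        states.append(list(context))
    return states


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Two consecutive 12-item lists treated as one continuous stream of 24 items.
states = drift_context(24)
first_of_list2 = states[12]
sim_to_prior_last = cosine(first_of_list2, states[11])   # SP12 of list 1
sim_to_prior_first = cosine(first_of_list2, states[0])   # SP1 of list 1
print(sim_to_prior_last > sim_to_prior_first)
```

      Under these assumptions the similarity between two context states equals rho raised to the number of intervening updates, so the first item of a list is closest to the last item of the preceding list, not to its first item.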

      (3) Although several previous studies have linked hippocampal fMRI and electrophysiological activity at event boundaries with memory performance, the authors do not find similar relationships between hippocampal activity, event boundaries, and memory. There are potential explanations for why this might be the case, including the distinction between item vs. associative memory, which has been a prominent feature of previous work examining this question. However, the authors do not address these potential explanations (or others) to explain their findings' divergence from prior work. This makes it difficult to interpret and to draw conclusions from the data about the hippocampus' mechanistic role in forming event memories.

      The following text was added and revised in the discussion to discuss hippocampal activity shown in our results and its lack of sensitivity to boundaries.  

      “Spectral activity in the medial parietal lobe aligned closely with boundaries. Drift between item pairs seemed to reset at each boundary, leading to renewed similarity after each boundary. This observation aligns with previous work suggesting boundaries reset temporal context. In the temporal cortex, our findings extend prior studies which suggest the temporal lobe may play a role in associating adjacently presented items (Yaffe et al. 2014, El-Kalliny et al. 2019). We found items encoded in distant serial positions, but within the same list, drifted significantly more than items from adjacent serial positions (Figure 2C). Consistent with the predictions of the temporal context model, the reduced similarity between distant items may reflect reduced contextual overlap proportional to the time elapsed between them. However, across task boundaries, our study did not detect a robust change in drift rate in the medial or lateral temporal cortex. This finding contrasts with significant work (Ben-Yakov et al. 2018, Ezzyat et al. 2014; Griffiths et al. 2020) which shows hippocampal sensitivity to event-boundaries. One interpretation would be that boundary representations in the hippocampus are quite sparse and represented by populations of time-sensitive cells whose activity is indexed to task-related boundaries (Umbach et al. 2020). While the sparse representations may not be detectable in gamma activity, perhaps it suggests drift in these regions represents a more abstract set of contextual features accumulated from multiple brain regions.”

      (4) There is a similar absence of interpretation with respect to the previous literature for the data showing enhanced boundary-related similarity in the medial parietal cortex. The authors’ interpretation seems to be that they have identified a boundary-specific signal that reflects a large and abrupt change in context, however, another plausible interpretation is that enhanced similarity in the medial parietal cortex is related to a representation of a schema for the task structure that has been acquired across repeated instances. 

      We agree our results could suggest the MPL creates a generalized situational model or schematic of the task. Unfortunately, our behavioral task does not allow us to differentiate between these ideas and pure boundary representation. However, given boundaries are a component in defining situational models, we chose to interpret our results conservatively as a form of boundary representation.  

      (5) The authors do not directly compare their model to other models that could explain how variability in neural activity predicts memory. One example is the neural fatigue hypothesis, which the authors mention, however there are no analyses or data to suggest that their data is better fit by a boundary/contextual drift mechanism as opposed to neural fatigue. 

      The study by Lohnas and colleagues does find that HFA was greater for recalled items but does not describe a serial-position-specific trend (Lohnas et al. 2020). For our study, we stringently controlled for recall success in each of our analyses. Our main finding of boundary similarity compares recalled boundary items to recalled items in each of the other serial positions. We also show the similarity of non-recalled items in all serial positions to demonstrate the lack of boundary representation in first list items, when neural fatigue is presumably least present.

      In addition, their study demonstrated neural fatigue in the hippocampus. They did not find evidence of fatigue in the DLPFC, suggesting region-specific mechanisms of neural fatigue. Our results are focused on the medial parietal lobe, and we were not able to find a fatigue model of the region for further comparison. While our results do not rule out the possibility of neural fatigue driving a drifting or boundary signal, we focus on the relevance of the signal to memory performance.

      (6) P2. Line 65 cites Polyn et al (2009b) as an example where ‘random’ boundary insertions improve subsequent memory. However, the boundaries in that study always occurred at the same serial position and were therefore completely predictable and not random.

      The citation was removed from the corresponding sentence.

      (7) P2. Line 74 cites Pu et al. (2022) as an example of medial temporal lobe ‘regional activity’ showing sensitivity to event boundaries; however, this paper reported behavioral and computational modeling results and did not include measurement of neural activity. 

      The citation was removed from the corresponding sentence.

      (8) P.3 Line 117, Hsieh et al (2014) and Hsieh and Ranganath (2015) are cited as evidence that ‘spectral’ relatedness decreases as a function of distance, but neither of these studies examined ‘spectral’ activity (fMRI univariate and multivariate). The manuscript would benefit from a careful review and updating of how the prior literature is cited, which will increase the impact of the findings for readers.

      The text has been updated to reflect this distinction by modifying the statement to:  “Previous work consistent with temporal context models suggests neural pattern similarity reduces as a function of distance between related memories.”

      (9) Several previous studies have found hippocampal activity at event boundaries correlates with memory performance (Ben-Yakov et al 2011, 2018; Baldassano et al 2017), yet here the authors do not find evidence for hippocampal activity at event boundaries related to memory. Does this difference reflect something important about how the hippocampus vs. medial parietal cortex vs. lateral temporal cortex contribute to memory formation? Currently, there is not much discussion about how to interpret the differences between brain regions. Previous work has suggested that hippocampal pattern similarity at event boundaries specifically supports associative memory across events (Ezzyat & Davachi, 2014; Griffiths & Fuentemilla, 2020; Heusser et al., 2016), which may help explain their findings. In any case the authors could increase the impact of their paper by further situating their findings within the previous literature. 

      We would not suggest there is no boundary-related activity in the hippocampus. Similar to an earlier point made by the reviewer, to clarify our interpretation of regional differences, the following text has been added to the discussion.  

      “However, across task boundaries, our study did not detect a robust change in drift rate in the medial or lateral temporal cortex. This finding contrasts with significant work (Ezzyat and Davachi, 2014; Griffiths and Fuentemilla, 2020) which shows hippocampal sensitivity to event-boundaries. One interpretation would be that boundary representations in the hippocampus are quite sparse and represented by populations of time-sensitive cells whose activity is indexed to task-related boundaries (Umbach et al. 2020). While the sparse representations may not be detectable in gamma activity, perhaps it suggests drift in these regions represents a more abstract set of contextual features accumulated from multiple brain regions (Baldassano et al. 2017).”

      (10) The authors mention neural fatigue as an alternative theory to explain the primacy effect (Serruya et al., 2014), however there are no analyses or data to suggest that their data is better fit by a boundary mechanism as opposed to neural fatigue. Previous studies have shown that gamma activity in the hippocampus changes with serial position and with encoding history (Serruya et al 2014; Lohnas et al 2020). Here, the authors could compare the reported pattern similarity results to control analyses that replicate this prior work, which would strengthen their argument that there is unique information at boundaries that is distinct from a neural fatigue signal. 

      Serruya and colleagues describe decreasing HFA with increasing serial position in the MTL, lateral temporal cortex and prefrontal cortex (Serruya et al. 2014). Despite their findings, we do not observe a strong boundary effect in those regions (see Supp Fig 3 a,b). The lack of boundary effect in regions where HFA is selectively increased for primacy items suggests the global neural fatigue model does not account for our results.

      Notably, the authors do not characterize HFA trends in the MPL. Nevertheless, their findings do not rule out the possibility of a boundary effect driving the HFA. We demonstrate boundary-relevant HFA only in the MPL but not in other regions. In addition, we show a correlation between SP1 recalls and boundary representation strength, as well as a conserved similarity of multiple boundary-adjacent items.  

      Next, the neural fatigue study by Lohnas and colleagues does find that HFA was greater for recalled items but does not describe a serial-position-specific trend (Lohnas et al. 2020). For our study, we stringently controlled for recall success in each of our analyses. Our main finding of boundary similarity compares recalled boundary items to recalled items in each of the other serial positions. We also show the similarity of non-recalled items in all serial positions to demonstrate the lack of boundary representation in the first list items, when neural fatigue is presumably least present.

      In addition, their study demonstrated neural fatigue in the hippocampus. They did not find evidence of fatigue in the DLPFC, suggesting region-specific mechanisms of neural fatigue. Our results are focused on the medial parietal lobe, and we were not able to find a fatigue model of the region for further comparison. While our results do not rule out the possibility of neural fatigue driving a drifting or boundary signal, we focus on the relevance of the signal to memory performance.

      (11) For the analyses that examine cross-list similarity (e.g. the medial parietal analysis in Figure 3), how did the authors choose the number of lists over which similarity was calculated? Was the selection of this free parameter cross-validated to ensure that it is not overfitting the data? Given that there were 25 lists per session, using the three succeeding lists seems arbitrary. Why not use every list across the whole session? 

      Given the volume of data, number of patients, and computational time available at our facility, we extended the analysis as far as we could to characterize the observed trend.

      (12) P4. Line 155 says that Figure 3C shows example subject data, but it looks like it is actually Figure 3D. 

      The text was updated to reference the correct figure.

      (13) The t-tests on P.4 Line 159 have two sets of degrees of freedom but should only have one. 

      The t-tests described by Figure 3B represent the mean parameter estimate of the predictor for boundary proximity contrasted by region for all item pairs. The statistical test in this case was an unpaired t-test between parameter estimates for patients with electrodes in each of the regions. The numbers within parentheses represent the sample size, or number of subjects, contributing electrodes to each region.
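
      How two group sizes relate to the single degrees-of-freedom value of an unpaired test can be shown with a short sketch. It assumes the pooled-variance (Student) form of the unpaired t-test; the response above does not state which variant was used. The function name and the parameter estimates are hypothetical and are not data from the study.

```python
import math
from statistics import mean, variance


def unpaired_t(sample_a, sample_b):
    """Pooled-variance unpaired t-test; returns (t statistic, degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2  # one degrees-of-freedom value derived from both group sizes
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Hypothetical per-subject parameter estimates for two regions (illustration only).
region_a = [0.9, 1.1, 1.0, 1.2]  # four subjects with electrodes in region A
region_b = [0.4, 0.6, 0.5]       # three subjects with electrodes in region B
t_stat, df = unpaired_t(region_a, region_b)
print(f"n = ({len(region_a)}, {len(region_b)}), df = {df}, t = {t_stat:.2f}")
```

      In this form the two numbers in parentheses are the group sizes, while the test itself has a single degrees-of-freedom value equal to their sum minus two.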

      Reviewer 2:

      (1) Because this is not a traditional event boundary study, the data are not ideally positioned to demonstrate boundary-specific effects. In a typical study investigating event boundary effects, a series of stimuli are presented and within that series occurs an event boundary, for instance a change in background color. The power of this design is that all aspects between stimuli are strictly controlled, in particular the timing, meaning that the only difference between boundary-bridging items is the boundary itself. The current study was not designed in this manner; thus, it is not possible to fully control for effects of time or for the fact that multiple boundaries occur between study lists (study to distractor, distractor to recall, recall to study). Each list in a free recall study can be considered its own “mini” experiment such that the same mechanisms should theoretically be recruited across any/all lists. There are multiple possible processes engaged at the start of a free recall study list which may not be specific to event boundaries per se. For example, and as cited by the authors, neural fatigue/attentional decline (and concurrent gamma power decline) may account for serial position effects. Thus, SP1 on all lists will be similar by virtue of the fact that attention/gamma decrease across serial position, which may or may not be a boundary-specific effect. In an extreme example, the analyses currently reported could be performed on an independent dataset with the same design (e.g. 12 word delayed free recall) and such analyses could potentially reveal high similarity between SP1-list1 in the current study and SP1-list1 in the second dataset, effects which could not be specifically attributed to boundaries.

      The neural fatigue study by Lohnas and colleagues does find that HFA was greater for recalled items but does not describe a serial-position-specific trend (Lohnas et al. 2020). For our study, we stringently controlled for recall success in each of our analyses. Our main finding of boundary similarity compares recalled boundary items to recalled items in each of the other serial positions. We also show the similarity of non-recalled items in all serial positions to demonstrate the lack of boundary representation in the first list items, when neural fatigue is presumably least present.

      In addition, their study demonstrated neural fatigue in the hippocampus. They did not find evidence of fatigue in the DLPFC, suggesting region-specific mechanisms of neural fatigue. Our results are focused on the medial parietal lobe, and we were not able to find a fatigue model of the region for further comparison. While our results do not rule out the possibility of neural fatigue driving a drifting or boundary signal, we focus on the relevance of the signal to memory performance.

      (2) Comparisons of recalled "pairs" do not account for the lag between those items during study or recall, which based on retrieved context theory and prior findings (e.g. Manning et al., 2011), should modulate similarity between item representations. Although the GLM will capture a linear trend, it will not reveal serial position specific effects. It appears that the betas reported for the SP12 analyses are driven by the fact that similarity with SP12 generally increases across serial position, rather than a specific effect of "high similarity to SP12 in adjacent lists" (Page 5, excluding perhaps the comparison with list x+1). It is also unclear how the SP12 similarity analyses support the statement that "end-list items are represented more distinctly, or less similarly, to all succeeding items" (Page 5). It is not clear how the authors account for the fact that the same participants do not contribute equally to all ROIs or if the effects are consistent if only participants who have electrodes in all ROIs are included.

      In our study, all pairs are defined by the lag between a reference and target item. The results in Figure 3 show the similarity between each serial position in relation to SP1; Figure 4 shows lag between each serial position relative to SP2 and 3; and Figure 5 shows lag relative to SP12. Each statistical model accounts for the lag by ordering the data by increased inter-item distance. Further, our definition of lag is significantly more rigorous than that used by Manning and colleagues. Our similarity results for Figures 3-5 characterize the change in similarity relative to a constant reference point, such as SP1, rather than a relative reference point, such as +1 lag, which aggregates similarity between pairs such as SP1 to SP2 with SP4 to SP5, which may be recalled via different memory mechanisms.
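
      The difference between the two pairing schemes described above can be made explicit with a small sketch. It only enumerates which serial-position pairs each scheme would compare for a 12-item list; the function names are hypothetical and no similarity values are computed.

```python
def fixed_reference_pairs(n_positions, reference=1):
    """Pairs of (reference, target): every position compared with one fixed item."""
    return [(reference, target)
            for target in range(1, n_positions + 1)
            if target != reference]


def lag_pairs(n_positions, lag=1):
    """Pairs separated by a given lag, pooled over all starting positions."""
    return [(sp, sp + lag) for sp in range(1, n_positions - lag + 1)]


reference_to_sp1 = fixed_reference_pairs(12, reference=1)  # (1, 2), (1, 3), ..., (1, 12)
adjacent_items = lag_pairs(12, lag=1)                      # (1, 2), (2, 3), ..., (11, 12)

# A +1 lag analysis pools SP1-SP2 with SP4-SP5; the fixed-reference scheme does not.
print((4, 5) in adjacent_items, (4, 5) in reference_to_sp1)
```

      With a fixed reference, inter-item distance varies along a single axis anchored at one item, whereas a lag-based scheme averages over pairs drawn from different parts of the list.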

      In Figure 5, we agree that your characterization that ‘similarity with SP12 generally increases across serial position’ is a more accurate description of the trend. The text has been updated to reflect this by changing the interpretation to “later serial positions in adjacent lists shared a gradually increasing similarity to SP12.”

      Next, we clarify the statement "end-list items are represented more distinctly, or less similarly, to all succeeding items". When recalling SP12, the subsequent items recalled exhibit significantly lower similarity to SP12 (see Figure 5D, pink). Consequently, the spectral representation of successfully recalled end-list items appears more distinct from later items in similar serial positions. This stands in contrast to our observations illustrated in Figures 3 and 4, where successfully recalled start-list items demonstrate greater similarity to later items in similar serial positions.

      (3) The authors use the term "perceptual" boundary which is confusing. First, "perceptual boundary" seems to be a specific subset of the broader term "event boundary," and it is unclear why/how the current study is investigating "perceptual" boundaries specifically. Second and relatedly, the current study does not have a sole "perceptual" boundary (as discussed in point 1 above), it is really a combination of perceptual and conceptual since the task is changing (from recalling the words in the previous list to studying the words in the current list OR studying the words in the current list to solving math problems in the current list) in addition to changes in stimulus presentation. 

      We agree with the statement that ‘perceptual’ as a modifier to the boundaries described here does not add significant information. Therefore, we have removed all references to perceptual boundaries.

      (4) Although the results show that item-item similarity in the gamma band decreases across serial position, it is unclear how the present findings further describe "how gamma activity facilitates contextual associations" (Page 5). As mentioned in point 1 above, such effects could be driven by attentional declines across serial position -- and a concurrent decline in gamma power -- which may be unrelated to, and actually potentially impair, the formation of contextual associations, given evidence from the literature that increased gamma power facilitates binding processes.

      We agree that our study does not elucidate a mechanistic relationship between gamma power and contextual associations. The referenced sentence has been changed to: “how gamma activity is associated with context”.

      Please see our response to point 1 above. In addition, studies demonstrating decreasing gamma power with increasing serial position focus primarily on the MTL, lateral temporal cortex and prefrontal cortex (Serruya et al. 2012). Despite their findings, we do not observe a strong boundary effect in those regions (see Supp Fig 3 a,b). The lack of boundary effect in regions where HFA is selectively increased for primacy items suggests the global attentional decline or neural fatigue model does not account for our results.

      Notably, HFA trends in the MPL are poorly described. Further, gamma power decline does not rule out the possibility of a boundary effect driving the HFA. We demonstrate boundary-relevant HFA only in the MPL but not in other regions. In addition, we show a correlation between SP1 recalls and boundary representation strength, as well as a conserved similarity of multiple boundary-adjacent items.

      (5) Some of the logic and interpretations are inconsistent with the literature. For example, the authors state that "The temporal context model (TCM) suggests that gradual drift in item similarity provides context information to support recovery of individual items" however, this does not seem like an accurate characterization of TCM. According to TCM, context is a recency-weighted average of previous experience. Context "drifts" insofar as information is added to/removed from context. Context drift thus influences item similarity -- it is not that item similarity itself drifts, but that any change in item-item similarity is due to context drift. 

      The current findings do not appear at odds with the conceptualization of drift and context in the current version of the context maintenance and retrieval model. Furthermore, the context representation is posited to include information beyond basic item representations. Two items, regardless of their temporal distance, can be associated with similar contexts if related information is included in both context representations, as predicted and shown for multiple forms of relatedness including semantic relatedness (Manning & Kahana, 2012) and task relatedness (Polyn et al., 2012).
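
      The drift described in this comment can be sketched in a few lines. This is a simplified illustration under stated assumptions, not code from the manuscript or the reviewers: each item is assumed to contribute an input orthogonal to everything before it, `beta` is a hypothetical drift parameter, and `rho` is chosen so that the context vector keeps unit length. Under those assumptions, the similarity between two context states equals `rho` raised to the number of items separating them.

```python
import math


def simulate_context(n_items, beta=0.5):
    """Return the context state after each of n_items studied items.

    Assumption: item i adds a unit input along its own axis, so inputs are
    mutually orthogonal. Axis 0 holds the pre-list context.
    """
    rho = math.sqrt(1 - beta ** 2)  # keeps the context vector at unit length
    context = [0.0] * (n_items + 1)
    context[0] = 1.0
    states = []
    for i in range(1, n_items + 1):
        # Old context decays by rho; the new item enters with weight beta.
        context = [rho * c for c in context]
        context[i] += beta
        states.append(context)
    return states


def dot(a, b):
    """Dot product; equals cosine similarity here because states are unit length."""
    return sum(x * y for x, y in zip(a, b))


states = simulate_context(12, beta=0.6)
similarity_to_first = [dot(states[0], s) for s in states]
```

      In this toy version, item-item similarity changes only because context drifts, which is the distinction the comment draws.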

      We revised the sentence and encompassing paragraph to describe the temporal context model more accurately and emphasize how our findings align with the stated version of CMR. The revised text is below:  

      “Next, we asked how gamma spectral activity reflects contextual association between items. In the medial parietal lobe, we observed recurring similarity between items distant in time but adjacent to boundaries. This pattern suggests spectral activity may carry information about an item's relationship to a boundary. These observations align with the Context Maintenance and Retrieval model which extends the predictions of TCM to encompass broader relationships among items. Our results demonstrate boundaries as an important aspect of context and specify the spectral and regional properties of these boundary-related contextual features.”

      (6) Lohnas et al. (2020) Neural fatigue influences memory encoding in the human hippocampus, Neuropsychologia, should be cited when discussing neural fatigue

      Thank you for your suggestion. The citation has been added to the text.

      (7) A within-list, not an across list, similarity analysis should be used to test the interpretation that end-of-list items are more distinct than other list items.

      We believe this recommendation refers to the following line in our text: “These findings suggest end-list items are represented more distinctly, or less similarly, to all succeeding items.” Our statement compares SP12 of list x to all succeeding items (in list x+1, x+2, etc.). Because the statement refers to items in subsequent lists, we performed an across-list analysis rather than a within-list one.

      (8) It is unclear why it is necessary to use PCA to estimate similarity between items.

      PCA was used to reduce the dimensionality of the time-frequency matrix for the gamma band. This technique allowed us to compare predominant trends in gamma between items. In addition, we added three example subjects in Figure 3 – supplementary figure 2D to show that unique time-frequency components contribute to the signal reconstructed from the PCs for each subject. Therefore, the boundary representation may differ for each patient.

      (9) Lags are listed as -4, 4 (Page 8), however with a list length of 12, possible lags should be -11, 11.

      The listed parenthetical statement ‘(-4 to 4)’ referred to Figure 1 where Lag CRP is shown for transitions from -4 to 4. However, we did calculate lag CRP for all possible transitions. Therefore, the referenced phrase was changed to: “Lagged CRP was calculated for all possible transitions (-11 to 11).”
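
      A minimal sketch of how a lag-CRP over the full range of transitions could be computed is given here. It is illustrative only and is not the authors' analysis code: the function name and the input format (one sequence of 1-based serial positions per trial, in recall order) are assumptions.

```python
from collections import Counter


def lag_crp(recall_lists, list_length=12):
    """Conditional response probability for each lag in -(L-1)..(L-1), excluding 0.

    Transitions are skipped when either item is out of range or when the next
    item was already recalled. Lags that were never available map to None.
    """
    actual = Counter()
    possible = Counter()
    for recalls in recall_lists:
        seen = set()
        for prev, nxt in zip(recalls, recalls[1:]):
            seen.add(prev)
            in_range = 1 <= prev <= list_length and 1 <= nxt <= list_length
            if not in_range or nxt in seen:
                continue
            # Every not-yet-recalled position was an available transition.
            for pos in range(1, list_length + 1):
                if pos not in seen:
                    possible[pos - prev] += 1
            actual[nxt - prev] += 1
    lags = [lag for lag in range(-(list_length - 1), list_length) if lag != 0]
    return {
        lag: (actual[lag] / possible[lag] if possible[lag] else None)
        for lag in lags
    }


crp = lag_crp([[1, 2, 3, 12, 11]], list_length=12)
```

      With a list length of 12 the returned dictionary has 22 entries, one for each lag from -11 to 11 except 0.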

      (10) Hsieh et al. 2014 and Hsieh & Ranganath (2015) are fMRI studies and as such, do not support the statement "Previous work consistent with temporal context models suggests spectral relatedness reduces as a function of distance between words" (Page 3). 

      The statement has been revised to: “Previous work consistent with temporal context models suggests neural pattern similarity reduces as a function of distance between related memories.”

      (11) Although statistically one can measure "How item-item similarity is affected by recollection" (Page 3), this is logically backwards, given that similarity during study necessarily precedes performance during free recall. Additionally, it is erroneous to assume that recalled words are "recollected" without additional measurements (e.g. Mickes et al. (2013) Rethinking familiarity: Remember/Know judgments in free recall, JML).

      The statement was changed to “item-item similarity is affected based on successful recall” given recollection cannot be determined in our paradigm.

      Reviewer 3:

      (1) My primary confusion in the current version of this paper is that the analyses don't seem to directly compare the two proposed models illustrated in Fig 1B, i.e. the temporal context model (with smooth drifts between items, including across lists) versus the boundary model (with similarities across all lists for items near boundaries). After examining smooth drift in the within-list analysis (Fig 2), the across-list analyses (Figs 3-5) use a model with two predictors (boundary proximity and list distance), neither of which is a smoothly drifting context. Therefore there does not appear to be a quantitative analysis supporting the conclusion that in lateral temporal cortex "drift exhibits a relationship with elapsed time regardless of the presences of intervening boundaries" (lines 272-3).

      We could not use a smoothly drifting regressor due to its collinearity with any model of boundary similarity. Therefore, we chose our two regressors: boundary proximity, which models intra-list changes in similarity and list distance, which models a stepwise decrease in similarity from adjacent lists.

      However, we agree with the comment that the presented data do not directly support the claim that drift in the lateral temporal cortex is independent of intervening boundaries. Therefore, we amended the statement to: “We found successfully recalled items encoded in distant serial positions drifted significantly more than items from adjacent serial positions (Figure 2C). Consistent with the predictions of the temporal context model, the reduced similarity between distant items may reflect reduced contextual overlap proportional to the time elapsed between them.”

      (2) The feature representation used for the neural response to each item is a gamma power time-frequency matrix. This makes it unclear what characteristics of the neural response are driving the observed similarity effects. It appears that a simple overall scaling of the response after boundaries (stronger responses to initial items during the beginning portion of the 1.6s time window) would lead to the increased cosine similarity between initial items, but wouldn't necessarily reflect meaningful differences in the neural representation or context of these items.

      Our study aims to draw the connection between the neural response after boundaries with neural representation and context of these items. Prior studies (Manning et al. 2011, El Kalliny et al. 2017) have interpreted similarity in neural spectra as a memory relevant phenomenon. We use very similar methods to perform our analysis.  

      In addition, we compare the fit of our boundary similarity model to behavioral performance to show increased boundary representation correlates with improved boundary item recall.

      While our study does not specify which time-frequency components underlie the increased similarity, we do limit our analysis to the gamma band. Traditional analyses include log-scaled, broadband time-frequency data (e.g., 3-100 Hz), from which we specify the relevance of a much narrower spectral band.

      Finally, we tried to study which time–frequency components contributed to the increased similarity, but it varied greatly between patients (see Figure 3 – supplementary figure 2D). Hence, we opted to use principal component analyses to compare the features showing the most variation for each given participant. This added analytical step allows us to detect boundary effects across patients despite individual variability in boundary representation.

      (3) The specific form of the boundary proximity models is not well justified. For initial items, a model of e^(1-d) is used (with d being serial position), but it is not stated how the falloff scale of this model was selected (as opposed to e.g. e^((1-d)/2)). For final items, a different model of d/#items is used, which seems to have a somewhat different interpretation (about drift between boundaries, rather than an effect specific to items near a final boundary). The schematic in Fig 1B appears to show a hypothesis which is not tested, with symmetric effects at initial and final boundaries.

      The boundary proximity models were chosen empirically. Our model was intended to quantify a decreasing relationship across many patients. We acknowledge the constants and variables may not definitively describe underlying neural processes.  

      For start- and end-list boundaries, we used different models because primacy and recency effects are unique phenomena. Primacy memory is classically thought to arise from rehearsal during the encoding time (Polyn et al. 2009, Lohnas et al. 2015). Alternatively, recency memory is thought to arise from strong contextual cues of recency items during recall due to their temporal proximity. Therefore, we have a limited basis on which to assume their spectral representation in relation to task boundaries would be symmetric.
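
      The two functional forms discussed in this exchange can be written out directly. The sketch below is illustrative and not taken from the authors' code. The function names are hypothetical, and the `scale` argument is an added assumption: `scale=1` gives the e^(1-d) form described for start-list items, while `scale=2` gives the reviewer's e^((1-d)/2) alternative.

```python
import math


def start_boundary_proximity(serial_position, scale=1.0):
    """Exponential falloff from the start-list boundary: e^((1 - d) / scale)."""
    return math.exp((1 - serial_position) / scale)


def end_boundary_proximity(serial_position, n_items=12):
    """Linear approach to the end-list boundary: d / n_items."""
    return serial_position / n_items


positions = range(1, 13)
start_regressor = [start_boundary_proximity(d) for d in positions]
slower_falloff = [start_boundary_proximity(d, scale=2.0) for d in positions]
end_regressor = [end_boundary_proximity(d) for d in positions]
```

      Writing the falloff scale as a parameter makes explicit that it is a modeling choice, which is the point raised in the comment.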

      (4) The main text description of Fig 2 only describes drift effects in lateral temporal cortex, but Fig 2 - supplement 1 shows that there is also drift and a significant subsequent memory effect in the other two ROIs as well. There is not a significant memory x drift slope interaction in these regions; are the authors arguing that the lack of this interaction (different drift rates for remembered versus forgotten items) is critical for interpreting the roles of lateral temporal cortex versus medial parietal and hippocampal regions?

      Yes. Fig 2 - Supplement 1 shows that drift occurs in both the HC and MPL. However, the interaction term is not significant, which suggests that the rate of drift between recalled and non-recalled items is not significantly different.

      In contrast, Fig 2C shows that recalled pairs drift at a higher rate than non-recalled pairs. For the LTC, the interaction term is negative and statistically significant. This suggests that successfully encoded item pairs that were presented far apart share more distinct spectral representations, specifically in the LTC. These findings lead to our interpretation in the discussion that “elevated drift rate might allow the representations of recalled items to remain distinct but ordered in memory.”

      (5) The parameter fits for the "list distance" regressor are not shown or analyzed, though they do appear to be important for the observed similarity structure (e.g. Fig 3E). I would interpret this regressor as also being "boundary-related" in the sense that it assumes discrete changes in similarity at boundaries.

      Parameter fits for the ‘list distance’ regressor are now shown in the supplementary portion of Figures 3 and 5. The difference between regions is non-significant.

      (6) To make strong claims about temporal context versus boundary models as implied by Fig 1B, these two regressors should be fit within the same model to explain across-list similarity. The temporal context model could be based on the number of intervening items (as in Fig 1B) or actual time elapsed between items. The relationship between the smoothly drifting temporal context model and the discretely-jumping list distance models should also be clarified.

      We could not use a smoothly drifting regressor due to its collinearity with any model of boundary similarity. A model which included a ‘temporal context regressor’ would not be able to account for the presence of a boundary effect and would not allow us to demonstrate a boundary representation in the presence of drift. Therefore, we chose our two regressors: boundary proximity, which models intra-list changes in similarity and list distance, which models a stepwise decrease in similarity from adjacent lists. These regressors allow the model to differentiate between intra-list changes (the boundary regressor) versus inter-list changes (the list distance regressor).
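
      One way to see how such collinearity can arise is to build toy versions of the regressors and correlate them. The sketch below is a hypothetical illustration and does not reproduce the authors' design matrix: it assumes a reference item at SP1, target items in the three following 12-item lists, and a smooth regressor equal to the number of items elapsed since the reference.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


LIST_LENGTH = 12  # items per list, as in the task described above

# Target items: every serial position in the three lists after the reference.
targets = [(lst, sp) for lst in (1, 2, 3) for sp in range(1, LIST_LENGTH + 1)]

# Assumed smooth "temporal context" regressor: items elapsed since the reference.
items_elapsed = [lst * LIST_LENGTH + (sp - 1) for lst, sp in targets]
# Stepwise list distance and start-boundary proximity e^(1 - d).
list_distance = [lst for lst, _ in targets]
boundary_proximity = [math.exp(1 - sp) for _, sp in targets]

r_elapsed_vs_list = pearson(items_elapsed, list_distance)
r_elapsed_vs_boundary = pearson(items_elapsed, boundary_proximity)
```

      In this toy design the elapsed-item count is an exact linear combination of list distance and serial position, so it correlates strongly with list distance and carries little information that the other two regressors do not already capture.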

      (7) The features of the time-frequency matrix that are driving similarity between events could be visualized to provide a better understanding of the boundary-related signals. The analysis could also be re-run with reduced versions of the feature space in order to determine the critical components of this signal; for example, responses could be averaged across time to examine only differences across frequencies, or across frequencies to examine purely temporal changes across the 1.6 second window.

      Figure 3 – supplementary figure 2 A-C has been added to show that varying the number of principal components (PCs) does not change the trend of boundary sensitivity in the MPL. In addition, we included three example subjects in Figure 3 – supplementary figure 2D to show that unique time-frequency components contribute to the signal reconstructed from the PCs for each subject. Therefore, the boundary representation may differ for each patient.

      (8) If the authors are considering a space of multiple models as "boundary proximity models" (e.g. linear models and exponential models with different scale factors), this should be part of the model-fitting process rather than a single model being selected posthoc.

      We agree with the reviewer that the ideal approach would be to select among candidate models as part of a model-fitting process. However, given the size of our dataset and limited computational resources, we were not able to perform this analysis.

      (9) The interpretation of region differences in the results in Fig 2 and Fig 2 - supplement 1 should be clarified. 

      In discussion, we have added the following text to clarify our interpretation of the regional differences shown in the mentioned figures.  

      “However, across task boundaries, our study did not detect a robust change in drift rate in the medial or lateral temporal cortex. This finding contrasts with significant work (Ezzyat and Davachi, 2014; Griffiths and Fuentemilla, 2020) which shows hippocampal sensitivity to event-boundaries. One interpretation would be that boundary representations in the hippocampus are quite sparse and represented by populations of time-sensitive cells whose activity is indexed to task-related boundaries (Umbach et al 2018). While the sparse representations may not be detectable in gamma activity, perhaps it suggests drift in these regions represents a more abstract set of contextual features accumulated from multiple brain regions (Baldassano et al. 2017).”

      (10) Whether there are significant fits for the list distance regressor, and whether these fits vary across regions, could be stated. The list distance regressor could also be directly compared (in the same model) to a temporal-context regressor, which predicts graded changes in similarity between items rather than the discrete changes between lists.

      We have added parameter fits for the ‘list distance’ regressor in the supplementary portion of Figures 3 and 5. The difference between regions is non-significant. Therefore, our results show a very similar stepwise decrease in similarity across lists between regions (list distance regressor; Figure 3 – supplementary figure 1B).

      We could not compare these parameters to a separate model which includes a smoothly drifting ‘temporal-context’ regressor due to the regressor's collinearity with any representation of boundary. See our response to Reviewer 3, comment 6.

      (11) The authors should clarify their interpretation of the results, and whether they are proposing a tweak to the temporal context model or a substantially different organizational system. 

      In the discussion, we include the following statements to clarify what we suggest regarding the temporal context model.

      “Our findings suggest a broader scope of contextual association than just prior items, where temporal proximity as well as task structure in the form of boundaries, play intertwined roles in contextual construction. Our data therefore have implications for updated iterations of the temporal context model incorporating (perhaps) specific terms for boundary information. This may in turn provide a more systematic prediction of primacy effects in behavioral data.”  

      (12) Minor typos and corrections: 

      52: using -> use 

      108: patients -> patients'

      156: list -> lists

      The list distance plot is described as "pink" in Fig 3 and Fig 5 - supplement 1, but appears gray in the figures.

      Each of these has been corrected in the text.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their very constructive and helpful comments on the previous version of this manuscript. They have focused on some important issues and have raised many valuable questions that we expect to answer as research begins on these markings. As has been often the case with preprints, a number of experts beyond the four reviewers and editor have provided comments, questions, and suggestions, and we have taken these on board in our revision of the manuscript. In particular, Martinón-Torres et al. (2024) focused several comments upon this manuscript and raise some points that were not considered by the reviewers, and so we discuss those points here in addition to the reviewer comments.

      Some of us have been engaged in other aspects of the possible cultural activities of Homo naledi. After the discovery of these markings we considered it indefensible to publish further research on the activity of H. naledi within this part of the cave system without making readers aware that the H. naledi skeletal remains occur in a spatial context near markings on cave walls. Of course, the presence of markings leaves many questions open. A spatial context does not answer all questions about the temporal context. The situation of the Dinaledi Subsystem does entail some constraints that would not apply to markings within a more open cave or rock wall, and we discuss those in the text.

      We find ourselves in agreement with most of the reviewers on many points. As reflected by several of the reviewers, and most pointedly in the remarks by reviewer 1, the purpose of this preprint is a preliminary report on the observation of the markings in a very distinctive location. This initial report is an essential step to enable further research to move forward. That research requires careful planning due to the difficulty of working within the Dinaledi Subsystem where the markings are located. This pattern of initial publication followed by more detailed study is common with observations of rock art and other markings identified in South Africa and elsewhere. We appreciate that the reviewers have understood the role of this initial study in that process of research.

      Because of this, the revised manuscript represents relatively minimal changes, and all those at the advice of reviewers. Many thanks to all the reviewers for noting various typographic errors, missed references and other issues that we have done our best to fix in the revised manuscript.

      Expertise of authors. Reviewer 4 mentions that the expertise of the authors does not include previous publication history on the identification of rock art, and other reviewers briefly comment that experts in this area would enhance the description. AF does have several publications on ancient engravings and other markings; LRB has geological training and field experience with rock art. Notwithstanding this, we do take on board the advice to include a wider array of subject experts in this research, and this is already underway.

      Image enhancement. We appreciate the suggestions of some reviewers for possible strategies to use software filters to bring out details that may not be obvious even with our cross-polarization lighting and filtering. These are great ideas to try. In this manuscript we thought that going very far into software editing or image enhancement might be perceived by some readers as excessive manipulation, particularly in an age of AI. In future work we will experiment with the suggested approaches. 

      Natural weathering. In the process of review and commentary by experts and the public there has been broad acceptance that many of the markings illustrated in this paper are artificial and not a product of natural weathering of the dolomite rock. We deeply appreciate this. At the same time, we accept the comments from reviewers that some markings may be difficult to differentiate from natural weathering, and that some natural features that were elaborated or altered may be among the markings we recognize. On pages 3 and 4 we present a description of the process of natural subaerial weathering of dolomite, which we have rooted in several references as well as our own observations of the natural weathering visible on dolomite cave walls in the Rising Star cave system. This includes other cave walls within the Dinaledi Subsystem. We discuss the “elephant skin” patterning of natural dolomite surface weathering, how that patterning emerges, and how that differs from the markings that are the subject of this manuscript.

      Animal claw marks. Martinón-Torres et al. 2024 accept that some of the markings illustrated on Panel A are artificial, but they offer the hypothesis that some of those markings may be consistent with claw marks from carnivores or other mammals. They provide a photo of claw marks within a limestone cave in Europe to illustrate this point. On pages 5 and 6 of the revised manuscript we discuss the hypothesis of claw marks. We discuss the presence of animals in southern Africa that may dig in caves or mark surfaces. However, the key aspect of the Malmani dolomite caves is that the hardness of dolomitic limestone rock is much greater than that of many of the limestone caves in other regions such as Europe and Australia, where claw marks have been noted in rock walls. As we discuss, we have not been able to find evidence of claw marks within the dolomite host bedrock of caves in this region, although carnivores, porcupines, and other animals dig into the soft sediments within and around caves. The form of the markings themselves also counter-indicates the hypothesis that they are claw marks.

      Recent manufacture. One comment that occurs within the reviews and from other readers of the preprint is that recent human visitors to the cave, either in historic or recent prehistoric times, may have made these marks. We discuss this hypothesis on page 6 of the revised manuscript. The simple answer is that no evidence suggests that any human groups were in the Dinaledi Subsystem between the presence of H. naledi and the entry of explorers within the last 25 years. The list of all explorers and scientific visitors to have entered this portion of the cave system is presented in a table. We can attest that these people did not make the marks. More generally, such marks have not been known to be made by cavers in other contexts within southern Africa.

      Panels B and C. We have limited the text related to these areas, other than indicating that we have observed them. The analysis of these areas and quantification of artificial lines does not match what we have done for the Panel A area and we leave these for future work. 

      Presence of modern humans. We have observed no evidence of modern humans or other hominin populations within the Dinaledi Subsystem, other than H. naledi. Several reviewers raise the question of whether the absence of evidence is evidence of absence of modern humans in this area. This is connected by two of the reviewers to the observation that the investigation of other caves in recent years has shown that markings or paintings were sometimes made by different groups over tens of thousands of years, in some cases including both Neanderthals and modern humans. We have decided it is best for us not to attempt to prove a negative. It is simple enough to say that there is no evidence for modern humans in this area, while there is abundant evidence of H. naledi there.

      Association with H. naledi. Reviewer 2 made an incisive point that the previous version contained some text that appeared contradictory: on the one hand we argued that modern humans were not present in the subsystem due to the absence of evidence of them, yet we accepted that H. naledi may have been present for a longer time than currently established by geochronological methods.

      We appreciate this comment because it helped us to think through the way to describe the context and spatial association of these markings and the skeletal remains, and how it may relate to their timeline. Other reviewers also raised similar questions, whether the context by itself demonstrates an association with H. naledi. We have revised the text, in particular on pages 5 and 7, to simply state that we accept as the most parsimonious alternative at present the hypothesis that the engravings were made by H. naledi, which is the only hominin known to be present in this space.

      Age of H. naledi in the system. At one place in the previous manuscript we indicated that we cannot establish that H. naledi was only active in the cave system within the constraints of the maximum and minimum ages for the Dinaledi Subsystem skeletal remains (viz., 335 ka – 241 ka), because some localities with skeletal material are undated. We have adjusted this paragraph on page 7 to be clear that we are discussing this only to acknowledge uncertainty about the full range of H. naledi use of the cave system.

      Geochronological methods. Several reviewers discuss the issue of geochronology as applied to these markings. This is an area of future investigation for us after the publication of this initial report. As some reviewers note, the prospects for successful placement of these engraved features and other markings with geochronological methods depend on factors that we cannot predict without very high-resolution investigation of the surfaces. We have included greater discussion of the challenges of geochronological placement of engravings on page 6, including more references to previous work on this topic. We also briefly note the ethical problems that may arise as we go further with potentially invasive, destructive or contact studies of these engravings, which must be carefully considered by not just us, but the entire academy.

      Title. Some reviewers suggested that the title should be rephrased because this paper does not use chronological methods to derive date constraints for the markings. We have rephrased the title to reflect less certainty while hopefully retaining the clear hypothesis discussed in the paper.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The present study aims to associate reproduction with age-related disease as support of the antagonistic pleiotropy hypothesis of ageing predominantly using Mendelian Randomization. The authors found evidence that early-life reproductive success is associated with advanced ageing.

      Strengths:

      Large sample size. Many analyses.

      Weaknesses:

      Still a number of doubts with regard to some of the results and their interpretation.

      Reviewer #1 (Recommendations for the authors):

      Thank you for the opportunity to review a revised version.

      I still have serious doubts with regard to a number of datasets presented. For example, the results on essential hypertension and cervical cancer show very small effect sizes, but according to the authors still reach the level of statistical significance. This is unlikely to be accurate. For MR analyses, this is nearly impossible. The analyses of these data and the statistical analysis need to be checked for errors and repeated. While BOLT-LMM might not be relevant here, there might be other things happening here. The authors should therefore always interpret the results also with regard to the observed effect sizes instead of only looking at the p-values (0.999 means that there is a 0.1% lower risk).

      Thank you for your suggestions. We have updated the results for essential hypertension, GAD, and cervical cancer in results, figures, and supplemental tables (lines 65-89, Figure 1, Tables S3-S4).

      Reviewer #2 (Public review):

      Summary:

      The authors present an interesting paper where they test the antagonistic pleiotropy theory. Based on this theory they hypothesize that genetic variants associated with later onset of age at menarche and age at first birth may have a positive effect on a multitude of health outcomes later in life, such as epigenetic aging and prevalence of chronic diseases. Using a Mendelian randomization and colocalization approach, the authors show that SNPs associated with later age at menarche are associated with delayed aging measurements, such as slower epigenetic aging and reduced facial aging and a lower risk of chronic diseases, such as type 2 diabetes and hypertension. Moreover, they identify 128 fertility-related SNPs that associate with age-related outcomes and they identify BMI as a mediating factor for disease risk, discussing this finding in the context of evolutionary theory.

      Strengths:

      The major strength of this manuscript is that it addresses the antagonistic pleiotropy theory in aging. Aging theories are not frequently empirically tested although this is highly necessary. The work is therefore relevant for the aging field as well as beyond this field, as the antagonistic pleiotropy theory addresses the link between fitness (early life health and reproduction) and aging.

      The authors addressed the remarks on the previous version very well. Addressing the two points below would further increase the quality of the manuscript.

      (1) In the previous version the authors mentioned that their results are also consistent with the disposable soma theory: "These results are also consistent with the disposable soma theory that suggests aging as an outcome tradeoff between an organism's investment in reproduction and somatic maintenance and repair."

      Although the antagonistic pleiotropy and disposable soma theories describe different mechanisms, both provide frameworks for understanding how genes linked to fertility influence health. The antagonistic pleiotropy theory posits that genes enhancing fertility early in life may have detrimental effects later. In contrast, the disposable soma theory suggests that energy allocation involves a trade-off, where investment in fertility comes at the expense of somatic maintenance, potentially leading to poorer health in later life.

      To strengthen the manuscript, a discussion section should be added to clarify the overlap and distinctions between these two evolutionary theories and suggest directions for future research in disentangling their specific mechanisms.

      Thank you for your suggestions to clarify the overlap and distinctions between the antagonistic pleiotropy and disposable soma theories. While our primary focus is on the antagonistic pleiotropy framework, we acknowledge that the disposable soma theory also provides a relevant perspective on the trade-offs between reproduction and somatic maintenance.

      To address this, we have expanded the discussion section to highlight how both theories contribute to our understanding of the relationship between fertility-related traits and aging-related health outcomes. We also suggested potential future research directions, such as integrating genetic data with biomarkers of somatic maintenance to further explore the mechanisms underlying these trade-offs (lines 213-223).

      (2) In response to the question why the authors did not include age at menopause in addition to the already included age at first child and age at menarche the following explanation was provided: "Our manuscript focuses on the antagonistic pleiotropy theory, which posits that inherent trade-off in natural selection, where genes beneficial for early survival and reproduction (like menarche and childbirth) may have costly consequences later. So, we only included age at menarche and age at first childbirth as exposures in our research."

      It remains, however, unclear why genes beneficial for early survival and reproduction would be reflected only in age at menarche and age at first childbirth, but not in age at menopause. While age at menarche marks the onset of fertility, age at menopause signifies its end. Since evolutionary selection acts directly until reproduction is no longer possible (though indirect evolutionary pressures persist beyond this point), the inclusion of additional fertility-related measures could have strengthened the analysis. A more detailed justification for focusing exclusively on age at menarche and first childbirth would enhance the clarity and rigor of the manuscript.

      Thank you for your question regarding the age at menopause in our analysis. Our decision was based on the theoretical framework of antagonistic pleiotropy, which emphasizes early-life reproductive advantages that may have trade-offs later in life. Age at menarche and age at first childbirth are direct markers of early reproductive investment, which align closely with this framework.

      While age at menopause marks the cessation of reproductive capability, its evolutionary role is distinct. The selective pressures acting on menopause are complex and may involve post-reproductive contributions rather than direct reproductive fitness benefits. Moreover, the genetic architecture of menopause may be influenced by different biological pathways compared to early reproductive traits.

      Nonetheless, we acknowledge that including age at menopause could provide additional insights into reproductive aging. Several papers (1, 2) have already been published on age at menopause and age-related outcomes, including diabetes, AD, osteoporosis, cancers, and cardiovascular diseases.

      Reviewing Editor (Recommendations for the authors):

      Above/below you will find the remaining comments from the reviewers. One of the main issues remaining is that some of the data seems to be incorrectly analysed and some of the findings may not be correct. To clarify this a lot more, I asked the reviewer for some details and received the following:

      - In Figure 1B one of their main outcomes is "age of menopause", but they report the data as an odds ratio. This is not correct and should be fixed (it seems the authors can run the right analysis, but just reported it with the wrong heading in the figure). This likely also applies to the outcome "facial aging". Also the heading in Figure 1A should be Beta instead of OR.

      We have updated the figures to ensure that the beta values of continuous outcomes and odds ratio values of categorical outcomes are presented in Figure 1.
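      To illustrate the distinction between beta values and odds ratios, the following minimal sketch converts a log-odds estimate and its standard error into an odds ratio with a 95% confidence interval, and expresses an odds ratio as a percent change in odds. The function names and the numerical inputs are hypothetical and are not taken from the manuscript; for continuous outcomes the beta would be reported directly, without exponentiation.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds estimate (beta) and its standard error to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def percent_change_in_odds(odds_ratio):
    """Express an odds ratio as a percent change in odds relative to OR = 1."""
    return (odds_ratio - 1.0) * 100.0


# Hypothetical values, chosen only to show how small an OR near 1 really is.
or_, lower, upper = odds_ratio_with_ci(beta=-0.001, se=0.0004)
print(f"OR = {or_:.4f} (95% CI {lower:.4f} to {upper:.4f})")
print(f"Change in odds: {percent_change_in_odds(0.999):.2f}%")
```

      Under these assumptions, an odds ratio of 0.999 corresponds to roughly 0.1% lower odds, which is why the effect size should be read alongside the p-value.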

      - With essential hypertension, GAD and cervical cancer, the estimates are so small that they need to re-review their results. The current MR analysis is not sufficiently powered to have such small confidence intervals. Essential hypertension was based on data from UK biobank, although I was also unable to find what program was used to generate the GWAS results, I have strong thoughts this was also BOLT-LMM. Same for cervical cancer. Both datasets used familial-related samples, so they are very likely derived with BOLT-LMM.

      I hope this will help to solve this issue.

      According to the published paper, the GWAS for gastrointestinal or abdominal disease (GAD) (GWAS ID: ebi-a-GCST90038597) was generated with BOLT-LMM. According to the MRC IEU UK Biobank GWAS pipeline (versions 1 and 2), the GWAS for essential hypertension (GWAS ID: ukb-b-12493) and cervical cancer (GWAS ID: ukb-b-8777) were also generated with BOLT-LMM. We have updated the MR analysis results and figures (lines 65-89, Figure 1, Tables S3-S4) as well as the subsequent IPA analysis (lines 106-162 and 255-280, Figures 2-3).

      (1) Magnus, M. C., Borges, M. C., Fraser, A. & Lawlor, D. A. Identifying potential causal effects of age at menopause: a Mendelian randomization phenome-wide association study. Eur J Epidemiol 37, 971-982 (2022). https://doi.org/10.1007/s10654-022-00903-3

      (2) Zhang, X., Huangfu, Z. & Wang, S. Review of mendelian randomization studies on age at natural menopause. Front Endocrinol (Lausanne) 14, 1234324 (2023). https://doi.org/10.3389/fendo.2023.1234324

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public Review): 

      This manuscript presents insights into biased signaling in GPCRs, namely cannabinoid receptors. Biased signaling is of broad interest in general, and cannabinoid signaling is particularly relevant for understanding the impact of new drugs that target this receptor. Mechanistic insight from work like this could enable new approaches to mitigate the public health impact of new psychoactive drugs. Towards that end, this manuscript seeks to understand how new psychoactive substances (NPS, e.g. MDMB-FUBINACA) elicit more signaling through βarrestin than classical cannabinoids (e.g. HU-210). The authors use an interesting combination of simulations and machine learning. 

      We thank the reviewer for the comments. We have provided a point-by-point response to the reviewer’s comments below and incorporated the suggestions in our revised manuscript. Modified parts of the manuscript are highlighted in yellow.

      Comments:

      (1) The caption for Figure 3 doesn't explain the color scheme, so it's not obvious what the start and end states of the ligand are. 

      We thank the reviewer for pointing this out. We have added the color scheme to the figure caption.

      (2) For the metadynamics simulations were multiple Gaussian heights/widths tried to see what, if any, impact that has on the unbinding pathway? That would be useful to help ensure all the relevant pathways were explored.  

      We thank the reviewer for the suggestion. We agree that the Gaussian height and width may affect the unbinding pathway. However, we would like to point out that we used the well-tempered variant of metadynamics, in which the effective Gaussian height decreases as bias deposition progresses. Therefore, we expect the initial Gaussian height and width to have minimal impact on the unbinding pathway. To address the reviewer's suggestion, we conducted additional well-tempered metadynamics simulations varying key parameters such as the bias height, bias factor, and deposition rate, all of which can influence the sampled space. The values originally used in the paper for the bias height, bias factor, and deposition rate are 0.4 kcal/mol, 15, and 1/5 ps<sup>-1</sup>, respectively. We explored different values for these parameters and projected the sampled space on top of the previously sampled region (Figure S4). We observed that the new simulations sample a similar unbinding pathway in the extracellular direction and explore a similar space in the binding pocket as well.

      Results and Discussion (Page 10)

      “We also performed unbinding simulations using well-tempered metadynamics parameters (bias height, bias deposition rate and bias factor) to confirm the existence of alternative pathways (Figure S4). However, the simulations show that ligands follow the similar pathway for all metadynamics runs.”
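      The decay of the Gaussian height in well-tempered metadynamics can be sketched as follows. This is an illustration of the standard well-tempered rescaling, w = w0 · exp(−V(s) / (kB · ΔT)) with ΔT = (γ − 1) · T, and not the simulation code used in the study. The initial height (0.4 kcal/mol) and bias factor (15) are the values quoted above; the temperature of 300 K and the accumulated-bias values are assumptions made for the example.

```python
import math

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def effective_height(initial_height, accumulated_bias, bias_factor, temperature):
    """Height of the next Gaussian given the bias already deposited at the current CV value.

    initial_height and accumulated_bias are in kcal/mol, temperature in K.
    """
    delta_t = (bias_factor - 1.0) * temperature
    return initial_height * math.exp(-accumulated_bias / (KB_KCAL_PER_MOL_K * delta_t))


# Assumed temperature of 300 K; accumulated-bias values are illustrative only.
for bias in (0.0, 2.0, 5.0, 10.0):
    height = effective_height(0.4, bias, bias_factor=15.0, temperature=300.0)
    print(f"accumulated bias {bias:5.1f} kcal/mol -> next Gaussian height {height:.3f} kcal/mol")
```

      Because the height shrinks wherever bias has already accumulated, the choice of initial height mainly affects how quickly a region fills rather than which regions are eventually explored.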

      (3) It would be nice to acknowledge previous applications of metadynamics+MSMs and (separately) TRAM, such as the Simulation of spontaneous G protein activation... (Sun et al. eLife 2018) and Estimation of binding rates and affinities... (Ge and Voelz JCP 2022). 

      We appreciate the reviewer's feedback. We have incorporated additional citations of studies demonstrating the use of TRAM as an estimator for both kinetics and thermodynamics (e.g. Ligand binding: Ge, Y. and Voelz, V.A., JCP, 2022[1]; Peptide-protein binding kinetics: Paul, F. et al., Nat. Commun., 2017[2], Ge, Y. et al., JCIM, 2021[3]). Additionally, we have included references to studies where biased simulations were initially used to explore the conformational space, and the results were then employed to seed unbiased simulations for building a Markov state model (Metadynamics: Sun, X. et al., eLife, 2018[4]; Umbrella Sampling: Abella, J. R. et al., PNAS, 2020[5]; Replica Exchange: Paul, F. et al., Nat. Commun., 2017[2]).

      (4) What is KL divergence analysis between macrostates? I know KL divergence compares probability distributions, but it is not clear what distributions are being compared. 

      We apologize for this confusion. The KL divergence analysis was performed on the probability distributions of the inverse distances between residue pairs from any two macrostates. Each macrostate was represented by 1000 frames that were selected in proportion to the TRAM stationary density. All possible pairwise inverse distances were calculated per frame for these calculations. Although KL divergence is inherently asymmetric, we symmetrized the measurement by averaging the two directions. The per-residue K-L divergence, which is shown in the main figures as a color and thickness gradient, was calculated by summing over all pairs involving that residue. We have included a detailed discussion of K-L divergence in the Methods section. We have also modified the Results section to add a brief discussion of the K-L divergence methodology.

      Results and Discussion (Page 15)

      “We further performed Kullback-Leibler divergence (K-L divergence) analysis between inverse distance of residue pairs of two macrostates to highlight the protein region that undergoes high conformational change with ligand movement.”

      Methods (Page 33)

      “Kullback–Leibler divergence (K-L divergence) analysis was performed to show the structural differences in protein conformations in different macrostates[4,114]. In this study, this technique was used to calculate the difference in the pairwise inverse distance distributions between macrostates. Each macrostate was represented by 1000 frames that were selected proportional to their TRAM weighted probabilities. Although K-L divergence is an asymmetric measurement, for this study, we used a symmetric version of the K-L divergence by taking the average between two macrostates. Per residue contribution of K-L divergence was calculated by taking the sum of all the pairwise distances corresponding to that residue. This analysis was performed by in-house Python code.”
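      The procedure described above can be sketched in a few lines. This is not the in-house code referred to in the Methods; it is a minimal illustration in which the histogram binning, the pseudocount, and all function names are assumptions. Each residue pair contributes a symmetrized K-L divergence between its inverse-distance histograms in two macrostates, and the per-residue value is the sum over all pairs that include that residue.

```python
import math
from collections import defaultdict


def histogram(values, edges, pseudocount=1e-6):
    """Normalized histogram; the pseudocount (an assumption) avoids log(0) in empty bins."""
    counts = [pseudocount] * (len(edges) - 1)
    last = len(edges) - 2
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1] or (i == last and v == edges[i + 1]):
                counts[i] += 1.0
                break
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_kl(p, q):
    """Average of the two directed K-L divergences."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def per_residue_kl(pairs_a, pairs_b, edges):
    """pairs_a / pairs_b map (res_i, res_j) -> list of inverse distances, one per frame."""
    totals = defaultdict(float)
    for (i, j), samples_a in pairs_a.items():
        d = symmetric_kl(histogram(samples_a, edges), histogram(pairs_b[(i, j)], edges))
        totals[i] += d
        totals[j] += d
    return dict(totals)
```

      In practice the frames passed in for each macrostate would be the 1000 frames drawn in proportion to the TRAM weights, so that the histograms approximate the equilibrium distributions.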

      (5) I suggest being more careful with the language of universality. It can be "supported" but "showing" or "proving" its universal would require looking at all possible chemicals in the class. 

      We thank the reviewer for the suggestion. In response, we have revised the manuscript to ensure that the language reflects that our findings are based on observations from a limited set of ligands, namely one NPS and one classical cannabinoid. We have replaced references to ligand groups (such as NPS or classical cannabinoid) with the specific ligand names (such as MDMB-FUBINACA or HU-210) to avoid claims of universality and prevent any potential confusion.

      Results and Discussion (Page 19)

      “In this work, we trained the network with the NPS (MDMB-FUBINACA), and classical cannabinoid (HU-210) bound unbiased trajectories (Method Section). Here, we compared the allosteric interaction weights between the binding pocket and the NPxxY motif, which is involved in triad interaction formation. Results show that each binding pocket residue in the MDMB-FUBINACA bound ensemble shows higher allosteric weights with the NPxxY motif, indicating larger dynamic interactions between the NPxxY motif and binding pocket residues (Figure S9). The probability of triad formation was estimated to observe the effect of the difference in allosteric control. TRAM weighted probability calculation showed that MDMB-FUBINACA bound CB1 has the higher probability of triad formation (Figure 8A). Comparison of the pairwise interaction of the triad residues shows that interaction between Y397<sup>7.53</sup>-T210<sup>3.46</sup> is relatively more stable in case of MDMB-FUBINACA bound CB1, while other two interactions have similar behavior for both systems (Figures S10A, S10B, and S10C). Therefore, higher interaction between Y397<sup>7.53</sup> and T210<sup>3.46</sup> in MDMB-FUBINACA bound receptor causes the triad interaction to be more probable.

      Furthermore, we also compared TM6 movement for both ligand bound ensembles, which is another activation metric involved in both G-protein and β-arrestin binding. Comparison of TM6 distance from the DRY motif of TM3 shows similar distribution for HU-210 and MDMB-FUBINACA (Figure 8B). These observations support that NPS binding causes higher β-arrestin signaling by allosterically controlling triad interaction formation.”

      Reviewer #2 (Public Review): 

      Summary: 

      The investigation provides computational as well as biochemical insights into the (un)binding mechanisms of a pair of psychoactive substances into cannabinoid receptors. A combination of molecular dynamics simulation and a set of state-of-the art statistical post-processing techniques were employed to exploit GPCR-ligand dynamics. 

      Strengths: 

      The strength of the manuscript lies in the usage and comparison of TRAM as well as Markov state modelling (MSM) for investigating ligand binding kinetics and thermodynamics. Usually, MSMs have been more commonly used for this purpose. But as the authors have pointed out, implicit in the usage of MSMs lies the assumption of detailed balance, which would not hold true for many cases especially those with skewed binding affinities. In this regard, the author's usage of TRAM which harnesses both biased and unbiased simulations for extracting the same, provides a more appropriate way out. 

      Weaknesses: 

      (1) While the authors have used TRAM (by citing MSM to be inadequate in these cases), the thermodynamic comparisons of both techniques provide similar values. In this case, one would wonder what advantage TRAM would hold in this particular case. 

      We thank the reviewer for the comment. While we agree that the thermodynamic comparisons between MSM and TRAM provide similar values in this instance, we would like to emphasize the underlying reasoning behind our choice of TRAM.

      MSM can struggle to accurately estimate thermodynamic and kinetic properties in cases where local state reversibility (detailed balance) is not easily achieved with unbiased sampling. This is especially relevant in ligand unbinding processes, which often involve overcoming high free energy barriers. TRAM, by incorporating biased simulation data (such as umbrella sampling) in addition to unbiased data, can better achieve local reversibility and provide more robust estimates when unbiased sampling is insufficient.

      The similarity in thermodynamic estimates between MSM and TRAM in our study can be attributed to the relatively long unbiased sampling period (> 100 µs) employed. With sufficient sampling, MSM can approach detailed balance, leading to results comparable to those from TRAM. However, as we demonstrated in our manuscript (Figure 4D), when the amount of unbiased sampling is reduced, the uncertainties in both the thermodynamics and kinetics estimates increase significantly for MSM compared to TRAM. Thus, while MSM and TRAM perform similarly under the conditions of extensive sampling, TRAM's advantage lies in its robustness when unbiased sampling is limited or difficult to achieve. 

      (2) The initiation of unbiased simulations from previously run biased metadynamics simulations would almost surely introduce hysteresis in the analysis. The authors need to address these issues. 

      We thank the reviewer for the comment. We acknowledge that biased simulations could potentially introduce hysteresis or result in the identification of unphysical pathways. However, we believe this issue is mitigated by using well-tempered metadynamics, which gradually deposits a decaying bias. This approach enables the simulation to explore orthogonal directions of collective variable (CV) space, reducing the likelihood of hysteresis effects (Invernizzi, M. and Parrinello, M., JCTC, 2019[6]).

      Furthermore, there is precedent for using metadynamics-derived pathways to initiate unbiased simulations for constructing Markov State Models (MSMs). This methodology has been successfully applied in studying G-protein activation (Sun, X. et al., eLife, 2018[4]).

      Additional support for our observation can be found in two independent binding/unbinding studies of ligands from cannabinoid receptors, which discovered a similar pathway using different CVs (Saleh, et al., Angew. Chem., 2018[7]; Hua, T. et al., Cell, 2020[8]).

      (3) The choice of ligands in the current work seems very forced and none of the results compare directly with any experimental data. An ideal case would have been to use the seminal D.E. Shaw research paper on GPCR/ligand binding as a benchmark and then show how TRAM, using much lesser biased simulation times, would fare against the experimental kinetics or even unbiased simulated kinetics of the previous report 

      We would like to address the reviewer's concerns regarding the choice of ligands, lack of direct experimental comparison, and the use of TRAM, and clarify our rationale point by point:

      Ligand Choice: The ligands selected for this study were chosen due to their relevance and well-characterized binding properties. MDMB-FUBINACA is a well-known NPS ligand with documented binding properties. This ligand is still the only NPS ligand with an experimentally determined CB1-bound structure (Krishna Kumar, K. et al., Cell, 2019[9]). Similarly, the classical cannabinoid (HU-210) used in this study has established binding characteristics and is one of the earliest known synthetic classical cannabinoids. Therefore, these ligands serve as representative compounds within their respective categories, making them suitable for our comparative analysis.

      Experimental Comparison: We have indeed compared our simulation results to experimental data, particularly focusing on binding free energies. In the Results section, we have shown that the relative binding free energy estimated from our simulation aligns closely with the experimentally measured values. Additionally, absolute binding energy estimates are also within ~3 kcal/mol of the experimentally predicted value.

      TRAM Performance: TRAM-estimated free energies and rates have been benchmarked against experimental predictions in various studies in addition to ours (Peptide-protein binding: Paul, F. et al., Nat. Commun., 2017[2]; Ligand unbinding: Wu, H. et al., PNAS, 2016[10]). As the primary goal of this study is to compare ligand unbinding mechanisms, we believe benchmarking against other datasets, such as the D.E. Shaw GPCR/ligand binding paper, is not essential for this work.

      (4) The method section of the manuscript seems to suggest all the simulations were started from a docked structure. This casts doubt on the reliability of the kinetics derived from these simulations that were spawned from docked structure, instead of any crystallographic pose. Ideally, the authors should have been more careful in choosing the ligands in this work based on the availability of the crystallographic structures. 

      We thank the reviewer for the comment. We would like to clarify that we indeed used an experimentally derived pose for one of the ligands (MDMB-FUBINACA) as the cryo-EM structure of MDMB-FUBINACA bound to the protein was available (PDB ID: 6N4B) (Krishna Kumar K. et al., Cell, 2019[9]). However, as the cryo-EM structure had missing loops, we modeled these regions using Rosetta. We apologize for this confusion and have modified our method section to make this point clearer. 

      Regarding HU-210, we acknowledge that a crystallographic or cryo-EM structure for this specific ligand was not available. We selected HU-210 because it is the most commonly used example of a classical cannabinoid in the literature, with extensively studied thermodynamic properties. Importantly, our docking results for HU-210 align closely with previously experimentally determined poses for other classical cannabinoids (Figure S11) and replicate key polar interactions, such as those with S383<sup>7.39</sup>, which are characteristic of this class of compounds.

      System Preparation (Page 22)

      “Modeling of this membrane proximal region was also performed using the Remodel protocol of Rosetta loop modeling. A distance constraint is added during this modeling step between C98<sup>N-term</sup> and C107<sup>N-term</sup> to create the disulfide bond between the residues.[74,76]

      As the cryo-EM structure of MDMB-FUBINACA was known, ligand coordinate of MDMB-FUBINACA was added to the modeled PDB structure. The “Ligand Reader & Modeler” module of CHARMM-GUI was used for ligand (e.g., MDMB-Fubinaca) parameterization using CHARMM General Force Field (CGenFF).[77]”

      (5) The last part of using a machine learning-based approach to analyze allosteric interaction seems to be very much forced, as there are numerous distance-based more traditional precedent analyses that do a fair job of identifying an allosteric job. 

      We thank the reviewer for the valuable comment. The neural relational inference (NRI) method, which leverages a VAE (Variational Autoencoder) architecture, attempts to reconstruct the conformation (X) at time t + τ based on the conformation at time t. In doing so, it captures the non-linear dynamic correlations between residues in the VAE latent space. We chose this method because it is not reliant on specific metrics such as distance or angle, making it potentially more robust in predicting allosteric effects between the binding pocket residues and the NPxxY motif.

      In response to the reviewer's suggestion, we have also performed a more traditional allosteric analysis by calculating the mutual information between the binding pocket residues and the NPxxY motif. Mutual information was computed based on the backbone dihedral angles, as this provides a metric that is independent of the relative distances between residues. Our results indicate that the mutual information between the binding pocket residues and the NPxxY motif is indeed higher for the NPS binding simulation (Figure S11).

      Method

      Mutual information calculation

      Mutual information was calculated on the same trajectory data as the NRI analysis. The Python package MDEntropy was used for estimating mutual information between the backbone dihedral angles of two residues.

      Results and Discussion (Page 21)

      “To further validate our observations, we estimated allosteric weights between the binding pocket and the NPxxY motif by calculating mutual information between residue movements. Mutual information analysis reaffirms that allosteric weights between these residues are indeed higher for the MDMB-FUBINACA bound ensemble (Figure S11).”

      Mutual Information Estimation (Page 37)

      “Mutual information between dynamics of residue pairs was computed based on the backbone dihedral angles, as this provides a metric that is independent of the relative distances between residues. The calculations were done on same trajectory data as NRI analysis. Python package MDEntropy was used for estimating mutual information between backbone dihedral angles of two residues.[124]”
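      For readers unfamiliar with the quantity, a histogram-based estimate of the mutual information between two dihedral-angle time series can be sketched as below. This is a plain Python illustration and does not use or reproduce the MDEntropy API; the bin count and function names are assumptions, and real estimators typically add finite-sampling corrections that are omitted here.

```python
import math
from collections import Counter


def bin_angle(angle_deg, n_bins):
    """Map an angle in degrees onto one of n_bins equal bins covering [-180, 180)."""
    index = int(((angle_deg + 180.0) % 360.0) / (360.0 / n_bins))
    return min(index, n_bins - 1)  # guard against floating-point edge cases


def mutual_information(angles_x, angles_y, n_bins=12):
    """Plug-in mutual information (in nats) between two equally long angle series."""
    n = len(angles_x)
    bx = [bin_angle(a, n_bins) for a in angles_x]
    by = [bin_angle(a, n_bins) for a in angles_y]
    count_x, count_y, count_xy = Counter(bx), Counter(by), Counter(zip(bx, by))
    mi = 0.0
    for (i, j), c in count_xy.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written in terms of raw counts
        mi += (c / n) * math.log(c * n / (count_x[i] * count_y[j]))
    return mi
```

      Two dihedrals that always switch together give a value of ln 2 for a two-state case, whereas two that switch independently give zero, which is the sense in which a higher value indicates stronger coupling between residues.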

      (6) While getting busy with the methodological details of TRAM vs MSM, the manuscript fails to share with sufficient clarity what the distinctive features of two ligand binding mechanisms are. 

      We thank the reviewer for the insightful comment. In the manuscript, we discussed that the overall ligand (un)binding pathways are indeed similar for both ligands. Therefore, they interact with similar residues during the unbinding process. However, we have focused on two key differences in unbinding mechanism between the two ligands:

      (1) MDMB-FUBINACA exhibits two distinct unbinding mechanisms. In one, the linked portion of the ligand exits the receptor first. In the other mechanism, the ligand rotates within the pocket, allowing the tail portion to exit first. By contrast, for HU-210, we observe only a single unbinding mechanism, where the benzopyran ring leads the ligand out of the receptor. We have highlighted these differences in Figures 6 and 7 and discussed the intermediate states that appear along these different unbinding mechanisms. To further clarify these differences, we have added arrows in the free energy landscapes to highlight these distinct pathways.

      (2) In the bound state, a significant difference is observed in the interaction profiles. HU-210, a classical cannabinoid, forms strong polar interactions with TM7, while MDMB-FUBINACA shows weaker polar interactions with this region.

      We have discussed these differences in the Results and Discussion section (Page 13-18) & conclusion section (Page 23-24).

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors): 

      (1) The authors should choose at least one case where the ligand's crystallographic pose is known and show how TRAM works in comparison to MSM or experimental report. 

      We thank the reviewer for the comment. We have used the experimentally determined cryo-EM pose for one of the ligands (i.e. MDMB-FUBINACA).  We have modified the manuscript to avoid confusion. (Please refer to the response of comment 4 of reviewer 2)

      (2) The authors should consider existing traditional methods that are used to detect allostery and compare their machine-learning-based approach to show its relevance. 

      We appreciate the reviewer’s comment. We have performed the traditional analysis by calculating mutual information between residue dynamics. We have shown that the traditional analysis agrees with the machine-learning-based NRI calculation. (Please refer to the response to comment 5 of reviewer 2)

      (3) Figure 3 doesn't provide a guide on the pathway of ligand. Without a proper arrow, it is difficult to surmise what is the start and end of the pathway. The figures should be improved. 

      We appreciate the reviewer’s suggestion. In response, we have revised Figure 3 to clearly indicate the ligand’s unbinding pathway by adding directional arrows and labeling the bound pose. Additionally, we have updated the figure caption to better clarify the color scheme used in the illustration. 

      (4) The Figure 5 presentation of free energetics has a very similar shape for the two ligands. More clarity is required on how these two ligands are different. 

      We thank the reviewer for the comment. While the overall shapes of the free energy profiles for the two ligands are indeed similar, this is expected as both ligands dissociate from the same pocket and follow a comparable pathway. However, key differences in their unbinding mechanisms arise due to variations in the ligand motion within the pocket. Specifically, the intermediate metastable minima in the free energy landscapes reflect these differences. For instance, in the NPS unbinding free energy landscape, the intermediate metastable state I1 corresponds to a conformation where the NPS ligand maintains a polar interaction with TM7, while the tail of the ligand has shifted away from TM5. This intermediate state is absent in the classical cannabinoid unbinding pathway, where no equivalent conformation appears in the landscape.  

      (6) Page 30: TICA is wrongly expressed as 'Time-independent component analysis'. It is not a time-independent process. Rather it is 'Time structured independent component analysis'. 

      We thank the reviewer for pointing this out. TICA should be expressed as Time-lagged independent component analysis or Time-structure independent component analysis. We have used the first expression and modified the manuscript accordingly.  

      (7) The manuscript's MSM theory part is quite well-known which can be removed and appropriate papers can be cited. 

      We thank the reviewer for the comment. We have removed the theory discussion of MSM and cited relevant papers.

      “Markov State Model

      Markov state model (MSM) is used to estimate the thermodynamics and kinetics from the unbiased simulation.[56,91] MSM characterizes a dynamic process using the transition probability matrix and estimates its relevant thermodynamics and kinetic properties from the eigendecomposition of this matrix. This matrix is usually calculated using either maximum likelihood or Bayesian approach.[56,97] The prevalence of MSM as a post-processing technique for MD simulations was due to its reliance on only local equilibration of MD trajectories to predict the global equilibrium properties.[92,93] Hence, MSM can combine information from distinct short trajectories, which can only attain the local equilibrium.[94–96]  

      The following steps are taken for the practical implementation of the MSM from the MD data. [4,17,98–100]”
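      As a minimal sketch of the core MSM calculation described above, the following code counts transitions in a discretized trajectory at a chosen lag time, row-normalizes the counts into a transition probability matrix, and obtains the stationary distribution by power iteration. This is an illustration rather than the pipeline used in the manuscript: it uses the simple non-reversible count-normalization estimate instead of a reversible maximum likelihood or Bayesian estimator, and all names are hypothetical.

```python
def count_matrix(dtraj, n_states, lag):
    """Count transitions i -> j separated by `lag` steps in one discrete trajectory."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for a, b in zip(dtraj[:-lag], dtraj[lag:]):
        counts[a][b] += 1.0
    return counts


def transition_matrix(counts):
    """Row-normalize counts; rows with no observed transitions are left as zeros."""
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total for c in row] if total > 0 else row[:])
    return matrix


def stationary_distribution(matrix, n_iter=10000, tol=1e-12):
    """Left eigenvector with eigenvalue 1, found by repeatedly applying pi <- pi T."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        new = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if max(abs(x - y) for x, y in zip(new, pi)) < tol:
            return new
        pi = new
    return pi
```

      Free energies of the states then follow from the stationary probabilities as −kT ln π, and the remaining eigenvalues of the transition matrix set the relaxation timescales.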

      (8) A proper VAMP score-based analysis should be provided to show confidence in MSM's clustering metric and other hyperparameters. 

      We thank the reviewer for the recommendation. The VAMP-2 score-based analysis is discussed in the Methods section. We estimated the VAMP-2 scores of MSMs built with different numbers of clusters and input TIC dimensions (Figure S15). The model with the best VAMP-2 score was selected for comparison with the TRAM results.
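
      As an illustration of what such a score measures, the following is a minimal sketch and not the pipeline used in the manuscript. It assumes a trajectory that has already been discretized into cluster labels and computes the full-rank VAMP-2 score at a single lag time. The function name `vamp2_score` and the toy trajectories are hypothetical. In practice the score is usually restricted to the leading singular values and cross-validated across trajectory splits, which this sketch omits.

```python
from collections import Counter


def vamp2_score(dtraj, lag=1):
    """Full-rank VAMP-2 score of a discrete trajectory at lag `lag`.

    With one-hot (indicator) features, the instantaneous covariances are
    diagonal (state populations at t and at t + lag) and the time-lagged
    covariance is the joint probability of (state at t, state at t + lag).
    VAMP-2 is the squared Frobenius norm of C00^{-1/2} C01 C11^{-1/2},
    which reduces to sum_ij c_ij^2 / (a_i * b_j) in terms of raw counts.
    """
    pairs = list(zip(dtraj[:-lag], dtraj[lag:]))
    if not pairs:
        raise ValueError("trajectory is shorter than the lag time")
    joint = Counter(pairs)                 # c_ij: transition counts
    start = Counter(i for i, _ in pairs)   # a_i: counts at time t
    end = Counter(j for _, j in pairs)     # b_j: counts at time t + lag
    return sum(c * c / (start[i] * end[j]) for (i, j), c in joint.items())


# Toy trajectories (hypothetical data, for illustration only).
alternating = [0, 1] * 50 + [0]   # perfectly predictable two-state dynamics
constant = [0] * 10               # a single state, no resolved dynamics
print(vamp2_score(alternating), vamp2_score(constant))
```

      A higher score indicates that the chosen discretization resolves more slow dynamics at that lag time, which is why the score can be compared across cluster numbers and TIC dimensions, provided the lag time is held fixed.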

    1. Author response:

      We thank the reviewers for the valuable and constructive reviews. Thanks to these, we believe the article will be considerably improved. We have organized our response to address points that are relevant to both reviewers first, after which we address the unique concerns of each individual reviewer separately. We briefly paraphrase each concern and provide comments for clarification, outlining the precise changes that we will make to the text.

      Common Concerns (Reviewer 1 & Reviewer 2):

      Can you clarify how NREM and REM sleep relate to the oneirogen hypothesis?

      Within the submission draft we tried to stay agnostic as to whether mechanistically similar replay events occur during NREM or REM sleep; however, upon a more thorough literature review, we think that there is moderately greater evidence in favor of Wake-Sleep-type replay occurring during REM sleep which is related to classical psychedelic drug mechanisms of action.

      First, we should clarify that replay has been observed during both REM and NREM sleep, and dreams have been documented during both sleep stages, though the characteristics of dreams differ across stages, with NREM dreams being more closely tied to recent episodic experience and REM dreams being more bizarre/hallucinatory (see Stickgold et al., 2001 for a review). Replay during sleep has been studied most thoroughly during NREM sharp-wave ripple events, in which significant cortical-hippocampal coupling has been observed (Ji & Wilson, 2007). However, it is critical to note that the quantification methods used to identify replay events in the hippocampal literature usually focus on identifying what we term ‘episodic replay,’ which involves a near-identical recapitulation of neural trajectories that were recently experienced during waking experimental recordings (Tingley & Peyrache, 2020). In contrast, our model focuses on ‘generative replay,’ where one expects only a statistically similar reproduction of neural activity, without any particular bias towards recent or experimentally controlled experience. This latter form of replay may look closer to the ‘reactivation’ observed in cortex by many studies (e.g. Nguyen et al., 2024), where correlation structures of neural activity similar to those observed during stimulus-driven experience are recapitulated. Under experimental conditions in which an animal is experiencing highly stereotyped activity repeatedly, over extended periods of time, these two forms of replay may be difficult to dissociate.

      Interestingly, though NREM replay has been shown to couple hippocampal and cortical activity, a similar study in waking animals administered psychedelics found hippocampal replay without any obvious coupling to cortical activity (Domenico et al., 2021). This could be because the coupling was not strong enough to produce full trajectories in the cortex (psychedelic administration did not increase ‘alpha’ enough), and that a causal manipulation of apical/basal influence in the cortex may be necessary to observe the increased coupling. Alternatively, as Reviewer 1 noted, it may be that psychedelics induce a form of hippocampus-decoupled replay, as one would expect from the REM stage of a recently proposed complementary learning systems model (Singh et al., 2022). 

      Evidence in favor of a similarity between the mechanism of action of classical psychedelics and the mechanism of action of memory consolidation/learning during REM sleep is actually quite strong. In particular, studies have shown that REM sleep increases the activity of soma-targeting parvalbumin (PV) interneurons and decreases the activity of apical dendrite-targeting somatostatin (SOM) interneurons (Niethard et al., 2021), that this shift in balance is controlled by higher-order thalamic nuclei, and that this shift in balance is critical for synaptic consolidation of both monocular deprivation effects in early visual cortex (Zhou et al., 2020) and for the consolidation of auditory fear conditioning in the dorsal prefrontal cortex (Aime et al., 2022). These last studies were not discussed in the present manuscript–we will add them, in addition to a more nuanced description of the evidence connecting our model to NREM and REM replay.

      Can you explain how synaptic plasticity induced by psychedelics within your model relates to learning at a behavioral level?

      While the Wake-Sleep algorithm is a useful model for unsupervised statistical learning, it is not a model of reward or fear-based conditioning, which likely occur via different mechanisms in the brain (e.g. dopamine-dependent reinforcement learning or serotonin-dependent emotional learning). The Wake-Sleep algorithm is a ‘normative plasticity algorithm,’ that connects synaptic plasticity to the formation of structured neural representations, but it is not the case that all synaptic plasticity induced by psychedelic administration within our model should induce beneficial learning effects. According to the Wake-Sleep algorithm, plasticity at apical synapses is enhanced during the Wake phase, and plasticity at basal synapses is enhanced during the Sleep phase; under the oneirogen hypothesis, hallucinatory conditions (increased ‘alpha’) cause an increase in plasticity at both apical and basal sites. Because neural activity is in a fundamentally aberrant state when ‘alpha’ is increased, there are no theoretical guarantees that plasticity will improve performance on any objective: psychedelic-induced plasticity within our model could perhaps better be thought of as ‘noise’ that may have a positive or negative effect depending on the context.

      In particular, such ‘noise’ may be beneficial for individuals or networks whose synapses have become locked in a suboptimal local minimum. The addition of large amounts of random plasticity could allow a system to extricate itself from such local minima over subsequent learning (or with careful selection of stimuli during psychedelic experience), similar to simulated annealing optimization approaches. If our model were fully validated, this view of psychedelic-induced plasticity as ‘noise’ could have relevance for efforts to alleviate the adverse effects of PTSD, early life trauma, or sensory deprivation; it may also provide a cautionary note against repeated use of psychedelic drugs within a short time frame, as the plasticity changes induced by psychedelic administration under our model are not guaranteed to be good or useful in-and-of themselves without subsequent re-learning and compensation.

      We should also note that we have deliberately avoided connecting the oneirogen hypothesis model to fear extinction experimental results that have been observed through recordings of the hippocampus or the amygdala (Bombardi & Giovanni, 2013; Jiang et al., 2009; Kelly et al., 2024; Tiwari et al., 2024). Both regions receive extensive innervation directly from serotonergic synapses originating in the dorsal raphe nucleus, which have been shown to play an important role in emotional learning (Lesch & Waider, 2012); because classical psychedelics may play a more direct role in modulating this serotonergic innervation, it is possible that fear conditioning results (in addition to the anxiolytic effects of psychedelics) cannot be attributed to a shift in balance between apical and basal synapses induced by psychedelic administration. We will provide a more detailed review of these results in the text, as well as more clarity regarding their relation to our model.

      Reviewer 1 Concerns:

      Is it reasonable to assign a scalar parameter ‘alpha’ to the effects of classical psychedelics? And is your proposed mechanism of action unique to classical psychedelics? E.g. Could this idea also apply to kappa opioid agonists, ketamine, or the neural mechanisms of hallucination disorders?

      We will clarify that within our model ‘alpha’ is a parameter that reflects the balance between apical and basal synapses in determining the activity of neurons in the network. For the sake of simplicity we used a single ‘alpha’ parameter, but realistically, each neuron would have its own ‘alpha’ parameter, and different layers or individual neurons could be affected differentially by the administration of any particular drug; therefore, our scalar ‘alpha’ value can be thought of as a mean parameter for all neurons, disregarding heterogeneity across individual neurons.
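
      To make the role of ‘alpha’ concrete, the following is a minimal sketch under a stated assumption: that ‘alpha’ acts as a convex weight between basal (bottom-up) and apical (top-down) drive. The exact functional form in our model may differ, and the names `mix_inputs`, `basal`, and `apical` are hypothetical. Passing a single number corresponds to the scalar simplification described above; passing one value per neuron corresponds to the heterogeneous case.

```python
def mix_inputs(basal, apical, alpha):
    """Combine basal and apical drive with weight alpha in [0, 1].

    alpha may be a single number (shared by all neurons) or a sequence
    with one value per neuron. alpha = 0 gives purely basal drive and
    alpha = 1 gives purely apical drive. The convex combination is an
    illustrative assumption, not a statement of the published model.
    """
    if len(basal) != len(apical):
        raise ValueError("basal and apical must have the same length")
    if isinstance(alpha, (int, float)):
        alphas = [float(alpha)] * len(basal)
    else:
        alphas = [float(a) for a in alpha]
    if len(alphas) != len(basal):
        raise ValueError("need one alpha per neuron")
    if any(a < 0.0 or a > 1.0 for a in alphas):
        raise ValueError("alpha values must lie in [0, 1]")
    return [(1.0 - a) * b + a * t for a, b, t in zip(alphas, basal, apical)]


basal = [1.0, 2.0]    # hypothetical bottom-up drive for two neurons
apical = [3.0, 4.0]   # hypothetical top-down drive for the same neurons
shared = mix_inputs(basal, apical, 0.5)             # scalar alpha
per_neuron = mix_inputs(basal, apical, [0.0, 1.0])  # heterogeneous alpha
```

      Under this reading, a drug that affects layers or cell types differently would be described by the per-neuron form, and the scalar form summarizes it by a single average weight.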

      There are many different mechanisms that could theoretically affect this ‘alpha’ parameter, including: 5-HT2a receptor agonism, kappa opioid receptor binding, ketamine administration, or possibly the effects of genetic mutations underlying the pathophysiology of complex developmental hallucination disorders. We focused exclusively on 5-HT2a receptor agonism for this study because the mechanism is comparatively simple and extensively characterized, but similar mechanisms may well be responsible for the hallucinatory symptoms of a variety of drugs and disorders.

      Can you clarify the role of 5-HT2a receptor expression on interneurons within your model?

      While we mostly focused on the effects of 5-HT2a receptors on the apical dendrites of pyramidal neurons, these receptors are also expressed on soma-targeting parvalbumin (PV) interneurons. This expression on PV interneurons is consistent with our proposed psychedelic mechanism of action, because it could lead to a coordinated decrease in the influence of somatic and proximal dendritic inputs while increasing the influence of apical dendritic inputs. We will elaborate on this point, and will move the discussion earlier in the text.

      Discussions of indigenous use of psychedelics over millennia may amount to over-romanticization.

      We will take great care to conduct a more thorough literature review to reevaluate our statement regarding indigenous psychedelic use (including the citations you suggested), and will either provide a more careful statement or remove this discussion from our introduction entirely, as it has little bearing on the rest of the text. The Ethics Statement will also be modified accordingly.

      You isolate the 5-HT2a agonism as the mechanism of action underlying ‘alpha’ in your model, but there exist 5-HT2a agonists that do not have hallucinatory effects (e.g. lisuride). How do you explain this?

      Lisuride has much-reduced hallucinatory effects compared to other psychedelic drugs at clinical doses (though it does indeed induce hallucinations at high doses; Marona-Lewicka et al., 2002), and we should note that serotonin (5-HT) itself is pervasive in the cortex without inducing hallucinatory effects during natural function. Similarly, MDMA is a partial agonist for 5-HT2a receptors, but it has much-reduced perceptual hallucination effects relative to classical psychedelics (Green et al., 2003) in addition to many other effects not induced by classical psychedelics.

      Therefore, while we argue that 5-HT2a agonism induces an increase in influence of apical dendritic compartments and a decrease in influence of basal/somatic compartments, and that this change induces hallucinations, we also note that there are many other factors that control whether or not hallucinations are ultimately produced, so that not all 5-HT2a agonists are hallucinogenic. We will discuss two such factors in our revision: 5-HT receptor binding affinity and cellular membrane permeability.

      Importantly, many 5-HT2a receptor agonists are also 5-HT1a receptor agonists (e.g. serotonin itself and lisuride), while MDMA has also been shown to increase serotonin, norepinephrine, and dopamine release (Green et al., 2003). While 5-HT2a receptor agonism has been shown to reduce sensory stimulus responses (Michaiel et al., 2019), 5-HT1a receptor agonism inhibits spontaneous cortical activity (Azimi et al., 2020); thus one might expect the net effect of administering serotonin or a nonselective 5-HT receptor agonist to be widespread inhibition of a circuit, as has been observed in visual cortex (Azimi et al., 2020). Therefore, selective 5-HT2a agonism is critical for the induction of hallucinations according to our model, though any intervention that jointly excites pyramidal neurons’ apical dendrites and inhibits their basal/somatic compartments across a broad enough area of cortex would be predicted to have a similar effect. Lisuride has a much higher binding affinity for 5-HT1a receptors than, for instance, LSD (Marona-Lewicka et al., 2002).

      Secondly, it has recently been shown that both the head-twitch effect (a coarse behavioral readout of hallucinations in animals) and the plasticity effects of psychedelics are abolished when administering 5-HT2a agonists that are impermeable to the cellular membrane because of high polarity, and that these effects can be rescued by temporarily rendering the cellular membrane permeable (Vargas et al., 2023). This suggests that the critical hallucinatory effects of psychedelics (apical excitation according to our model) may be mediated by intracellular 5-HT2a receptors. Notably, serotonin itself is not membrane permeable in the cortex.

      Therefore, either of these two properties could play a role in whether a given 5-HT2a agonist induces hallucinatory effects. We will provide a considerably extended discussion of these nuances in our revision.

      Your model proposes that an increase in top-down influence on neural activity underlies the hallucinatory effects of psychedelics. How do you explain experimental results that show increases in bottom-up functional connectivity (either from early sensory areas or the thalamus)?

      Firstly, we should note that our proposed increase in top-down influence is a causal, biophysical property, not necessarily a statistical/correlative one. As such, we will stress that the best way to test our model is via direct intervention in cortical microcircuitry, as opposed to correlative approaches taken by most fMRI studies, which have shown mixed results with regard to this particular question. Correlative approaches can be misleading due to dense recurrent coupling in the system, and due to the coarse temporal and spatial resolution provided by noninvasive recording technologies (changes in statistical/functional connectivity do not necessarily correspond to changes in causal/mechanistic connectivity, i.e. correlation does not imply causation).

      There are two experimental results that appear to contradict our hypothesis that deserve special consideration in our revision. The first shows an increase in directional thalamic influence on the distributed cortical networks after psychedelic administration (Preller et al., 2018). To explain this, we note that this study does not distinguish between lower-order sensory thalamic nuclei (e.g. the lateral and medial geniculate nuclei receiving visual and auditory stimuli respectively) and the higher-order thalamic nuclei that participate in thalamocortical connectivity loops (Whyte et al., 2024). Subsequent more fine-grained studies have noted an increase in influence of higher order thalamic nuclei on the cortex (Pizzi et al., 2023; Gaddis et al., 2022), and in fact extensive causal intervention research has shown that classical psychedelics (and 5-HT2a agonism) decrease the influence of incoming sensory stimuli on the activity of early sensory cortical areas, indicating decoupling from the sensory thalamus (Evarts et al., 1955; Azimi et al., 2020; Michaiel et al., 2019). The increased influence of higher-order thalamic nuclei is consistent with both the cortico-striatal-thalamo-cortical (CSTC) model of psychedelic action as well as the oneirogen hypothesis, since higher-order thalamic inputs modulate the apical dendrites of pyramidal neurons in cortex (Whyte et al., 2024).

      The second experimental result notes that DMT induces traveling waves during resting state activity that propagate from early visual cortex to deeper cortical layers (Alamia et al., 2020). There are several possibilities that could explain this phenomenon: 1) it could be due to the aforementioned difficulties associated with directed functional connectivity analyses, 2) it could be due to a possible high binding affinity for DMT in the visual cortex relative to other brain areas, or 3) it could be due to increases in apical influence on activity caused by local recurrent connectivity within the visual cortex which, in the absence of sensory input, could lead to propagation of neural activity from the visual cortex to the rest of the brain. This last possibility is closest to the model proposed by Ermentrout & Cowan (1979), which we believe would be best captured within our framework by a topographically connected recurrent network architecture trained on video data, a potentially fruitful direction for future research.

      Shouldn’t the hallucinations generated by your model look more ‘psychedelic,’ like those produced by the DeepDream algorithm?

      We believe that the differences in hallucination visualization quality between our algorithm and DeepDream are mostly due to differences in the scale and power of the models used across these two studies. We are confident that with more resources (and potentially theoretical innovations to improve the Wake-Sleep algorithm’s performance) the produced hallucination visualizations could become more realistic, but we believe this falls outside the scope of the present study.

      We note that more powerful generative models trained with backpropagation are able to produce surreal images of comparable quality (Rezende et al., 2014; Goodfellow et al., 2020; Vahdat & Kautz, 2020), though these have not yet been used as a model of psychedelic hallucinations. However, the DeepDream model operates on top of large pretrained image processing models, and does not provide a biologically mechanistic/testable interpretation of its hallucination effects. When training smaller models with a local synaptic plasticity rule (as opposed to backpropagation), the hallucination effects are less visually striking due to the reduced quality of our trained generative model, though they are still strongly tied to the statistics of sensory inputs, as quantified by our correlation similarity metric (Fig. 5b). We will provide a more detailed explanation of this phenomenon when we discuss our model limitations in our revised manuscript.

      Your model assumes domination by entirely bottom-up activity during the ‘wake’ phase, and domination entirely by top-down activity during ‘sleep,’ despite experimental evidence indicating that a mixture of top-down and bottom-up inputs influence neural activity during both stages in the brain. How do you explain this?

      Our use of the Wake-Sleep algorithm, in which top-down inputs (Sleep) or bottom-up inputs (Wake) dominate network activity is an over-simplification made within our model for computational and theoretical reasons. Models that receive a mixture of top-down and bottom-up inputs during ‘Wake’ activity do exist (in particular the closely related Boltzmann machine (Ackley et al., 1985)), but these models are considerably more computationally costly to train due to a need to run extensive recurrent network relaxation dynamics for each input stimulus. Further, these models do not generalize as cleanly to processing temporal inputs. For this reason, we focused on the Wake-Sleep algorithm, at the cost of some biological realism, though we note that our model should certainly be extended to support mixed apical-basal waking regimes. We will make sure to discuss this in our ‘Model Limitations’ section.

      Your model proposes that 5-HT2a agonism enhances glutamatergic transmission, but this is not true in the hippocampus, which shows decreases in glutamate after psychedelic administration.

      We should note that our model suggests only compartment-specific increases in glutamatergic transmission; as such, our model does not predict any particular directionality for measures of glutamatergic transmission that include signaling at both apical and basal compartments in aggregate, as was measured in the provided study (Mason et al., 2020).

      You claim that your model is consistent with the Entropic Brain theory, but you report increases in variance, not entropy. In fact, it has been shown that variance decreases while entropy increases under psychedelic administration. How do you explain this discrepancy?

      Unfortunately, ‘entropy’ and ‘variance’ are heavily overloaded terms in the noninvasive imaging literature, and the particularities of the method employed can exert a strong influence on the reported effects. The reduction in variance reported by (Carhart-Harris et al., 2016) is a very particular measure: they are reporting the variance of resting state synchronous activity, averaged across a functional subnetwork that spans many voxels; as such, the reduction in variance in this case is a reduction in broad, synchronous activity. We do not have any resting state synchronous activity in our network due to the simplified nature of our model (particularly an absence of recurrent temporal dynamics), so we see no reduction in variance in our model due to these effects.

      Other studies estimate ‘entropy’ or network state disorder via three different methods that we have been able to identify. 1) (Carhart-Harris et al., 2014) uses a different measure of variance: in this case, they subtract out synchronous activity within functional subnetworks, and calculate variability across units in the network. This measure reports increases in variance (Fig. 6), and is the closest measure to the one we employ in this study. 2) (Lebedev et al., 2016) uses sample entropy, which is a measure of temporal sequence predictability. It is specifically designed to disregard highly predictable signals, and so one might imagine that it is a measure that is robust to shared synchronous activity (e.g. resting state oscillations). 3) (Mediano et al., 2024) uses Lempel-Ziv complexity, which is, similar to sample entropy, a measure of sequence diversity; in this case the signal is binarized before calculation, which makes this method considerably different from ours. All three of the preceding methods report increases in sequence diversity, in agreement with our quantification method. Our strongest explanation for the variance reduction produced by the calculation in (Carhart-Harris et al., 2016) is therefore a reduction in low-rank synchronous activity in subnetworks during resting state.
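
      To illustrate how the third measure differs from a variance calculation, the following is a minimal sketch of a Lempel-Ziv-style complexity on a binarized signal. It is not the implementation used in the cited studies: it assumes binarization by thresholding at the signal mean, and it uses the simpler LZ78 dictionary parsing rather than the LZ76 variant that is common in the neuroimaging literature. The function names are hypothetical.

```python
def binarize(signal):
    """Map a real-valued signal to 0/1 by thresholding at its mean
    (an assumed choice; other thresholds are also used in practice)."""
    mean = sum(signal) / len(signal)
    return [1 if x > mean else 0 for x in signal]


def lz78_phrase_count(bits):
    """Count the phrases in an LZ78 parsing of a binary sequence.

    The sequence is read left to right and cut each time the current
    phrase has not been seen before. A trailing phrase that was already
    in the dictionary is counted as one extra phrase. More diverse
    sequences are cut into more phrases.
    """
    seen = set()
    phrase = ()
    count = 0
    for b in bits:
        phrase += (b,)
        if phrase not in seen:
            seen.add(phrase)
            count += 1
            phrase = ()
    return count + (1 if phrase else 0)


# Toy example: parsed as 0 | 00 | 1 | 10 | 100 | 1000 | 101
example = [int(c) for c in "0001101001000101"]
complexity = lz78_phrase_count(example)
```

      Because the amplitude of the signal is discarded at the binarization step, this kind of measure depends only on the ordering of above- and below-threshold samples, whereas a variance measure depends directly on amplitude.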

      As for whether the entropy increase is meaningful: we share Reviewer 1’s concern that increases in entropy could simply be due to a higher degree of cognitive engagement during resting state recordings, due to the presence of sensory hallucinations or due to an inability to fall asleep. This could explain why entropy increases are much more minimal relative to non-hallucinating conditions during audiovisual task performance (Siegel et al., 2024; Mediano et al., 2024). However, we can say that our model is consistent with the Entropic Brain Theory without including any form of ‘cognitive processing’: we observe increases in variability during resting state in our model, but we observe highly similar distributions of activity when averaging over a wide variety of sensory stimulus presentations (Fig. 5b-c). This is because variability in our model is not due to unstructured noise: it corresponds to an exploration of network states that would ordinarily be visited by some stimulus. Therefore, when averaging across a wide variety of stimuli, the distribution of network states under hallucinating or non-hallucinating conditions should be highly similar.

      One final point of clarification: here we are distinguishing Entropic Brain Theory from the REBUS model–the oneirogen hypothesis is consistent with the increase in entropy observed experimentally, but in our model this entropy increase is not due to increased influence of bottom-up inputs (it is due instead to an increase in top-down influence). Therefore, one could view the oneirogen hypothesis as consistent with EBT, but inconsistent with REBUS.

      You relate your plasticity rule to behavioral-timescale plasticity (BTSP) in the hippocampus, but plasticity has been shown to be reduced in the hippocampus after psychedelic administration. Could you elaborate on this connection?

      When we were establishing a connection between our ‘Wake-Sleep’ plasticity rule and BTSP learning, the intended connection was exclusively to the mathematical form of the plasticity rule, in which activity in the apical dendrites of pyramidal neurons functions as an instructive signal for plasticity in basal synapses (and vice versa): we will clarify this in the text. Similarly, we point out that such a plasticity rule tends to result in correlated tuning between apical and basal dendritic compartments, which has been observed in hippocampus and cortex: this is intended as a sanity check of our mapping of the Wake-Sleep algorithm to cortical microcircuitry, and has limited further bearing on the effects of psychedelics specifically.

      Reduction in plasticity in the hippocampus after psychedelic administration could be due to a complementary learning systems-type model, in which the hippocampus becomes partly decoupled from the cortex during REM sleep (Singh et al., 2022); were this to be the case, it would not be incompatible with our model, which is mostly focused on the cortex. Notably, potentiating 5-HT2a receptors in the ventral hippocampus does not induce the head-twitch response, though it does produce anxiolytic effects (Tiwari et al., 2024), indicating that the hallucinatory and anxiolytic effects of classical psychedelics may be partly decoupled.

      Reviewer 2 Concerns:

      Could you provide visualizations of the ‘ripple’ phenomenon that you’re referring to?

      We will do this! For now, you can get a decent understanding of what the ‘ripple effect’ looks like from the ‘eyes closed’ hallucination condition for networks trained on CIFAR10 (Fig. 2d). The ripple effect that we are referring to is very similar, except it is superimposed on a naturalistic image under ordinary viewing conditions; to give a higher quality visualization of the ripple phenomenon itself, we will subtract out the static contribution of the image itself, leaving only the ripple phenomenon.

      Could you provide a more nuanced description of alternative roles for top-down feedback, beyond being used exclusively for learning as depicted in your model?

      For the sake of simplicity, we only treat top-down inputs in our model as a source of an instructive teaching signal, the originator of generative replay events during the Sleep phase, and as the mechanism of hallucination generation. However, as discussed in a response to a previous question, in the cortex pyramidal neurons receive and respond to a mixture of top-down and bottom-up processing.

      There are a variety of theories for what role top-down inputs could play in determining network activity. To name several, top-down input could function as: 1) a denoising/pattern completion signal (Kadkhodaie & Simoncelli, 2021), 2) a feedback control signal (Podlaski & Machens, 2020), 3) an attention signal (Lindsay, 2020), 4) ordinary inputs for dynamic recurrent processing that play no specialized role distinct from bottom-up or lateral inputs except to provide inputs from higher-order association areas or other sensory modalities (Kar et al., 2019; Tugsbayar et al., 2025). Though our model does not include these features, they are perfectly consistent with our approach.

      In particular, denoising/pattern completion signals in the predictive coding framework (closely related to the Wake-Sleep algorithm) also play a role as an instructive learning signal (Salvatori et al., 2021); and top-down control signals can play a similar role in some models (Gilra & Gerstner, 2017; Meulemans et al., 2021). Thus, options 1 and 2 are heavily overlapping with our approach, and are a natural consequence of many biologically plausible learning algorithms that minimize a variational free energy loss (Rao & Ballard, 1997; Ackley et al., 1985). Similarly, top-down attentional signals can exist alongside top-down learning signals, and some models have argued that such signals can be heavily overlapping or mutually interchangeable (Roelfsema & van Ooyen, 2005). Lastly, generic recurrent connectivity (from any source) can be incorporated into the Wake-Sleep algorithm (Dayan & Hinton, 1996), though we avoided doing this in the present study due to an absence of empirical architecture exploration in the literature and the computational complexity associated with training on time series data.

      To conclude, there are a variety of alternative functions proposed for top-down inputs onto pyramidal neurons in the cortex, and we view these additional features as mutually compatible with our approach; for simplicity we did not include them in our model, but we believe that these features are unlikely to interfere with our testable predictions or empirical results.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their positive and constructive comments on the manuscript. In the revised manuscript we addressed these comments, which we believe have improved the quality of our work.

      In summary:

      (1) We acknowledge the reviewer's suggestion to incorporate open-source segmentation and tracking functionalities, which would increase the tool's accessibility to a wider user base; however, these additions fall outside the primary scope of our current work, which is to provide an analytical framework for IVM data after segmentation and tracking. Developing open-source segmentation and tracking tools represents a substantial undertaking in its own right, which has been comprehensively explored in other studies (e.g. https://doi.org/10.4049/jimmunol.2100811; https://doi.org/10.7554/eLife.60547; https://doi.org/10.1016/j.media.2022.102358; https://doi.org/10.1038/s41592-024-02295-6 - now cited in our revised manuscript).

      In our analyses, we used data processed with Imaris, a commercial software that, despite its limitations, is widely used by the intravital microscopy community due to its user-friendly platform for 3D image visualization and analysis. Nevertheless, recognizing the need for compatibility with tracking data from various pipelines, we have modified our tool to accept other data formats, such as those generated by open-source Fiji plugins like TrackMate, MTrackJ, ManualTracking (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input). These updates are available in our GitHub repository and are described in the revised manuscript. 

      (2) We appreciate reviewer #3's suggestion to incorporate additional features into our analytical pipeline. In response, we have already updated the GitHub repository to allow users to input and select which features (dynamic, morphological, or spatial) they wish to include in the analysis (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#feature-selection). In the revised manuscript, we highlighted this new functionality and provided examples using alternative datasets to demonstrate the application of these features.

      (3)  We appreciate the constructive feedback of reviewers #1 and #2 regarding the statistical analysis and interpretation of the data presented in Figures 3 and 4. We understand the importance of clarity and rigor in data analysis and presentation, and we addressed the concerns raised in the revised version of the manuscript.

      (4) We appreciate reviewer #1's suggestion regarding the inclusion of demo data, as we believe it would greatly enhance the usability of our pipeline. We acknowledge that this was an oversight on our part. To address this, we have now added demos to our GitHub repository (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler/tree/BEHAV3D_TP-v2.0/demo_datasets). In the revised manuscript, we reference this addition and present new figures with examples of these demos processing different IVM datasets (2D/3D, different tumors and healthy tissues). Additionally, we have provided processed DMG IVM movie samples in an imaging repository.

      (5) Finally, we made some small changes to the manuscript based on the reviewers’ feedback.

      Below we provide a point-by-point response to the reviewers’ comments.

      Reviewer #1 (Public review):

      Comment #1: A key limitation of the pipeline is that it does not overcome the main challenges and bottlenecks associated with processing and extracting quantitative cellular data from timelapse and longitudinal intravital images. This includes correcting breathing-induced movement artifacts, automated registration of longitudinal images taken over days/weeks, and accurate, automated segmentation and tracking of individual cells over time. Indeed, there are currently no standardised computational methods available for IVM data processing and analysis, with most laboratories relying on custom-built solutions or manual methods. This isn't made explicit in the manuscript early on (described below), and the researchers rely on expensive software packages such as IMARIS for image processing and data extraction to feed the required parameters into their pipeline. This limitation unfortunately reduces the likely impact of BEHAV3D-TP on the IVM field. 

      As highlighted above, the tool does not facilitate the extraction of quantitative kinetic cellular parameters (e.g. speed, directionality, persistence, and displacement) from intravital images. Indeed, to use the tool researchers must first extract dynamic cellular parameters from their IVM datasets, requiring access to expensive software (e.g. IMARIS as used here) and/or above-average computational expertise to develop and use custom-made open-source solutions. This limitation is not made explicit or discussed in the text.

      We acknowledge the reviewer's suggestion to incorporate open-source segmentation and tracking functionalities, which would make the tool accessible to a wider user base; however, these additions fall outside the primary scope of our current work and represent a substantial undertaking in their own right. Several studies (e.g., Diego Ulisse Pizzagalli et al., J Immunol (2022); Aby Joseph et al., eLife (2020); Molina-Moreno et al., Medical Image Analysis (2022); Hidalgo-Cenalmor et al., Nat Methods (2024); Ershov et al., Nat Methods (2022)) have comprehensively addressed these topics, and we now reference them in the revised manuscript to provide readers with relevant background.

      The objective of our manuscript is not to develop a complete segmentation or tracking pipeline but rather to introduce an analytical framework capable of extracting enhanced insights from the data generated by existing tools. This goal arises from our observations of the field: despite significant investment in image processing, researchers often rely on simplistic approaches, such as averaging single parameters across conditions, which can obscure tumor heterogeneity and spatial behavioral dynamics within the tumor microenvironment.

      Our current tool focuses on providing this much-needed analytical capability. For our analysis we used Imaris, a widely utilized software in the intravital microscopy (IVM) community, known for its intuitive 3D visualization and analysis platform despite certain limitations. 

      In our own literature search of recent IVM studies published by leading laboratories in high-impact journals, we found that close to half used Imaris, while the remainder primarily relied on manual workflows with Fiji plugins. Thus, we consider it valuable to offer a pipeline compatible with such commonly used software, given its prevalence in the field.

      However, following the reviewer's suggestion, and to enhance the tool’s flexibility and compatibility, we have expanded the pipeline to accept data formats generated by open-source Fiji plugins, such as TrackMate, MTrackJ, and ManualTracking. These updates are detailed in the revised manuscript and are implemented in our GitHub repository (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input), where we also provide several demos using TrackMate- and Imaris-processed data. This addition demonstrates our tool's capability to integrate with segmented and tracked datasets from diverse platforms, increasing its applicability to a broader range of researchers using both commercial and open-source pipelines.
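
      As a minimal sketch of what "segmented and tracked data" provides to a downstream analysis, the snippet below derives three of the kinetic parameters mentioned in the reviewer's comment (speed, displacement, persistence) from one track. It is not the BEHAV3D-TP implementation: the function name `track_features` and the `(t, x, y, z)` tuple layout are hypothetical, chosen only so that the example does not depend on the export format of any particular tracking tool.

```python
import math


def track_features(points):
    """Compute simple kinetic features for a single cell track.

    points: list of (t, x, y, z) tuples sorted by time. This layout is an
    assumption made for illustration, not a format defined by BEHAV3D-TP.
    """
    if len(points) < 2:
        raise ValueError("a track needs at least two time points")

    duration = points[-1][0] - points[0][0]
    if duration <= 0:
        raise ValueError("time stamps must be strictly increasing overall")

    # Total path length: sum of the step lengths between consecutive points.
    path_length = 0.0
    for (_, *p0), (_, *p1) in zip(points, points[1:]):
        path_length += math.dist(p0, p1)

    # Net displacement: straight-line distance from first to last position.
    net_displacement = math.dist(points[0][1:], points[-1][1:])

    return {
        "mean_speed": path_length / duration,
        "net_displacement": net_displacement,
        # Persistence ratio: 1.0 for a perfectly straight track, lower for
        # tortuous tracks; defined as 0.0 for a cell that never moved.
        "persistence": net_displacement / path_length if path_length > 0 else 0.0,
    }
```

      Calling `track_features` on each track of a table exported from any tracking pipeline would yield one feature row per cell, which is the kind of input an analytical framework operating after segmentation and tracking can work from.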

      Comment #2: The number of cells (e.g. per behavioural cluster), and the number of independent mice, represented in each result figure, is not included in the figure legends and are difficult to ascertain from the methods.

      We appreciate the reviewer's constructive feedback regarding the clarity of the number and type of replicates used in our analyses. To ensure transparency, the figure legends in the revised manuscript now state the number of independent mice represented in each figure. Regarding the number of cells, we have indicated the total number of processed cells in the legend of Figure 2b (953 cells). Additionally, we have now included figures (Sup Fig 4c, Sup Fig 5e-g, Fig 5c,e, Sup Fig 6c,d) for each cluster, where individual dots represent individual cell tracks, with color indicating the position and shape indicating the individual mouse.

      Comment #3: The data used to test the pipeline in this manuscript is currently not available, making it difficult to assess its usability. It would be important to include this for researchers to use as a 'training dataset'.

      As stated above, we acknowledge that this was an oversight on our part and thank the reviewer for pointing this out. To address this, we have now added demo data to our GitHub repository (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler/tree/main/demo_datasets). In the revised manuscript, we reference this addition in the Data availability section. Since we now also support data processed with Fiji, we provide four demo datasets: one processed with Imaris in 3D; one with CellPose 2.0 and TrackMate in 2D; one with µSAM and TrackMate in 3D; and one manually processed with MTrackJ in 2D. Moreover, we now provide Imaris-processed DMG IVM movie samples in an open-source repository.

      Comment #4: Precisely how the BEHAV3D-TP large-scale phenotyping module can map large-scale spatial phenotyping data generated using LSR-3D imaging data and Cytomap to 3D intravital imaging movies is unclear. Further details in the text and methods would be beneficial to aid understanding.

      We appreciate the reviewer’s comment and in the revised manuscript we have now provided details in the methods section “Tumor large-scale spatial phenotyping with Cytomap” to clarify how the BEHAV3D-TP module maps LSR-3D and Cytomap data to 3D intravital imaging movies:

      “To map the assigned regions onto IVM movies, a 3D image of the cluster distribution within the tumor was generated and exported for each sample (Figure Supplement 5a). Next, regions within the IVM movies were visually matched to the corresponding regions identified by the Large-Scale Phenotyping module of Cytomap (Figure 3c). For each mouse, at least one or two representative positions per matched region type were selected, cropped, and analyzed to assess tumor cell behavior, following the previously described cell tracking methodology (Imaris Cell tracking).”

      Moreover, we updated Figure 3c to further clarify these steps.

      Comment #5: The analysis provides only preliminary evidence in support of the authors' conclusions on DMG cell migratory behaviours and their relationship with components of the tumour microenvironment. Conclusions should therefore be tempered in the absence of additional experiments and controls. 

      We appreciate the reviewer’s comment and acknowledge that our conclusions should be tempered due to the preliminary nature of our evidence. In the revised manuscript, we have adjusted our conclusions accordingly and emphasize the need for additional experiments and controls to further validate our findings on DMG cell migratory behaviors and their relationship with the tumor microenvironment.

      In discussion: “While our findings suggest that microenvironmental factors may influence tumor cell migration, further studies will be necessary to establish causal relationships. Additional experimental validation, such as macrophage ablation experiments, could help clarify the specific contributions of these factors.”

      Reviewer #1 (Recommendations for the authors): 

      (1) To test the ability of the pipeline to identify relevant patterns of migratory behaviours additional 'control' experiments would be helpful e.g. comparing non-invasive vs invasive tumour cell lines, artificially controlling migratory behaviours of cells such as implanting beads soaked in factors that would attract/repel cells? 

      (2) Does the pipeline work well for a variety of cell types/contexts? e.g. can it identify and cluster more subtle migratory behaviours such as non-tumour cells during tissue development or regeneration conditions? 

      We appreciate the reviewer’s valuable suggestions. In the revised manuscript, we have included additional examples demonstrating the capability of our pipeline to investigate heterogeneous cell behavior across two additional experimental setups:

      (1) We have now evaluated our BEHAV3D-TP heterogeneity module using IVM data from breast cancer cell lines with varying migratory capacities (DOI: 10.1016/j.yexcr.2019.04.009). In these datasets, our pipeline extends beyond predefined characteristics based solely on speed, enabling the identification of distinct cell populations. Notably, our analysis reveals that the breast cancer lines differ in their proportions of distinct migratory behaviors, such as Fast, Intermediate, Very slow and Static (Supplementary Fig 1).

      (2) We have now evaluated our BEHAV3D-TP heterogeneity module using IVM data from healthy breast epithelial cells (DOI: 10.1016/j.celrep.2024.115073), where we identify distinct morphodynamic epithelial cell populations in the terminal end bud of the mammary gland that are distributed differently among hormone receptor-positive (HR+) and HR- terminal end bud cells.

      (3) To support biological conclusions could the authors show that ablating tumour-associated macrophages or vasculature alters the migratory patterns of nearby tumour cells?

      We appreciate the reviewer's suggestion regarding the potential effects of ablating tumor-associated macrophages or vasculature on the migratory patterns of nearby tumor cells. While these experiments would functionally validate the observations made by our method, we would like to clarify that the primary focus of our study was on the development and application of computational tools for behavioral analysis and thus we consider that delving deeper in understanding the biology behind our observation is out of the scope of the current study. However, as mentioned previously, we have carefully tempered our conclusions to acknowledge the limitations of our current study. In the revised manuscript, we explicitly highlight that experiments involving the ablation of tumor-associated macrophages or vasculature would be crucial for further understanding the biological relevance of our findings.

      Minor corrections to text: 

      (4) Line 63 - are references formatted correctly?

      Thank you for pointing out this error. We have corrected it in the revised manuscript.

      (5) Lines 161 -162 - 'intravitally imaged' used twice in a sentence.

      Thank you for pointing out the typo. We have corrected it in the revised manuscript.

      Reviewer #2 (Public review):

      Comment#1: The strength of democratizing this kind of analysis is undercut by the reliance upon Imaris for segmentation, so it would be nice if this was changed to an open-source option for track generation.

      As noted in our previous response to Reviewer #1, we would like to point out that although Imaris is a commercial software, it is widely used in the intravital microscopy community due to its user-friendly interface. We conducted a literature review to evaluate this aspect and below we include references from leading laboratories in the IVM field that utilize Imaris. One of its key advantages, which we also utilized, is semi-automated data tracking that allows for manual corrections in 3D—a process that can be more challenging in other open-source software with less effective data visualization.

      However, we recognize that enhancing our pipeline's compatibility with open-source options is important. To this end, we have updated our tool to support 2D and 3D data formats generated by open-source Fiji plugins like TrackMate, MTrackJ, and ManualTracking, improving compatibility with various segmentation and tracking pipelines (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input). In the revised manuscript, we describe the new functionality and demonstrate the operation of the BEHAV3D-TP heterogeneity module across various IVM datasets, processed in both 2D and 3D with different processing pipelines (Supplementary Fig 1-3). This includes CellPose 2.0 and the novel 'Segment Anything' model, followed by TrackMate tracking, applied to both tumor and healthy IVM data. Moreover, we have developed a new web application that integrates morphological and tracking information from Segment Anything segmentation and TrackMate tracking, depicted in Supplementary Fig 3a (https://morphotrack-merger.streamlit.app/). Additionally, we have updated the introduction to better clarify the scope of our study and include references to existing image processing solutions.

      Comment#2: The main issue is with the interpretation of the biological data in Figure 3 where ANOVA was used to analyse the proportional distribution of different clusters. Firstly the n is not listed so it is unclear if this represents an n of 3 where each mouse is an individual or whether each track is being treated as a test unit. If the latter this is seriously flawed as these tracks can't be treated as independent. Also, a more appropriate test would be something like a Chi-squared test or Fisher's exact test. Also, no error bars are included on the stacked bar graphs making interpretation impossible. Ultimately this is severely flawed and also appears to show very small differences which may be statistically different but may not represent biologically important findings. This would need further study.

      We appreciate the reviewer’s insightful comments regarding the interpretation of the biological data in Figure 3. 

      To clarify, each imaged position is considered an independent biological replicate (n = 18 from a total of 6 mice). We acknowledge that the description of the statistical methods and the experimental units was not sufficiently clear in the previous version. In our original submission, we used an ANOVA to test whether the proportion of each behavioral cluster differed across the tumor microenvironment regions. Post hoc pairwise comparisons were performed using Tukey’s test, with the results shown in Supplementary Figure 2d (currently Fig 3d). However, we agree with the reviewer that this approach may be misleading when paired with stacked bar plots that lack error bars, as it can obscure individual variability and does not explicitly represent statistical uncertainty.

      In the revised manuscript, we present the data as boxplots with individual data points, where each dot represents an imaged position, and the shape corresponds to a specific mouse. In Figure 3 d the y-axis displays the normalized percentage of each cluster across TME regions, expressed as z-scores. This normalization corrects for inter-mouse variability and facilitates a comparison of the relative distribution of clusters across TME regions, independent of the overall abundance differences between mice. We performed an ANOVA with Tukey's post hoc test for each individual behavioral cluster to assess differences across TME regions. Additionally, for transparency, in Supplementary Figure 5 d we provide the raw percentage values. The legends provide the number of positions and mice included in the analysis. 
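
      The normalization step can be illustrated with a minimal sketch. It assumes that z-scores are computed within each mouse, across that mouse's imaged positions, separately for each behavioral cluster; this is one plausible reading of the description above rather than a statement of the exact procedure used. The function name `zscore_within_mouse` and the row layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_within_mouse(rows):
    """Z-score cluster percentages within each mouse.

    rows: list of dicts with keys 'mouse', 'position', 'region' and
    'percent', all referring to ONE behavioral cluster (hypothetical layout).
    Returns new dicts with an added 'z' key; the input is left unchanged.
    """
    # Collect the percentages measured in each mouse.
    by_mouse = defaultdict(list)
    for row in rows:
        by_mouse[row["mouse"]].append(row["percent"])

    normalized = []
    for row in rows:
        values = by_mouse[row["mouse"]]
        mu = mean(values)
        # The sample standard deviation needs at least two positions.
        sd = stdev(values) if len(values) > 1 else 0.0
        # With no spread there is nothing to scale; report 0 by convention.
        z = (row["percent"] - mu) / sd if sd > 0 else 0.0
        normalized.append({**row, "z": z})
    return normalized
```

      Under this assumption, two mice whose overall abundance of a cluster differs but whose regional pattern is the same yield the same z-scores, so the position-level values can be compared across mice before the per-cluster ANOVA and Tukey's test (not shown here).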

      Comment#3:  Figure 4 has similar statistical issues in that the n is not listed and, again, it is unclear whether they are treating each cell track as independent which, again, would be inappropriate. The best practice for this type of data would be the use of super plots as outlined in Lord et al. (2020) JCI - SuperPlots: Communicating reproducibility and variability in cell biology.

      We appreciate the reviewer’s comments and suggestions regarding Figure 4. In this case, because we are comparing the overall features of the behavioral clusters, each individual cell is treated as a unit. In the revised manuscript, we have clarified this point in the figure legend and incorporated plots in Figure 4c and 4e indicating the mouse and imaging position from which each data point originates. This enhances the visualization of reproducibility and variability in our data, demonstrating that the results are consistent across multiple mice and positions and are not driven by a single mouse or imaging position.

      Comment#4: The main issue that this raises is that the large-scale phenotyping module and the heterogeneity module appear designed to produce these statistical analyses that are used in these figures and, if they are based on the assumption that each track is independent, then this will produce inappropriate analyses as a default.

      We appreciate the reviewer’s comment, although we are unclear about the specific concern being raised. To clarify, in our large-scale phenotyping analysis, each position is assigned to a TME niche based on the CytoMAP analysis and the workflow outlined in Figure 3c. Multiple positions are imaged per mouse. For each position, we measure the proportion of tumor cells exhibiting a specific behavioral phenotype, and these proportions are subsequently used for statistical analysis (Figure 3 d). 

      In contrast, in Supplementary Fig. 5e-g, we treat each cell track as an individual unit, grouping them by their assigned large-scale region. Here, we assess whether differences between regions can be detected using a conventional single-feature analysis—a more traditional approach. However, we find that this method loses important behavioral patterns and distinctions that BEHAV3D-TP captures.
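
      To make the distinction between the two units of analysis concrete, the following minimal sketch aggregates per-track cluster labels into one set of percentages per imaged position, so that the position, not the track, becomes the statistical unit. The function name `cluster_percentages_per_position` and the field names are hypothetical and do not reflect the actual BEHAV3D-TP code.

```python
from collections import Counter, defaultdict


def cluster_percentages_per_position(tracks, clusters=None):
    """Summarize tracks as cluster percentages per imaged position.

    tracks: list of dicts with keys 'mouse', 'position' and 'cluster'
    (hypothetical layout, one dict per cell track).
    clusters: optional list of all cluster names, so that clusters absent
    from a position are reported as 0.0 instead of being omitted.
    Returns {(mouse, position): {cluster: percent}}.
    """
    if clusters is not None:
        names = list(clusters)
    else:
        names = sorted({t["cluster"] for t in tracks})

    # Count how many tracks of each cluster were observed at each position.
    counts = defaultdict(Counter)
    for t in tracks:
        counts[(t["mouse"], t["position"])][t["cluster"]] += 1

    result = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        result[key] = {name: 100.0 * counter[name] / total for name in names}
    return result
```

      However many tracks a position contains, it contributes a single value per cluster to the statistical test, which avoids treating non-independent tracks from the same field of view as separate replicates.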

      We hope that this explanation, along with the modifications made to the figures and figure legends, provides greater clarity.  

      Reviewer #3 (Public review):

      Comment #1: The most challenging task of analyzing 3D time-lapse imaging data is to accurately segment and track the individual cells in 3D over a long time duration. BEHAV3D Tumor Profiler did not provide any new advancement in this regard, and instead relies on commercial software, Imaris, for this critical step. Imaris is known to have a very high error rate when used for analyzing 3D time-lapse data. In the Methods section, the authors themselves stated that "Tumor cell tracks were manually corrected to ensure accurate tracking". Based on our own experience of using Imaris, such manual correction is tedious and often required for every time step of the movie. Therefore, Imaris is not a satisfactory tool for analyzing 3D time-lapse data. Moreover, Imaris is expensive and many research labs probably can't afford to buy it. The fact that BEHAV3D Tumor Profiler critically depends on the faulty ImarisTrack module makes it unclear whether the BEHAV3D tool or the results are reliable.

      If the authors want to "democratize the analysis of heterogeneous cancer cell behaviors", they should perform image segmentation and tracking using open-source codes (e.g., Cellpose, Stardisk & 3DCellTracker) and not rely on the expensive and inaccurate ImarisTrack Module for the image analysis step of BEHAV3D.

      We appreciate the reviewer’s comments on the challenges of segmenting and tracking individual cells in 3D time-lapse imaging data. As mentioned previously (please refer to comment #1 to reviewer #1), our primary focus is to develop an analytical tool for comprehensive data analysis rather than developing tools for image processing. However, to enhance accessibility, we have updated our tool to support data formats from open-source Fiji plugins, such as TrackMate, which will benefit users without access to commercial software (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input). In Supplementary Figures 1, 2, and 3, we present IVM data from different sources, processed using three distinct methods: MTrackJ (Supplementary Fig. 1), Cellpose + TrackMate (Supplementary Fig. 2), and µSAM + TrackMate (Supplementary Fig. 3). The latter two represent state-of-the-art deep learning approaches.

      On the other hand, while we recognize the limitations of Imaris, it remains widely used in the intravital microscopy community due to its user-friendly interface for 3D visualization and semi-automated segmentation capabilities. Since no perfect tracking method currently exists, we initially utilized Imaris for its ability to allow manual correction of faulty tracks, ensuring the reliability of our results. This approach is not only widely used (see above) but was also the best available option when we began our analysis, allowing us to obtain accurate results efficiently.

      In the revised manuscript, we clarify the scope of our study and provide information on both Imaris and alternative processing options to strengthen the reliability of our findings:

      In introduction: “While significant efforts have been made to develop open-source segmentation and tracking tools for live imaging data, including IVM[22–27], fewer tools exist for the unbiased analysis of tumor dynamics. One major barrier is that implementing such analytical methods often requires substantial computational expertise, limiting accessibility for many biomedical researchers conducting IVM experiments. To bridge this gap, we present BEHAV3D Tumor Profiler (BEHAV3D-TP) by providing a robust, user-friendly tool that allows researchers to extract meaningful insights from dynamic cellular behaviors without requiring advanced programming skills.”

      In the Methods, we now describe not only the Imaris processing pipeline but also the µSAM segmentation pipeline, and we refer to CellPose IVM processing; both are combined with TrackMate for tracking. Additionally, to integrate morphological information from µSAM with tracking data from TrackMate, we developed a web tool to merge the outputs from both processing steps: https://morphotrack-merger.streamlit.app/

      Comment #2: The authors developed a "Heterogeneity module" to extract distinctive tumor migratory phenotypes from the cell tracks quantified by Imaris. The cell tracks of the individual tumor cells are all quite short, indicating relatively low motility of the tumor cells. It's unclear whether such short migratory tracks are sufficient to warrant the PCA analysis to identify the 7 distinctive migratory phenotypes shown in Figure 2d. It's also unclear whether these 7 migratory phenotypes correspond to unique functional phenotypes.  

      For the 7 distinctive motility clusters, the authors should provide a more detailed analysis of the differences between them. It's unclear whether the difference in retreating, slow retreating, erratic, static, slow, slow invading, and invading correspond to functional difference of the tumor cells.

      While some tumor cells exhibit limited motility, indicated by short tracks, others demonstrate significant migratory capabilities (Figure 2 Invading and Retreating cells). This variability in tumor cell behavior is a central focus of our analysis, and our tool is specifically designed to identify and distinguish these differences. Our PCA analysis effectively captures this variability, as illustrated in Figure 2 d-f. It differentiates between cells exhibiting varying degrees of migratory behavior, including both highly and less migratory phenotypes, as well as their directionality relative to the tumor core and the persistence of their movements. Thus, we believe that our approach provides valuable insights into the distinct migratory phenotypes within the tumor microenvironment. 
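
      As a minimal sketch of one such directional feature, the snippet below computes the net change in a cell's distance to the tumor core over its track. It is an illustration only and not the clustering performed by BEHAV3D-TP: the function name `radial_displacement`, the representation of the core as a single reference point, and the sign convention (positive for movement away from the core, negative for movement towards it) are all assumptions.

```python
import math


def radial_displacement(positions, core):
    """Net change in distance to the tumor core along one track.

    positions: list of (x, y, z) tuples ordered in time.
    core: (x, y, z) reference point standing in for the tumor core; reducing
    the core to a single point is a simplification made for this sketch.
    Returns a positive value if the cell ends farther from the core than it
    started and a negative value if it ends closer.
    """
    if len(positions) < 2:
        raise ValueError("a track needs at least two positions")
    start_distance = math.dist(positions[0], core)
    end_distance = math.dist(positions[-1], core)
    return end_distance - start_distance
```

      Combined with magnitude-based features such as speed and persistence, a signed feature of this kind is what allows cells with similar motility but opposite directions of travel to be told apart.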

      While our current manuscript does not provide explicit evidence linking each motility cluster to functional differences among the tumor cells, it is important to note that the state of the field supports the idea that cell dynamics can predict cell states and phenotypes. Research conducted by ourselves (Dekkers, Alieva et al., Nat Biotech, 2023) and others, such as Crainiciuc et al. (Nature, 2022) and Freckmann et al. (Nat Comm, 2022), has shown that variations in cell motility patterns are indicative of underlying functional characteristics. For instance, cell morphodynamic features have been shown to reflect differences in cell types, T cell targeting states (Dekkers, Alieva et al., Nat Biotech, 2023), immune cell types (Crainiciuc et al. (Nature, 2022)), tumor metastatic potential, and drug resistance states (Freckmann et al. (Nat Comm, 2022)). In the revised manuscript, we have referenced relevant studies to underscore the biological significance of these behaviors. By doing so, we hope to clarify the potential implications of our findings and strengthen the overall narrative of our research:

      In discussion: “While our current study does not provide direct functional validation of the distinct motility clusters identified, existing literature strongly supports the notion that cell dynamics can serve as a proxy for functional states and phenotypic heterogeneity. Prior work, including studies by our group[19,66]  as well as Crainiciuc et al.[35] and Freckmann et al.[20], has demonstrated that variations in cell motility patterns can reflect underlying functional characteristics. Specifically, cell morpho-dynamic features have been shown to correlate with differences in cell type identity, T-cell engagement, metastatic potential, and drug resistance states. This growing body of evidence suggests that tumor cell behavior, as captured by BEHAV3D-TP, may serve as a predictive tool for deciphering functional tumor heterogeneity. Future studies integrating transcriptomic or proteomic profiling of motility-defined subpopulations could further elucidate the biological significance of these behavioral phenotypes.”

      Comment #3: Using only motility to classify tumor cell behaviours in the tumor microenvironment (TME) is probably not sufficient to capture the tumor cell difference. There are also other non-tumor cell types in the TME. If the authors aim to develop a computational tool that can elucidate tumor cell behaviors in the TME, they should consider other tumor cell features, e.g., morphology, proliferation state, and tumor cell interaction with other cell types, e.g., fibroblasts and distinct immune cells.

      The authors should expand the scale of tumor behavior features to classify the tumor phenotype clusters, e.g., to include tumor morphology, proliferation state, and tumor cell interaction with other TME cell types.

      We believe that using dynamic features alone is sufficient to capture differences in tumor behavior, as demonstrated by our results in Figure 2. However, we appreciate the reviewer’s suggestion to consider additional features, such as cell morphology, to fine-tune our analyses. To this end, we have adapted our pipeline to be compatible with any dynamic, morphological or spatial features present in the data. In the revised manuscript, we showcase this new addition with the analysis of two new datasets: 2D IVM data from healthy epithelial breast cells (Supplementary Fig 2) and 3D IVM data from adult gliomas (Supplementary Fig 3). These analyses identified cells with specific morphodynamic characteristics, which exhibited distinct kinetic behaviors or spatial distributions.

      However, we would like to point out that not all features may provide informative insights and that a wide range of features can instead introduce biologically irrelevant noise, making interpretation more challenging. For instance, in 3D microscopy, the z-axis resolution is typically lower, which can lead to artifacts like elongation in that direction. Adding morphological features that capture this may skew the analysis. Therefore, we believe that incorporating additional features should be approached with caution. We clarify these considerations in the revised manuscript to better guide users in utilizing our computational tool effectively:

      In discussion: “In addition to motility-based classification, features such as tumor cell morphology, proliferation state, and interactions with the tumor microenvironment can further refine tumor phenotyping. BEHAV3D-TP allows for the selection of diverse feature types, supporting datasets that include both dynamic, morphological and spatial parameters. However, we recognize that expanding the feature set may introduce biologically irrelevant noise, particularly in 3D microscopy data where limited z-axis resolution can lead to morphological artifacts. This highlights the potential need in the future to include unbiased feature selection strategies, such as bootstrapping methods[67], to ensure the identification of meaningful and biologically relevant parameters. Careful consideration of these aspects is key to maximizing the interpretability and predictive value of analyses performed with BEHAV3D-TP.”

      Comment #4: The authors have already published two papers on BEHAV3D [Alieva M et al. Nat Protoc. 2024 Jul;19(7): 2052-2084; Dekkers JF, et al. Nat Biotechnol. 2023 Jan;41(1):60-69]. Although the previous two papers used BEHAV3D to analyze T cells, the basic pipeline and computational steps are similar, in particular regarding cell segmentation and tracking. The addition of a "Heterogeneity module" based on PCA analysis does not make a significant advancement in terms of image analysis and quantification.

      We want to emphasize that we have no intention of duplicating our previous publications. In this manuscript, we have consistently cited our foundational papers, where BEHAV3D was first developed for T cell migratory analysis in in vitro settings. In the introduction, we clearly state that our earlier work inspired us to adopt a similar approach for analyzing cell behavior in intravital microscopy (IVM) data, addressing the specific needs and complexities of analyzing tumor cell behaviors in the tumor microenvironment.

      Importantly, our new work provides several key advancements: 1) a pipeline specifically adapted for intravital microscopy (IVM) data; 2) integration of spatial characteristics from both large-scale and small-scale phenotyping; and 3) a zero-code approach designed to empower researchers without coding skills to effectively utilize the tool. We believe that these enhancements represent meaningful progress in the analysis of cell behaviors within the tumor microenvironment which will be valuable for the IVM community. We ensure that these points are clearly articulated in the revised manuscript:

      In introduction: “In line with this concept of characterizing cellular dynamic properties for cell classification, we have previously developed an analytical platform termed BEHAV3D 19,21 allowing to perform behavioral phenotyping of engineered T cells targeting cancer. While BEHAV3D was initially developed to analyze T cell migratory behavior under controlled in vitro conditions, we sought to expand its application to investigate tumor cell behaviors in IVM data, where the complexity of the TME presents distinct analytical challenges. This manuscript builds on our foundational work but represents a significant advancement by adapting the pipeline specifically for IVM datasets.”

      Reviewer #3 (Recommendations for the authors): 

      (1) If the authors want to "democratize the analysis of heterogeneous cancer cell behaviors", they should perform image segmentation and tracking using open-source codes (e.g., Cellpose, StarDist & 3DCellTracker) and not rely on the expensive and inaccurate ImarisTrack Module for the image analysis step of BEHAV3D.

      We thank the reviewer for this recommendation and, as stated above, we recognize that enhancing our pipeline's compatibility with open-source options is important. To this end, we have updated our tool to support data formats generated by open-source Fiji plugins like TrackMate, MTrackJ, and ManualTracking, improving compatibility with various segmentation and tracking pipelines (https://github.com/imAIgeneDream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input). In the revised manuscript, we detail this new functionality and demonstrate the operation of the BEHAV3D-TP heterogeneity module using an example dataset of glioma tumors.

      Additionally, we have updated the introduction to better clarify the scope of our study (See comment #1 from Reviewer #3) and include references to existing image processing solutions.

      (2) For the 7 distinctive motility clusters, the authors should provide a more detailed analysis of the differences between them. It's unclear whether the difference in retreating, slow retreating, erratic, static, slow, slow invading, and invading correspond to functional difference of the tumor cells. 

      As noted in the comment above, the revised manuscript now incorporates references to relevant literature that support our understanding that behavioral differences among cells are driven by their underlying functional differences (See comment #2 from Reviewer #3). Additionally, we would like to point to Figure 2d and Supplementary Fig. 4c, which provide evidence of the functional distinctions between the identified clusters.

      (3) The authors should expand the scale of tumor behavior features to classify the tumor phenotype clusters, e.g., to include tumor morphology, proliferation state, and tumor cell interaction with other TME cell types.

      We thank the reviewer for this valuable suggestion. In the revised manuscript, we have added the flexibility to incorporate a wide range of features, including morphological ones, and enabled users to select the specific features they wish to include in their analysis. To illustrate this functionality, we have included two example datasets analyzed using this approach (See comment #3 from Reviewer #3). Additionally, as indicated above, we emphasize the importance of careful selection and interpretation of features, as improper choices may lead to biologically irrelevant results. This clarification is intended to ensure that users apply the tool thoughtfully and derive meaningful insights.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      We thank reviewer 1 for the helpful comments. As indicated in the responses below, we have taken all comments and suggestions into consideration in this revised version of the manuscript.

      Weaknesses:

      While this study convincingly describes the phenotype seen upon Drp1 loss, my major concern is that the mechanism underlying these defects in zygotes remains unclear. The authors refer to mitochondrial fragmentation as the mechanism ensuring organelle positioning and partitioning into functional daughters during the first embryonic cleavage. However, could Drp1 have a role beyond mitochondrial fission in zygotes? I raise these concerns because, as opposed to other Drp1 KO models (including those in oocytes) which lead to hyperfused/tubular mitochondria, Drp1 loss in zygotes appears to generate enlarged yet not tubular mitochondria. Lastly, while the authors discard the role of mitochondrial transport in the clustering observed, more refined experiments should be performed to reach that conclusion.

      It would be difficult to answer from this study whether Drp1 plays a role beyond mitochondrial fission in zygotes. However, the reasons why Drp1 KO zygotes differ from the somatic Drp1 KO model can be discussed as follows.

      First, the reviewer mentioned that the loss of Drp1 in oocytes leads to hyperfused/tubular mitochondria, but in fact, unlike in somatic cells, the EM images of Drp1 KO oocytes show enlarged mitochondria rather than tubular structures (Udagawa et al., Curr Biol. 2014, PMID: 25264261, Fig. 2C and Fig. S1B-D), as in the case of zygotes in this study. Mitochondria in oocytes/zygotes have the shape of a small sphere with irregular cristae located peripherally. These structural features may cause insensitivity or resistance to inner membrane fusion and the resultant failure to form tubular mitochondria as seen in somatic cell models. Nonetheless, quantitative analysis of EM images in the revised version confirmed that the mitochondria of Drp1-depleted embryos were not only enlarged but also significantly elongated (Figure 2J-2M). Therefore, in Drp1-depleted embryos, significant structural and functional (e.g., asymmetry between daughters) changes in mitochondria were observed, and these are expected to lead to defects in embryonic development.

      As for mitochondrial transport, we do not fully understand the intent of this question, but we do not entirely rule out mitochondrial transport. At least clustered mitochondria did not disperse again, but how mitochondria behave through the cytoskeleton within clusters will require further study, as the reviewer pointed out.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors show no effect of Myo19 Trim-Away, yet it remains unclear whether myo19 is involved in the positioning of mitochondria around the spindle. Judging by their co-localization during that stage, it might be. Therefore, in the absence of myo19, mitochondria might remain evenly distributed throughout mitosis, thus passively resulting in equal partitioning to daughter cells, with no severe developmental defects. Could the authors show a video of the whole process and discuss it?

      We have newly performed live imaging of mitochondria and chromosomes in Myo19 Trim-Away zygotes (n=13). As shown in Figure 1-figure supplement 2 and Figure 1-Video 2, there were no obvious changes in mitochondrial (and chromosomal) dynamics throughout the first cleavage and no significant mitochondrial asymmetry was observed. Therefore, we conclude that depletion of Myo19 does not cause mitochondrial asymmetry during embryonic cleavage. These results are described in the revised manuscript (Line 218-221).

      (2) Mitochondrial aggregation upon Drp1 depletion should be characterized in more detail: for example, % of mitochondria free, % in small clusters (> X diameter), and % in big clusters (>Y diameter).

      In the revised version, mitochondrial aggregation has been quantified by comparing the cluster size and number in control, Drp1 Trim-Away and Drp1 Trim-Away embryos expressing exogenous Drp1 (mCh-Drp1) (Figure 2G, 2H). In control embryos, mitochondria were interspersed in a large number of small clusters, while in Drp1-depleted embryos, mitochondria became highly aggregated into a small number of large clusters, a phenotype that was reversed by expression of mCh-Drp1. These results are described in the revised manuscript (Line 242-245).
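
      For readers who wish to reproduce this kind of measurement, the cluster size and number comparison can be sketched as below. This is only an illustrative sketch, not the analysis used for Figure 2G and 2H: it assumes a thresholded binary image of the mitochondrial signal, 4-connectivity, and cluster size measured in pixels. The function names and the toy masks are hypothetical.

```python
from collections import deque


def cluster_sizes(mask):
    """Return the pixel count of every 4-connected foreground cluster in a binary mask."""
    if not mask or not mask[0]:
        return []
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes


def summarize_clusters(mask):
    """Return (number of clusters, mean cluster size in pixels)."""
    sizes = cluster_sizes(mask)
    count = len(sizes)
    mean_size = sum(sizes) / count if count else 0.0
    return count, mean_size


# Hypothetical toy masks: many small clusters versus one large aggregate.
control_mask = [
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
]
depleted_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

print(summarize_clusters(control_mask))   # (4, 1.0)
print(summarize_clusters(depleted_mask))  # (1, 4.0)
```

      In this toy example the same total mitochondrial area is split into four clusters in the control-like mask and a single cluster in the aggregated mask, which is the pattern that both cluster number and cluster size are meant to capture.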

      (3) The discrepancies with parthenogenetic embryos derived from Drp1 (-/-) parthenotes should be commented on. Quantification of the dimensions of the clusters would help establish the degree of similarity/difference. Could the authors comment on their hypothesis as to why the clusters are remarkably larger in Drp1 depleted zygotes?

      In the revised version, we have quantified the mitochondrial aggregation in Drp1 KO parthenotes (Figure 2-figure supplement 1; the data for Drp1 KO parthenotes have been moved to the supplemental figure, due to lack of space in Figure 2 caused by the addition of quantitative data for Drp1 Trim-Away embryos). The size of mitochondrial clusters in Drp1 KO parthenotes was significantly increased compared to controls, but as the reviewer noted, mitochondrial aggregation appears to be moderate compared to that in Drp1-depleted embryos. The phenotypic discrepancies between the two Drp1-deficient embryo models are discussed below.

      First, it is clear that the phenotypic severity of Drp1 KO oocytes depends on the age of the female. Indeed, oocytes collected from 8-week-old females arrested in meiosis after NEB, mainly due to marked mitochondrial aggregation (Udagawa et al., Curr Biol. 2014, PMID: 25264261), whereas oocytes from juvenile females completed meiosis (Adhikari et al., Sci Adv. 2022, PMID: 35704569); thus, Drp1 KO parthenotes were obtained from juvenile females in the present study. Comparison of mitochondrial morphology in Drp1 KO oocytes in both papers also suggests that mitochondrial aggregation in adult mice is more intense (Udagawa et al., Curr Biol. 2014: Fig. 2A) than in juvenile mice (Adhikari et al., Sci Adv. 2022: Fig. 1G, 1H), and appears to be similar to that in Drp1-depleted embryos in this study (Figure 2E). There may be differences in the level of Drp1 depletion in these Drp1-deficient oocytes/zygotes. Similar differences between juvenile and adult KO females have been reported in a previous paper (Yueh et al., Development 2021, PMID: 34935904): adult-derived Smc3<sup>Δ/Δ</sup> zygotes arrested at the 2-cell stage, whereas juvenile-derived Smc3<sup>Δ/Δ</sup> zygotes had developmental competence comparable to the wild type. Remarkably, the SMC3 protein level in juvenile Smc3<sup>Δ/Δ</sup> oocytes was also comparable to that in Smc3<sup>fl/fl</sup> oocytes. The authors surmised that the decline in maternal SMC3 between the juvenile stage and sexual maturity is probably due to the continuous induction of the promoter-Cre driver, suggesting that similar induction may also occur in Drp1 KO oocytes. In addition, we observed not only age differences but also batch differences in Drp1 KO oocytes (and resulting embryos), such that little mitochondrial aggregation was observed in oocytes collected from some juvenile KO colonies.

      Therefore, for KO models showing age (sexual maturation)-dependent gradual phenotypic changes, Trim-Away may be an approach that provides more reproducible results, as it induces acute degradation of maternal proteins.

      (4) Mitochondrial clusters in Drp1 trim-away zygotes resemble those seen when defects in mitochondrial positioning are obtained by TRAK2 induction (PMID: 38917013), pointing again to a role of actin in the clustering process. Could the authors explore the role of actin further?

      TRAK2 and microtubule-dependent mechanisms may also be involved in mitochondrial dynamics during the first cleavage division, possibly in association with migration of the two pronuclei. Although the mitochondrial aggregation induced by TRAK2 overexpression is similar to that in Drp1-depleted embryos, it is unlikely that changes at the EM level occurred as seen in Drp1-depleted embryos (enlarged mitochondria, etc.). In addition, in TRAK2-overexpressing embryos, rather than uneven partitioning of mitochondria, the daughter blastomeres themselves were uneven in size after cleavage, making it difficult to precisely assess the similarity between the two models.

      Regarding the role of F-actin, we show that the subcellular distribution of cytoplasmic actin overlaps with that of mitochondria throughout the first cleavage and seems to accumulate in aggregated mitochondria, particularly during the mitotic phase, when a higher correlation was observed (Figure 1E). Although we did not observe that actin and the Myo19 motor regulate mitochondrial partitioning, as reported in somatic cell-based studies, it is possible that actin accumulated at mitochondria is indirectly involved in mitochondrial dynamics via mitochondrial fission. For example, inverted formin 2 (INF2) enhances actin polymerization and is required for efficient mitochondrial fission, acting upstream of Drp1 (Korobova et al., Science 2013, PMID: 23349293). In the revised manuscript, we have added a description of this point (Line 452-456).

      (5) Electron microscopy images showed indeed aberrant morphology of the mitochondria, yet not a hyperfused morphology. Aspect ratio (long/short axis) quantification should be included, besides the current measurement, since mitochondria in Drp1 trim-away look bigger yet as round as in the control.

      In the revised version, detailed quantitative data on EM images have been added (Figure 2J-2M). In Drp1-depleted embryos, significant increases were observed in both the major and minor axes of mitochondria. As the reviewer noted, we also assumed that mitochondria in depleted embryos were enlarged rather than elongated, but the quantification of aspect ratio shows that significant elongation occurred. These results have been described in the revised manuscript (Line 252-256).
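
      The distinction between enlargement and elongation can be illustrated with a short sketch. It assumes that each mitochondrion is summarized by its major (long) and minor (short) axis lengths, as in the reviewer's definition of aspect ratio; the axis values below are hypothetical and are not taken from Figure 2J-2M.

```python
def aspect_ratio(major_axis, minor_axis):
    """Aspect ratio (long/short axis); 1.0 is a circle, larger values are more elongated."""
    if minor_axis <= 0:
        raise ValueError("minor_axis must be positive")
    if major_axis < minor_axis:
        raise ValueError("major_axis must be at least as long as minor_axis")
    return major_axis / minor_axis


# Hypothetical axis lengths in micrometres (illustration only).
control = (0.50, 0.40)         # small, nearly round
enlarged_only = (1.00, 0.80)   # both axes doubled: bigger, same shape
elongated = (1.20, 0.80)       # major axis grows more than minor axis

for name, (major, minor) in [("control", control),
                             ("enlarged only", enlarged_only),
                             ("elongated", elongated)]:
    print(f"{name}: aspect ratio = {aspect_ratio(major, minor):.2f}")
```

      Under these assumed values, a mitochondrion whose axes both double keeps the control aspect ratio, whereas one whose major axis grows disproportionately shows a higher ratio. Increases in both axes alone therefore do not settle the question, and the aspect ratio has to be reported separately.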

      (6) Why are mitochondria in golgi-mcherry-expressing cells showing a different morphology of the clusters?

      As noted by the reviewer, compared to other mitochondrial images, Drp1-depleted embryos expressing Golgi-mCherry appear to have less mitochondrial aggregation. The exact reason is not known, but may be due to inter-lot variation of Trim21 mRNA used in this experiment. Nevertheless, substantial mitochondrial aggregation was observed compared to the control, which does not seem to affect the conclusion.

      (7) Authors comment on ROS being enriched (highly accumulated) in mitochondria. However, while quantification is missing, it might seem that ROS are equally distributed in control or Drp1 Trim-Away embryos. Could the authors quantify ROS signal inside and outside of the mitochondria, perhaps using a mask drawn by mitotracker? Furthermore, it would make these data more convincing to artificially induce/deplete ROS to validate the sensitivity of the technique to variations. Also, why is ROS pattern referred to as ectopic?

      Thank you for your useful suggestions. In the revised version, masked binary images were created from mitochondrial images to quantify ROS levels inside and outside mitochondria (Line 734-741). The result shows the accumulation of ROS in mitochondria in Drp1-depleted embryos (Figure 4-figure supplement 1E). The term ectopic was used to mean excessive accumulation of ROS in the mitochondria compared to normal embryos, but has been deleted as it is not very accurate.
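
      The general idea of mask-based quantification can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure described in the Methods: it assumes a ROS intensity image and a binary mitochondrial mask of the same dimensions, and compares the mean ROS intensity inside and outside the mask. The function name and the pixel values are hypothetical.

```python
def mean_inside_outside(signal, mask):
    """Return (mean signal inside mask, mean signal outside mask) for same-shaped 2D lists."""
    if len(signal) != len(mask):
        raise ValueError("signal and mask must have the same number of rows")
    inside, outside = [], []
    for signal_row, mask_row in zip(signal, mask):
        if len(signal_row) != len(mask_row):
            raise ValueError("signal and mask rows must have the same length")
        for value, in_mito in zip(signal_row, mask_row):
            # Sort each pixel into the mitochondrial or non-mitochondrial pool.
            (inside if in_mito else outside).append(value)
    if not inside or not outside:
        raise ValueError("mask must contain both foreground and background pixels")
    return sum(inside) / len(inside), sum(outside) / len(outside)


# Hypothetical 2x2 example: left column is mitochondria, right column is cytoplasm.
ros_signal = [
    [10, 2],
    [8, 4],
]
mito_mask = [
    [1, 0],
    [1, 0],
]

mean_in, mean_out = mean_inside_outside(ros_signal, mito_mask)
print(mean_in, mean_out, mean_in / mean_out)  # 9.0 3.0 3.0
```

      An inside/outside ratio above 1 indicates enrichment of the signal within the masked region; comparing this ratio between control and depleted embryos is one way to express mitochondrial accumulation independently of overall signal intensity.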

      Minor comments:

      (A) Video 1: images at t=-00:20 and t=00:00 of the mtGFP are actually the same images as H2B-mCherry.

      Probably a faulty filter/shutter control failed to capture GFP fluorescence at these times. It appears that the autocontrast function detected a small amount of mCherry fluorescence leakage. It would be possible to replace it with another video, but as the relevant frames were unrelated to the analysis, the previous video was used as is. The same problem also occurs in the newly added Myo19-depleted zygote movie (Figure 1-Video 2, 03:15).

      (B) Could you calculate the degree of colocalization between mt-GFP and ER-mCherry in ctrl and Drp1 trim-away? While it is apparent that ER is somehow more associated with mitochondrial clusters, it would be informative to quantify it.

      Since the ER is partially confined to the mitochondrial aggregation site, it was difficult to calculate correlation coefficients from fluorescence images of mt-GFP and ER-mCherry to quantitatively assess colocalization. Instead, line scan analysis of whole mitochondrial clumps showed that the peak of the ER-mCherry signal overlaps with that of mt-GFP, but this is not the case for Golgi-mCherry or peroxisome-mCherry (Figure 2-figure supplement 2A-2C).

      (C) Regarding the developmental arrest: The quantification of the different stages at each developmental time could be more informative. For example, at E4.5 how many embryos are at each stage (2-cell, 4-cell, ... blastocyst)? Also, could the authors comment on the reduction in developmental competence in Figure 4C, regarding the blastocyst stage?

      Many arrested embryos do not maintain their morphologies and undergo a unique degenerative process over time, known as cell fragmentation. Therefore, it is difficult to accurately determine the number of each developmental stage at, for example, E4.5 days. In this study, the 2-cell stage was observed at E1.5, the 4-8 cell at E2.5-E3.0, morula at E3.5 and the blastocyst at E4.5.

      Although the rate of embryos reaching the blastocyst stage was reduced compared to that of normal embryos, the overexpression of mCh-Drp1 may explain the incomplete restoration of developmental competence, since embryos injected solely with mCh-Drp1 mRNA also showed reduced developmental competence. For rescue experiments, the comparison with internal controls is more important, and we have therefore described it as follows: This is a specific effect of Drp1 deletion because none of the internal control conditions increased arrest at the 2-cell stage and arrest was completely reversed by microinjecting Trim-Away-insensitive exogenous mCh-Drp1 mRNA (Line 337-340).

      (D) In lines 103 to 105, proliferation should be changed to division or development.

      In the revised version, proliferation has been changed to division (Line 103).

      (E) Could the authors reference the statement in lines 168-169?

      The following 3 references have been added (Hardy et al., 1993, PMID: 8410824; Meriano et al., 2004, PMID: 15588469; Seikkula et al., 2018, PMID: 29525505).

      (F) Line 448: "Cells lacking Drp1 have highly elongated mitochondria that cannot be divided into transportable units,..." This is clearly not the case for zygotes, so why are then these mitochondria still clustering and not transported elsewhere?

      Although it is difficult to answer this reviewer's question precisely, EM images of Drp1-depleted embryos suggest that individual mitochondria appear not only to be enlarged but also to have increased outer membrane attachment due to excessive aggregation. These large mitochondrial clumps may therefore be preventing transport.

      Reviewer #2 (Public review):

      We thank reviewer 2 for the helpful comments. As indicated in the responses below, we have taken all comments and suggestions into consideration in this revised version of the manuscript.

      Weaknesses:

      The authors first describe the redistribution of mitochondria during normal development, followed by alterations induced by Drp1 depletion. It would be useful to indicate the time post-hCG for imaging of fertilised zygotes (first paragraph of the results/Figure 1) to compare with subsequent Drp1 depletion experiments.

      In the revised version, the time after hCG has been indicated (Line 176-182). In subsequent Drp1 depletion experiments, the revised version notes that “no significant delay in cell cycle progression was observed following Drp1 depletion (data not shown) compared to control embryos (Figure 1A)” (Line 291-193). There was a slight discrepancy in the time post-hCG between live imaging and immunofluorescence analysis (Figure 1-figure supplement 1A), which may be due to manipulation of zygotes outside the incubator during the microinjection of mRNA.

      It is noted that Drp1 protein levels were undetectable 5h post-injection, suggesting earlier times were not examined, yet in Figure 3A it would seem that aggregation has occurred within 2 hours (relative to Figure 1).

      As the reviewer pointed out, the depletion of Drp1 is likely to have occurred at an earlier stage. In this study, due to the injection of various mRNAs to visualize organelles such as mitochondria and chromosomes, observations were started after about 5 h of incubation for their fluorescent proteins to be sufficiently expressed. Therefore, for the Western blot analysis, samples were prepared according to the time of the start of the observation.

      Mitochondria appear to be slightly more aggregated in Drp1 fl/fl embryos than in control, though comparison with untreated controls does not appear to have been undertaken. There also appears to be some variability in mitochondrial aggregation patterns following Drp1 depletion (Figure 2-suppl 1 B) which are not discussed.

      In the revised version, mitochondrial aggregation has been quantified by comparing the cluster size and number in control, Drp1 Trim-Away and Drp1 Trim-Away embryos expressing exogenous Drp1 (mCh-Drp1) (Figure 2G, 2H). We have also quantified the mitochondrial aggregation in Drp1<sup>fl/fl</sup> and Drp1<sup>Δ/Δ</sup> parthenotes (Figure 2-figure supplement 1; note that the data for Drp1 KO parthenotes have been moved to the supplemental figure, due to lack of space in Figure 2 caused by the addition of quantitative data for Drp1 Trim-Away embryos). Mitochondria appear to be slightly more aggregated in Drp1<sup>fl/fl</sup> embryos than in control, but no significant differences in cluster size or number were observed (data not shown). On the other hand, mitochondrial clusters in Drp1 Trim-Away embryos were remarkably larger than those in Drp1<sup>Δ/Δ</sup> parthenotes. Please refer to the response to reviewer 1's comment (3) for discussion of this discrepancy.

      As noted by the reviewer, compared to other mitochondrial images, Drp1-depleted embryos expressing Golgi-mCherry appear to have less mitochondrial aggregation. The exact reason is not known, but may be due to inter-lot variation of Trim21 mRNA used in this experiment. Nevertheless, substantial mitochondrial aggregation was observed compared to the control, which does not seem to affect the conclusion.

      The authors use western blotting to validate the depletion of Drp1, however do not quantify band intensity. It is also unclear whether pooled embryo samples were used for western blot analysis.

      In the revised version, the band intensities in Western blot analysis were quantified, which validated the previous results (Figure 1H for Myo19 depletion, Figure 2B for Drp1 expression during preimplantation development, Figure 2D for Drp1 depletion). The number of embryos analyzed is described in the figure legends (pooled samples of 20 to 100 embryos were used).

      Likewise, intracellular ROS levels are examined however quantification is not provided. It is therefore unclear whether 'highly accumulated levels' are of significance or related to Drp1 depletion.

      In the revised version, masked binary images were created from mitochondrial images to quantify ROS levels inside and outside mitochondria (Line 734-741). The result shows the accumulation of ROS in mitochondria in Drp1-depleted embryos (Figure 4-figure supplement 1E).

      In previous work, Drp1 was found to have a role as a spindle assembly checkpoint (SAC) protein. It is therefore unclear from the experiments performed whether aggregation of mitochondria separating the pronuclei physically (or other aspects of mitochondrial function) prevents appropriate chromosome segregation or whether Drp1 is acting directly on the SAC.

      In the revised manuscript, we have discussed this reference (Zhou et al., Nature Communications, PMID: 36513638) (Line 482-483).

      Reviewer #2 (Recommendations For The Authors):

      The authors report that disruption of F-actin organization led to asymmetry in mitochondrial inheritance, however depletion of Myo19 does not impact inheritance. The authors note in the discussion that loss of another mitochondrial motor protein, Miro, has been shown to affect mitochondrial inheritance. They suggest this may be due to reduced levels of Myo19, despite data from the present study suggesting a lack of involvement of Myo19. Given that Miro1 also interacts with microtubules, and crosstalk between actin filaments and microtubules has been reported, have the authors considered whether other motor proteins, such as KIF5, may be involved in mitochondrial movement in the zygote and therefore inheritance? Myo19 also plays a role in mitochondrial architecture. Were any differences noted at the EM level?

      During oocyte meiosis and early embryonic cleavage, kinesin-5 has been reported to be important for the formation of bipolar spindles (Fitzharris, Curr Biol. 2009, PMID: 19465601) and may have some involvement in mitochondrial dynamics. Given that the migration of the two pronuclei towards the zygotic centre occurs in a dynein-dependent manner (Scheffler et al., Nat Commun. 2021, PMID: 33547291), dynein may also be involved in the process of mitochondrial accumulation around the pronuclei. Nevertheless, whether microtubule-dependent mechanisms regulate mitochondrial partitioning remains controversial. Mitochondria basically diverge from microtubules at the onset of mitosis, and indeed Miro1-deleted zygotes did not show asymmetric mitochondrial partitioning (Lee et al., Front Cell Dev Biol. 2022, PMID: 36325364). More recently, it was reported that overexpression of TRAK2 causes significant mitochondrial aggregation in embryos (Lee et al., Proc Natl Acad Sci U S A. 2024, PMID: 38917013), but since overexpression might disrupt the regulatory balance maintained by other motor/adaptor complexes, further investigation using TRAK2-deficient embryos is needed.

      As noted by the reviewer, myo19 seems to be important for the maintenance of mitochondrial cristae architecture and, consequently, for the regulation of mitochondrial function (Shi et al., Nat Commun. 2022, PMID: 35562374). We have not observed the EM images in myo19-depleted embryos, but we examined their membrane potential and ROS by TMRM and H2DCF staining, respectively, and confirmed that they were comparable to control embryos (data not shown). The loss of myo19 in zygotes/embryos did not cause any functional changes in mitochondria, suggesting that mitochondrial architecture may not be substantially affected either.

      Transcriptomic analysis would be useful to identify alterations in cell cycle checkpoint regulators, as well as immunofluorescence to identify changes in spindle assembly checkpoint protein recruitment.

      The present results showed that the majority of Drp1-depleted embryos arrest at the G2 stage, possibly due to cell cycle checkpoint mechanisms. Transcriptome analysis would certainly be beneficial, but eventually more detailed analysis of proteins and their phosphorylation modifications, etc. is needed for accurate assessment. These studies will be the subject of future work.

      Minor comments:

      There are many instances where the English could be improved, particularly the overuse of the word 'the'.

      We have carefully checked the manuscript again and hope that the English has been improved.

      Line 144: replace 'took' with 'take'.

      We have corrected this in the revised version (Line 140).

      Line 157: it is unclear what is meant by 'hinders the functional importance of Drp1 in mature oocytes and embryos'.

      This description has been corrected to “complicates the functional analysis of Drp1 in mature oocytes and embryos” (Line 152-153).

      Line 198: replace with 'displayed a mitochondrial distribution pattern closely associated with'

      We have corrected this in the revised version (Line 195-196).

      Line 200: provide a time to clarify when the cytoplasmic meshwork was 'subsequently reorganized'

      In the revised version, “at the metaphase” has been added (Line 198).

      Line 204: replace 'to' with 'for'

      We have corrected this in the revised version (Line 203).

      Lines 285-87: consider rearranging the text to improve the flow.

      To improve the flow of the surrounding text, the following sentence has been added: “We postulated that this asymmetry was due to non-uniformity in the distribution of mitochondria around the spindle” (Line 295-297).

      Line 418: replace 'central' with 'centre'

      We have corrected this in the revised version (Line 430).

      Line 427: replace 'pertaining' with 'partitioning'

      We have corrected this in the revised version (Line 438).

      Line 574: clarify to what '1-5% of that of the oocytes' refers

      We have corrected it to “1-5% of the total volume of the zygote.” (Line 587-588).

      Line 619: indicate the dilution used

      We apologize for the previous incorrect description. We used a part of the extract as the template, not a dilution, and have corrected it to be accurate (Line 631-632).

      Line 634: replace 'on' with 'in' and detail in which medium embryos were mounted.

      We have corrected this in the revised version (Line 647).

      Please check all spelling in the figures.

      Figure 1J - inheritance is spelt incorrectly.

      Figure-Suppl 1, D: Interphase (PN) and (2-cell) is spelt incorrectly. G: inheritance is spelt incorrectly.

      Figure 5F - bottom section prior to cytokinesis, spindle is spelt 'spincle'

      Ensure consistency in abbreviation use (e.g. use of NEB and NEBD).

      Thank you for your careful correction of typographical errors. In the revised version, all points raised by the reviewers have been corrected.

      Reviewer #3 (Public review):

      We thank reviewer 3 for the helpful comments. As indicated in the responses below, we have taken all comments and suggestions into consideration in this revised version of the manuscript.

      Seemingly, there are few apparent shortcomings. Following are the specific comments to activate the further open discussion.

      Line 246: Comments on cristae morphology of mitochondria in Drp1-depleted embryos would better be added.

      In the revised manuscript, we have added the following comment: swollen or partially elongated mitochondria with lamellar cristae structures in the inner membrane were observed in Drp1-depleted embryos. In addition, the quantification of aspect ratio (long/short axis) shows that significant mitochondrial elongation occurred (Figure 2M). These results have been described in the revised manuscript (Line 251-256).

      - Regarding Figure 2H: If possible, a representative picture of Ateam would better be included in the figure. As the authors discussed in line 458, Ateam may be able to detect whether any alterations of local energy demand occurred in the Drp1-depleted embryos.

      Thank you for your very useful comments. Although it would be interesting to investigate whether alterations in ATP levels occurred in localized areas (e.g., around the spindle), the present study used a conventional fluorescence microscope instead of a confocal laser microscope to observe ATeam fluorescence, in order to quantify the fluorescence intensity in the whole embryo (or whole blastomere), and thus we currently cannot provide the images that the reviewer expected. As shown in Figure-figure supplement 1C, the ATP levels tend to be higher at the cell periphery in control embryos and at the mitochondrial aggregation areas in Drp1-depleted embryos, but high-resolution images obtained by confocal microscopy would be needed to show this clearly.

      - Line 282: In Figure 3-Video 1, mitochondria were seemingly more aggregated around female pronucleus. Is it OK to understand that there is no gender preference of pronuclei being encircled by more aggregated mitochondria?

      Review of multiple videos shows that aggregated mitochondria were localized toward the cell center, but did not exhibit the behavior of preferentially concentrating near the female pronucleus.

      - Line 317: A little more explanation of the "variability" would be fine. Does that basically mean that the Ca<sup>2+</sup> response in both Drp1-depleted blastomeres were lower than control and blastomere with more highly aggregated mitochondria show severer phenotype compared to the other blastomere with fewer mito?

      We think that the reviewer's comments are mostly correct. It is clear that there is a bias in Ca<sup>2+</sup> store levels between blastomeres of Drp1-depleted embryos. However, since mitochondria were not stained simultaneously in this experiment, we cannot draw detailed conclusions, such as that daughter blastomeres that inherit more mitochondria have higher Ca<sup>2+</sup> stores, or that blastomeres with more aggregated mitochondria have lower Ca<sup>2+</sup> stores.

      - Regarding Figure 5B (& Figure 1-figure supplement 1B): Do authors think that there would be less abnormalities in the embryos if Drp1 is trim-awayed after 2-cell or 4-cell, in which mitochondria are less involved in the spindle?

      The marked centration of mitochondrial clusters in Drp1-depleted embryos appears to be associated with migration of the pronuclei toward the cell center, which is unique to the first embryonic cleavage. Since the assembly of the male and female pronuclei at the cell center is also unique to the first cleavage, binucleation due to mitochondrial misplacement was observed only in the first cleavage. Therefore, if Drp1 is depleted at the 2-cell or 4-cell stage, chromosome segregation errors may be less frequent. However, since unequal partitioning of mitochondria is thought to occur, some abnormalities in embryonic development are likely to be observed.

      Reviewer #3 (Recommendations For The Authors):

      Specific comments

      - Line 262: "Since mitochondrial dynamics are spatially coordinated at the ER-mitochondria MCSs," adequate ref. would better be added.

      We have added an adequate reference to the revised manuscript (Friedman et al., 2011, PMID: 21885730).

      - Line 333-336: "...as assessed by the presence of the nuclear envelope." Do authors show the data? In Figure 4-figure supplement 1A, the difference of the phosphoH3-ser10 signal between control and Trim-Away group might be weak. For clarity, it would be helpful if authors indicate the different points to note in the figure.

      Although the data is not shown, nuclear staining of arrested 2-cell stage embryos exhibited clear nuclear membranes, similar to the DAPI image in Figure 4-figure supplement 1A. We have indicated that the data is not shown in the revised version (Line 345). Based on a report that phosphorylated histone H3 (Ser10) localizes in pericentromeric heterochromatin that can be visualized by DAPI staining in late G2 interphase cells (Hendzel et al., 1997, Chromosoma, PMID: 9362543), this study qualitatively estimated the G2 phase from the phosphorylated histone H3 signal and the DAPI counterstained images. We have noted this point in the revised figure legend (Line 1012-1014).

      Typos or points for reword/rephrase

      - Line 149: "molecular identification" may better be " molecular characteristics".

      We have corrected this in the revised version (Line 145).

      - Line 157: "hinders the functional importance" would be "implies the functional importance" or "complicates the functional analysis".

      We have corrected this in the revised version (Line 152-153).

      - Line 208: "Since the role of F-actin in many cellular events, such as cytokinesis, preclude them as targets for experimentally manipulating mitochondrial distribution, " may better be "Given many cellular roles, disruption of F-actin per se was unsuitable as a strategy for manipulating mitochondrial distribution", for example.

      We have corrected this in the revised version (Line 207-208).

      - Line 260: "with MCSs with the plasma.." may better be "with MCSs such as with the plasma..".

      We have corrected this in the revised version (Line 267-268).

      - Line 312: "distribution and segregation" may better be "distribution and the resulting segregation of the inter-organelle contacts".

      We have corrected this in the revised version (Line 324-325).

      - Line 427: "pertaining" might be "partitioning".

      We have corrected this in the revised version (Line 438).

      Line 463: "loss of Drp1 induced mitochondrial aggregation disturbs" may better be "mitochondrial aggregation induced by the loss of Drp1 disturbs".

      We have corrected this in the revised version (Line 478-479).

      - Line 752: "endoplasmic reticulum (pink) " would be " endoplasmic reticulum (aqua) ".

      We have corrected this in the revised version (Line 780).

      - Figure 5E: "(Noma 2-cell embryos)" would be "(Nomal 2-cell embryos)".

      - Figure 5F: "Mitochondrial centration prevents dual spincle assembly" would be "Mitochondrial centration prevents dual spindle assembly".

      Thank you for your careful correction of typographical errors. We have corrected all the words/expressions the reviewer pointed out in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews

      The main criticisms levied by both reviewers can be traced to our use of a long-term video archive to assess the effects of aging on individual chimpanzees over extended time periods. Specifically, the reviewers raised several points surrounding whether we could exclude ecological variation over years as the explanation of changes with aging, rather than aging itself. Whilst we acknowledge there are limitations to our approach, we provide a comprehensive response to these points highlighting:

      (1) Where ecological variables have been accounted for using controls (including the behaviors of other individuals, or an aging individual’s behavior at younger ages).

      (2) Where ecological data may be missing, thus a potential limitation to our study, and further data would be beneficial.

      (3) Whether, in light of these limitations, interannual ecological variation offers a likely explanation for the behavioral changes we have identified. We provide an argument that whilst ecological data would be desirable for our study, interannual changes in ecology are unlikely to explain the trends in our data. Additionally, we explain why age-related changes, such as senescence, are more likely to underpin the patterns described in our manuscript.

      Across 1-3, we have made substantial changes to the reporting of our manuscript to ensure that our results are communicated transparently, and conclusions are made with appropriate care. We have also moved all discussion of coula-nut cracking to the supplementary materials, given the points raised by reviewers about the lack of data describing coula-nut cracking in earlier field seasons.

      We hope that these modifications will enhance both the editors’ and reviewers’ assessment of our manuscript, where we have aimed to make careful conclusions that are supported by our available data. Similarly, we have aimed to communicate the importance of our results across fields of research including primatology, evolutionary anthropology, and comparative gerontology, and hope that our research will be of use to further studies within these subfields.

      Reviewer 1 (Recommendations for the authors):

      (1) If possible, include results or a summary of the behaviour of younger adults using stone tools during the same period. It would be helpful to know if they had the same or different pattern to exclude other factors that may influence the tool use (harder nuts in a particular season, diseases, motivation for other foods, etc). 

      We include data for other individuals when analyzing attendance. However, we did not collect comparable long-term efficiency data on younger adult individuals for this study. This is, in part, due to the time constraints imposed by long-term behavior coding. Additionally, only one adult was both present at Bossou throughout the 1999-2016 period and younger than the threshold for our old-age category across these years (thus, the baseline for comparison with older adults would have been a single younger adult, which would not have been useful for characterizing normal variation among many younger adults over time). However, given the longitudinal data we present, we can use data from the earlier field seasons for each elderly focal individual as a personalized baseline control. Previous studies at Bossou find that across the majority of adulthood, efficiency varies between individuals, but is stable within individuals over time (e.g., Berdugo et al. 2024, cited). We detected similar stability in individuals’ efficiency over the first three field seasons sampled in our analysis, where there was very little intra-individual variation in tool-using efficiency. However, in later years, two individuals (Velu & Yo) began to exhibit relatively large reductions in efficiency.

      These results are unlikely to be explained by ecological variation. If there was a change in ecology underpinning our results, we would expect changes in ecology [1] to also introduce variation in earlier field seasons, and [2] to influence all individuals in our study similarly. As such, if the changes observed in later field seasons were due to ecological changes, they should have caused a reduced efficiency across individuals, and to a similar degree. We did not observe this result: large reductions in efficiency were confined to two individuals. Moreover, for Yo (the individual who exhibited the largest reduction in efficiency) we found some additional evidence that changes in oil-palm-nut cracking efficiency extended beyond the period we sampled, i.e. they were evident even in 2018, reflecting a long-term, directional reduction in efficiency as compared to earlier years of her life. This consistent reduction in tool-using efficiency over multiple years adds further weight to the hypothesis that changes at the level of the individual were causing reduced tool-using efficiency, rather than our results being underpinned by interseasonal variation in ecology.

      Whilst we agree that our study is limited in the extent to which we can analytically assess ecological explanations for changes in nut-cracking efficiency, we believe that hypothetical ecological changes across field seasons do not predict our results. We now raise both sides of this debate in our discussion, where we outline our limitations (see lines 535-593).

      (2) The data from 2011 was scarce, with only one individual having 10 encounters. It would be better to be cautious with this season's results. 

      We appreciate this limitation raised by the reviewer. Velu and Yo were only encountered a few times in 2011; however, both were encountered more frequently in 2016. For 2011, we did not collect oil-palm nut cracking data for either Yo or Velu. Thus, their change in efficiency was detected by models using data from all other years, regardless of the few encounters in 2011. This sparsity of data may still have influenced our metrics for the proportion of time chimpanzees spent engaging in different behaviors when present at the outdoor laboratory in 2011, particularly for Velu, who was one of the two individuals who exhibited a change in behavior in this year (along with Fana, N = 10 for 2011). We have therefore added a line in our results and discussion highlighting the sparsity of data for Velu when estimating these proportions for 2011 (see lines 255-256 & 410).

      Minor corrections 

      (1) The last paragraph of the introduction presents many results, which should be in the results section. 

      We would like to keep this section of the introduction. Our paper investigates the effect of aging on many different aspects of nut cracking, which could become confusing for readers unless laid out clearly. We believe that having a short summary early on in the paper assists readers with following the methods and arguments presented within our paper.

      (2) The first section (Sampled data) of the results contains much information that belongs in the methods section. 

      We appreciate that there is some overlap between our methods and results section. However, as the results section comes before the methods in our manuscript, we wanted to ensure that there is suitable information in our results that allows our results to be interpreted clearly by readers, and that the methods used to generate these results are transparently communicated. For these reasons, we will leave this information in the results, as we believe it increases our paper’s readability.

      Reviewer 2 (Public review):

      One of the main limitations of this study is the small sample size. There are only 5 of the old-aged individuals, which is not enough to draw any inferences about aging for chimpanzees more generally. Howard-Spink and colleagues also study data from only five of the 17 years of recorded data at Bossou. The selection of this subset of data requires clarification: why were these intervals chosen, why this number of data points, and how do we know that it provides a representative picture of the age-related changes of the full 17 years? 

      We note that our sample size is limited to 5 individuals. This is an inevitable constraint of analyzing aging longitudinally in long-lived species, as only a few individuals will live to old age. We argue that 17 years is a long enough period of study, as in the initially sampled field season (1999) focal individuals are reaching a mature age of adulthood (39-44 years) and begin to age progressively up to ages that are typically considered to be on the extreme side for chimpanzees’ lifespans in the wild (56-61 years). We raise in our methods that whilst it is difficult to determine precisely when chimpanzees become ‘old aged’, previous studies use the age of around 40 years, as from this age survivorship begins to decrease more rapidly (see Wood et al., Science 2023). Indeed, one focal individual (Tua) disappeared during the period of our study (presumed dead), and one other individual died in 2017 (Velu), the year after our final sampled field season. As of 2025, two other focal females have since died, and only one focal individual is still alive at Bossou (Jire, the individual exhibiting the least evidence for senescence over our study period). These observations suggest that we successfully captured data from chimpanzees during the oldest ages of their lives for most individuals in the community. Moreover, the period of 1999-2016 contains the majority of data available within the Bossou Archive, with years before and after this window containing comparably less data. This information is included within our results and methods (see sections 2.1 and 4.1).

      For our earliest field season (1999), it is unlikely that senescence had already had an effect on stone-tool use, as we measured efficiency to be high across all efficiency metrics for all individuals. For example, in 1999, the median number of hammer strikes performed by focal chimpanzees ranged from 2-4 strikes, and this was comparable to the efficiency reported across all adults observed in previous studies at Bossou (Biro et al. 2003, Anim. Cogn.). This finding suggests that senescence effects had not yet taken place, allowing us to evaluate whether aging affects efficiency over subsequent field seasons. This point is now included in the manuscript on lines 449-452.

      We sampled at 4-to-5-year intervals to balance the time-intensive nature of fine-scale behavior coding against the need to sample data across the extended 17-year time window available in our study. We limited the final year to 2016 as, in following years, data were collected using different sampling protocols (though, see limited data from 2018 in the supplementary materials). We aimed to keep the intervals between years as consistent as possible (approx. 4 years); however, for some years data were not collected at Bossou, due to disease outbreaks in the region. In these instances, we selected the closest field season where suitable data were available for study (always +/- 1 year). We have provided further clarification surrounding our sampling regime in the methods (see amendments in section 4.1).
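
      The substitution rule described above (take the closest field season with suitable data, never more than one year away) can be sketched in Python as follows. The function name, the tie-breaking choice, and the availability set are assumptions made for illustration; they do not reproduce the archive's actual coverage.

```python
from typing import Iterable, Optional


def closest_available_season(
    target: int, available: Iterable[int], max_shift: int = 1
) -> Optional[int]:
    """Pick the season nearest to `target` within +/- `max_shift` years.

    Returns None when no season falls inside the window. Ties are broken
    in favour of the earlier year (an assumption made for this sketch).
    """
    candidates = [year for year in available if abs(year - target) <= max_shift]
    if not candidates:
        return None
    return min(candidates, key=lambda year: (abs(year - target), year))


# Hypothetical set of seasons with suitable data; illustrative only.
hypothetical_available = {1999, 2004, 2008, 2011, 2016}
exact = closest_available_season(2008, hypothetical_available)
shifted = closest_available_season(2012, hypothetical_available)
missing = closest_available_season(2001, hypothetical_available)
```

      Capping the shift at one year keeps the realized intervals close to the intended spacing even when a target season is unavailable.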

      With measuring and interpreting the 'efficiency' of behaviors, there are in-built assumptions about the goals of the agents and how we can define efficiency. First, it may be that efficiency is not an intentional goal for nut-cracking at all, but rather, e.g., productivity as far as the number of uncrushed kernels (cf. Putt 2015). Second, what is 'efficient' for the human observer might not be efficient for the chimpanzee who is performing the behavior. More instances of tool-switching may be considered inefficient, but it might also be a valid strategy for extracting more from the nuts, etc. Understanding the goals of chimpanzees may be a difficult proposition, but these are uncertainties that must be kept in mind when interpreting and discussing 'decline' or any change in technological behaviors over time.

      We agree that knowing precisely how chimpanzees perceive their own efficiency during tool use is unlikely to be available through observation alone. However, under optimal foraging theory, it is reasonable to assume that animals aim to economize foraging behaviors such that they maximize their rate of energy intake. Moreover, a wealth of studies demonstrate that adult chimpanzees acquire and refine tool-using skill efficiency throughout their lives. For example, during nut cracking, adults often select tools with specific properties that aid efficient nut cracking (Braun et al. 2025, J. Hum. Evol.; Carvalho et al. 2008, J. Hum. Evol.; Sirianni et al. 2015, Anim. Behav.); perform nut cracking using more streamlined combinations of actions than less experienced individuals (Howard-Spink et al. 2024, PeerJ; Inoue-Nakamura & Matsuzawa 1997, J. Comp. Psychol.), and as a result end up cracking nuts using fewer hammer strikes, indicating a higher level of skill (Biro et al. 2003, Anim. Cogn.; Boesch et al. 2019, Sci. Rep.). Ultimately, these factors suggest that across adulthood, experienced chimpanzees perform nut cracking with a level of efficiency which exceeds that of novice individuals, including across the whole behavioral sequence for tool use, even if they are not aware of doing so or do not intend to. Previous studies at Bossou have also highlighted that there are stable inter-individual differences in efficiency of individuals over time (Berdugo et al. 2024, Nat. Hum. Behav.). This pattern of findings allows us to ask whether this acquired level of skill is stable across the oldest years of an individual’s life, or whether some individuals experience decreased efficiency with age. In addition, our selection of efficiency metrics is in keeping with a wealth of studies which examine the efficiency of stone-tool use in apes; thus, we argue that this is not problematic for our study.

      As we stated in our initial responses to reviewers, it is unlikely that tool switching is a valid strategy for tool use, as it is so rarely performed by proficient adult nut crackers (including earlier in life for our focal individuals). Nevertheless, we did not find a significant change in tool switching for oil-palm nut cracking, and this behavioral change was only observed when Yo was cracking coula nuts. As we have now moved discussion of coula nut cracking to the supplementary materials (and tempered discussion of coula nut cracking to emphasize the need for more data) this behavioral variable does not influence our reported results. 

      In our discussion, we also highlight how seemingly less efficient actions may reflect a valid strategy for nut cracking. For example, a greater number of tool strikes may reflect a strategy of compensation for progressive tool wear. This would still reflect a reduced efficiency (e.g. in terms of the rate at which kernels can be consumed), but may perhaps be borne of the necessity to accommodate changes in an individual’s physical affordances with aging. Thus, we do take the Reviewer’s point into account, but by using an alternative, more likely, example given the available data. We have now emphasized this point in lines 521-527.

      We have also clarified these matters by adding more information into our methods (see lines 798-802 and 828-829), highlighting that we take a perspective on efficiency that reflects the speed of nut processing and kernel consumption, and the number of different behavioral elements required to do so. Our phrasing now explicitly avoids using language that assumes that individuals have some perception of their own efficiency during tool use.

      For the study of the physiological impact of senescence of tool use (i.e., on strength and coordination), the study would benefit from the inclusion of variables like grip type and (approximate) stone size (Neufuss et al., 2016). The size and shape of stones for nut-cracking have been shown to influence the efficacy and 'efficiency' of tool use (i.e., the same metrics of 'efficiency' implemented by Howard-Spink et al. in the current study), meaning raw material properties are a potential confound that the authors have not evaluated. 

      We did not collect this data as part of our study. Whilst grip type could be a useful variable to measure for future studies, it is not necessary to demonstrate senescence per se. However, we agree that this could be a fruitful avenue to understand changes in behavior at greater granularity, and have added this as a recommendation for further study. We also now provide a discussion on stone dimensions and materials as part of our limitations (see lines 581-589 for both points).

      Similarly, inter- and intraspecific variation in the properties of nuts being processed is another confound (Falótico et al., 2022; Proffitt et al., 2022). If oil palm nuts were varying year-to-year, for example, this would theoretically have an effect on the behavioral forms and strategies employed by the chimpanzees, and thus, any metric of efficiency being collected and analyzed. Further, it is perplexing that the authors analyze only one year where the coula nuts were provided at the test site, but these were provided during multiple field seasons. It would be more useful to compare data from a similar number of field seasons with both species if we are to study age-related changes in nut processing over time (one season of coula nut-cracking certainly does not achieve this).

      We have moved all discussion of coula nuts to the supplementary materials so as to avoid any confusion with oil-palm nuts (see comments from Reviewer 2, and our response). Nut hardness may influence the difficulty with which nuts are cracked, with one of the most likely factors influencing nut hardness being its age: young nuts are relatively harder to crack, whereas older nuts, which are often worm-eaten or can be empty, crack more easily, yet are not worth cracking (Sakura & Matsuzawa, 1991; Ethology). We largely controlled for this in our study, as the nuts provided at outdoor laboratories were inspected to ensure that the majority of them were of suitable maturity for cracking, and we now clarify this control in our methods (see lines 678-680) and when discussing our study limitations (see lines 551-558). In these sections, we also highlight a previous study at Bossou that shows chimpanzees select nuts which can be readily cracked, based on their age (Sakura & Matsuzawa, 1991; Ethology).

      We acknowledge that we are limited in the extent to which we can control for interannual variation in ecology with our available data. However, we highlight why interannual variability is unlikely to fully explain our results (see lines 551-580 and response to comments from Reviewer 1). We also highlight in our limitations section that future studies should (where possible) aim to collect more ecological data to account for possible confounds more rigorously.

      Both individual personality (especially neophilia versus neophobia; e.g., Forss & Willems, 2022) and motivation factors (Tennie & Call, 2023) are further confounds that can contribute to a more valid interpretation of the patterns found. To draw any conclusions about age-related changes in diet and food preferences, we would need to have data on the overall food intake/preferences of the individuals and the food availability in the home range. The authors refer briefly to this limitation, but the implications for the interpretation of the data are not sufficiently underlined (e.g., for the relevance of age-related decline in stone tool-use ability for individual survival). 

      In our discussion, we highlight that multiple aging factors may influence apes’ dietary preferences and motivations to attend experimental (and perhaps also naturally-occurring) nut cracking sites (see lines 397-443 and 542-550). We do not believe that neophobia is a likely driver underlying our results, given that the outdoor laboratory has been used to collect data for many decades, including over a decade prior to the first field season in which data were sampled for our study (now highlighted in lines 692-694). In addition, previous studies at Bossou have determined that the outdoor laboratory is visited with comparable frequency to naturally-occurring nut cracking sites, which makes any form of novelty bias unlikely (this information is now included in our methods, see lines 397-400, and also 687-689).

      We agree that further information is required about foraging behaviours across the home range to understand changes in attendance at the outdoor laboratory, and have now provided more clarity on this within the limitations section of our discussion (see lines 542-550). In our discussion of individual survivability, we state clearly that we cannot make a conclusion about how changes in tool use influence survival with the available data, and assert that this would require data across the home range (see lines 627-638). We agree that future research is needed to assess whether changes in tool use would influence survivability, and also suggest that it may not be survival-relevant; instead, changes in tool use with aging may simply be a litmus test for detecting more generalized senescence.

      Generally speaking, there is a lack of consideration for temporal variation in ecological factors. As a control for these, Howard-Spink and colleagues have examined behavioral data for younger individuals from Bossou in the same years, to ostensibly show that patterns in older adults are different from patterns in younger adults, which is fair given the available data. Nonetheless, they seem to focus mostly on the start and end points and not patterns that occur in between. For example, there is a curious drop in attendance rate for all individuals in the 2008 season, the implications of which are not discussed by the authors. 

      As the reviewer points out, when examining the attendance rates of older individuals over sampled field seasons, we used the attendance rates of younger individuals as a control. However, we do not run this analysis using start and end points only. Attendance rates were included in our model across the full range of sampled field seasons. However, as the key result here is an interaction term between age cohort (old) and the field season (scaled about the mean), we supplement this significant statistical result with a digestible comparison of attendance rates between the first and last field season, to give a general sense of effect size. We have clarified that all data were used in our model (see line 229, and also the legend for Table 2), and in this section we also provide all key model outputs and signpost where the full model output can be found in the supplementary materials.
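
      To illustrate what an interaction between age cohort and a field season scaled about the mean looks like as a model predictor, here is a minimal Python sketch. It assumes that scaling means centring on the mean and dividing by the sample standard deviation, which the response does not specify. The season list is illustrative (2004 is an assumed year), and the function names are hypothetical rather than taken from the authors' analysis.

```python
from statistics import mean, stdev


def scale_about_mean(values):
    """Centre values on their mean and divide by the sample SD (assumed scaling)."""
    centre = mean(values)
    spread = stdev(values)
    return [(value - centre) / spread for value in values]


def interaction_column(scaled_seasons, is_old_cohort):
    """Build the cohort x season predictor: zero for the reference (younger) cohort."""
    indicator = 1 if is_old_cohort else 0
    return [indicator * season for season in scaled_seasons]


# Illustrative seasons; 2004 is assumed, the others are named in the response.
seasons = [1999, 2004, 2008, 2011, 2016]
scaled = scale_about_mean(seasons)
old_terms = interaction_column(scaled, is_old_cohort=True)
young_terms = interaction_column(scaled, is_old_cohort=False)
```

      Because the predictor is zero for the younger cohort, its coefficient captures how the trend across every sampled season differs for older individuals, not just the difference between the first and last seasons.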

      As far as attendance, Howard-Spink and colleagues also discuss how this might be explained by changes in social standing in later life (i.e., chimpanzees move to the fringes of the social network and become less likely to visit gathering sites). This is not senescence in the sense of physiological and cognitive decline with older age. Instead, the reduced attendance due to changes in social standing seems rather to exacerbate signs of aging rather than be an indicator of it itself. The authors also mention a flu-like epidemic that caused the death of 5 individuals; the subsequent population decline and related changes in demography also warrant more discussion and characterization in the manuscript. 

      We have adapted this part of the discussion to make it clear that social aging is not necessarily equivalent to physiological and cognitive aging. We have also clarified in this section the changes in demography at Bossou during our study, which may have further impacted social behaviors (see lines 423-443). 

      Understandably, some of these issues cannot be evaluated or corrected with the presented dataset. Nonetheless, these undermine how certain and/or deterministic their conclusions can really be considered. Howard-Spink et al. have not strongly 'demonstrated' the validity of relationships between the variables of the study. If anything, their cursory observations provide us with methods to apply and hypotheses to test in future studies. It is likely that with higher-resolution datasets, the individual variability in age-related decline in tool-use abilities will be replicated. For now, this can be considered a starting point, which will hopefully inspire future attempts to research these questions. 

      We thank the reviewer for their comments. We have adapted our manuscript to highlight that we agree that it serves as a starting point for answering these valuable questions; however, we do feel that we can contribute meaningful evidence that aging effects likely underlie the findings in our data (see responses above). We agree with the reviewer that further study is needed to understand these questions in more detail, and have tried to ensure that our conclusions are suitably tempered and that further research building on our findings is strongly encouraged.

      Falótico, T., Valença, T., Verderane, M. & Fogaça, M. D. Stone tools differences across three capuchin monkey populations: food's physical properties, ecology, and culture. Sci. Rep. 12, 14365 (2022). 

      This has now been cited.

      Forss, S. & Willems, E. The curious case of great ape curiosity and how it is shaped by sociality. Ethology 128, 552-563 (2022). 

      We do not cite this – see above.

      Neufuss, J., Humle, T., Cremaschi, A. & Kivell, T. L. Nut-cracking behaviour in wild-born, rehabilitated bonobos (Pan paniscus): a comprehensive study of hand-preference, hand grips and efficiency. Am. J. Primatol. 79, e22589 (2016). 

      This has now been cited.

      Proffitt, T., Reeves, J. S., Pacome, S. S. & Luncz, L. V. Identifying functional and regional differences in chimpanzee stone tool technology. R. Soc. Open Sci. 9, 220826 (2022). 

      This has now been cited.

      Putt, S. S. The origins of stone tool reduction and the transition to knapping: An experimental approach. J. Archaeol. Sci.: Rep. 2, 51-60 (2015). 

      We do not cite this, as we instead cite studies which highlight chimpanzees’ ability to become more efficient in tool use with repeated practice (see above). 

      Tennie, C. & Call, J. Unmotivated subjects cannot provide interpretable data and tasks with sensitive learning periods require appropriately aged subjects: A Commentary on Koops et al. (2022) "Field experiments find no evidence that chimpanzee nut cracking can be independently innovated". ABC 10, 89-94 (2023). 

      We do not cite this – see above.

      Reviewer #2 (Recommendations for the authors):

      Minor Comments: 

      (1) Line 494: Citation #53 is listed twice. 

      This has been amended.

      (2) Line 501: The term 'culturally-dependent' as used here is, at best, controversial, and at worst, misapplied. I would recommend replacing it with simply the term 'cultural'. 

      This has been changed to ‘cultural’.

      Major Comments: 

      For the Introduction, in the paragraph starting on Line 91, and the Discussion, starting on Line 369, I would recommend some simple re-structuring of the argumentation. As mentioned in the Public Review, the changes in social standing according to age are not necessarily a case of senescence in the strict sense of physiological or cognitive changes of the individual. This seems to have had an effect on attendance rates, which then could have been a driver of behavioral changes and even cognitive decline as ostensibly measured by the other variables. The social impact of aging should be mentioned in the Introduction (it is not currently) and the social and physiological/cognitive effects of aging should be separated in the Discussion. You can then discuss more clearly how the former via other behavioral changes can accelerate the latter (or not).

      We take the point raised about social aging. Integrating information about social aging into the introduction was challenging without disrupting the flow of the paper; however, we have included these valuable points in the discussion (see lines 423-443). We now structure this section to clearly distinguish social aging, and discuss how, in tandem with changes in demography at Bossou, it may have influenced rates of attendance to the outdoor laboratory over the years. We do not go into detail about how social aging may interact with physiological or cognitive effects of aging, as we cannot support this with the available data; however, we highlight at the end of this paragraph how all of these possible factors require further investigation.

      For the present study, it will either be impossible or impractical to gather data on the yearly ecological conditions, contextualized dietary preferences, individual personalities, etc., so I would not ask that you do so. It is important, however, to temper some of the claims being made in the manuscript about what you have 'determined' about the nature of senescence in chimpanzees and to be more transparent about the limitations and potential confounds when interpreting the data. To avoid repetition, the key points can be found in the Public Review under 'Weaknesses'. 

      We appreciate the reviewer’s understanding of the limitations of our study. Some of these factors – such as individual personalities and dietary preferences – are addressed somewhat by our use of long-term data at the level of the individual, particularly in the analyses of efficiency, where modeling individuals’ behaviors relative to their own behaviors in earlier years offers an individually bespoke control. However, there are other ecological variables of possible importance that we cannot evaluate. We now address several of these points raised by reviewers in the discussion, to ensure transparency of reporting (see the limitations section of our discussion, our responses to the comments provided by Reviewer 1, and our responses to points raised in the Public Review). We have also tempered some of the phrasing surrounding our conclusions: where we say that this is the first evidence that aging can impact chimpanzee tool use, we also highlight the need for an assortment of further studies.

      Finally, the integration of the coula nut-cracking data is not well-executed as it stands. I would recommend that they collect and analyze equivalent behavioral data from the other years where coula nuts were provided. By examining only one season of coula nut-cracking, we cannot contextualize the data to past seasons; there is no sense in comparing one season of coula nut-cracking (i.e., in a sense of efficiency) to roughly contemporary seasons of palm-nut cracking due to, as you describe, differences in physical properties of the nuts. If you are not able to collect the additional data and carry out the requisite analysis, then I would recommend that the coula nut-related sections be removed from the manuscript, so that it does not detract from the logical flow of arguments and distract from the other data, which is more logically-attuned to your research questions. 

      We have removed this from the main manuscript. We have decided to include the information surrounding coula nut cracking in the supplementary materials, as this information is still relevant to the findings of our study, and may interest some readers. However, we have phrased this information to make it clear that further data is needed to compare coula nut cracking across years.

      These criticisms do not subtract from the (potential) value or importance of the work for the field. This is, of course, an important contribution to an understudied topic. As such, I would gladly advocate for the manuscript, assuming the authors would reflect on the listed caveats and make changes in response to the 'Major Comments'. 

      We thank the reviewer for their comments.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      (1) Commander-Independent Role of COMMD3: While the authors provided evidence to support the Commander-independent role of COMMD3 (such as the absence of other Commander subunits in the CRISPR screen and the lack of a decrease in COMMD3 levels in other subunit-KO cells), direct evidence is lacking. The mutation that specifically disrupts the COMMD3-ARF1 interaction could serve as a valuable tool to directly address this question.

      The Reviewer raised an excellent point. We fully agree with the Reviewer that multiple lines of evidence are needed to support the novel Commander-independent function of COMMD3.

      Comparative genetic analyses in Figures 4 and 5 indicate that COMMD3 regulates endosomal retrieval independently of the Commander complex. In Figure 8 of the revised manuscript, we show that point mutations introduced into the COMMD3:ARF1 interface impair this Commander-independent function. Moreover, Figure 6 demonstrates that ARF1 upregulation fully rescues the KO phenotype of COMMD3. In addition, Figure S2 further supports that COMMD3 levels, but not those of other Commander subunits, correspond to its Commander-independent function in endosomal trafficking. We have also revised the Discussion section to elaborate on the implications of these findings. We appreciate the Reviewer’s advice.

      (2) Role of ARF1 in Cargo Selection: The Commander-independent function of COMMD3 appears cargo-dependent and relies on ARF1's role in cargo selection. The authors should investigate whether KO/KD of ARF1 reduces cell surface levels of ITGA6 and TfR.

      The Reviewer correctly pointed out that KO/KD of ARF1 may provide further insights into the Commander-independent function of COMMD3. However, since ARF1 is involved in cargo sorting at both the endosome and the trans-Golgi network, its KO would disrupt multiple trafficking routes, making the data difficult to interpret. Instead, we focused on point mutations in the NTD that specifically disrupt ARF1 binding without affecting the function of the Commander complex (Fig. 8). As these mutations impair the Commander-independent function of COMMD3, our data strongly support a direct role for ARF1 in this recycling pathway. We note that the discovery of a novel trafficking pathway inevitably opens many research directions. One such direction is to systematically identify cargoes that rely on COMMD3 but not the Commander complex for endosomal retrieval.

      (3) Impact on TfR Stability: Figure 7D suggests that TfR protein levels are reduced in COMMD3-KO cells, potentially due to degradation caused by disrupted recycling. This raises the question of whether the observed reduction in cell surface TfR is due to impaired endosomal recycling or decreased total protein levels. The authors should quantify the ratio of cell surface protein to total protein for TfR, GLUT-SPR, and ITGA6 in COMMD3-KO cells.

      Based on the Reviewer's suggestion, we quantified both the total levels and the surface-to-total ratio of TfR, as shown in Figure S1 of the revised manuscript. These new data further support the conclusion that defects in TfR retrieval lead to its lysosomal degradation. The GLUT-SPR data presented in the main figures represent the surface-to-total ratio of the GLUT-SPR reporter. We thank the Reviewer for the important suggestion.

      Reviewer #1 (Recommendations for the authors):

      (1) Commander-Independent Role of COMMD3: The mutation that specifically disrupts the COMMD3-ARF1 interaction could serve as a valuable tool to directly address this question. The authors should evaluate whether the full-length mutant of COMMD3 can rescue decreased levels of CCDC93 and VPS35L, as well as cell surface ITGA6, TfR, and GLUT4 in COMMD3-KO cells.

      This is an excellent point. In our mechanistic experiments, we focused on the NTD of COMMD3 because this domain mediates its Commander-independent function and is not involved in forming the Commander holo-complex. This approach allowed us to draw unambiguous conclusions. Nevertheless, we anticipate that full-length COMMD3 carrying these point mutations would also be defective in regulating Commander-independent cargo.

      (2) Role of ARF1 in Cargo Selection: The authors should investigate whether KO/KD of ARF1 reduces cell surface levels of ITGA6 and TfR. Was ARF1 identified in the initial CRISPR screen? If so, this should be explicitly noted. Alternatively, does ARF1 overexpression rescue ITGA6 levels in COMMD3-KO cells? Furthermore, does ARF1 overexpression rescue TfR levels in COMMD3 and CCDC93 double-KO cells?

      The Reviewer correctly pointed out that KO/KD of ARF1 may provide further insights into the Commander-independent function of COMMD3. However, since ARF1 is involved in cargo sorting at both the endosome and the trans-Golgi network, its KO would disrupt multiple trafficking routes, making the data difficult to interpret. Instead, we focused on point mutations that specifically disrupt ARF1 binding without affecting the function of the Commander complex (Fig. 8). Since these mutations impair the Commander-independent function of COMMD3, our data strongly support a direct role for ARF1 in this novel recycling pathway. Based on our genetic data, we anticipate that all COMMD3-dependent cargoes will be similarly rescued in ARF1-overexpressing cells. In line with the Reviewer's comment, a key research direction we are currently pursuing is systematically determining how surface protein levels are affected by COMMD3 KO and ARF1 overexpression using surface proteomics.

      (3) Inconsistency in COMMD3 Rescue Levels (Figure 5A): Figure 5A shows comparable or higher levels of COMMD3 in rescued cells than in CCDC93-KO and VPS35L-KO cells. However, COMMD3 rescue did not increase cell surface TfR as much as in CCDC93-KO and VPS35L-KO cells. This inconsistency should be discussed or validated.

      To address the Reviewer’s inquiry, we quantified COMMD3 expression levels in these cell lines using multiple independent experiments. The new data are presented in Figure S2 of the revised manuscript. These expanded datasets allowed us to more accurately determine the relationship between COMMD3 expression and our genetic data. Since the Commander complex remains intact in the COMMD3 rescue cells, a significant portion of COMMD3 proteins are expected to be incorporated into the Commander complex, which does not regulate TfR recycling. In contrast, because the Commander complex is disrupted in Ccdc93 and Vps35l KO cells, all COMMD3 proteins are available to regulate TfR recycling in a Commander-independent manner. These findings are fully consistent with the similar surface TfR levels observed in Ccdc93/Vps35l KO cells and COMMD3 overexpressing cells. We thank the Reviewer for this excellent suggestion.

      (4) Significance of NTD in COMMD3 Function: The conclusion that "the NTD of COMMD3 mediates its Commander-independent function and interacts with ARF1" (Page 12) is not fully supported without a side-by-side comparison of NTD, CTD, and FL COMMD3 in the same experiment (e.g., Figures 6B and 6G). Additional data is needed to strengthen this claim.

      We conducted the experiment suggested by the Reviewer and included the data in Figure S3. Our results indicate that the COMMD3 CTD cannot mediate the Commander-independent function of COMMD3 in endosomal retrieval. We appreciate the Reviewer’s suggestion.

      (5) ARF1 Stabilization Experiments: To substantiate the claim that COMMD3 binds and stabilizes the GTP-form of ARF1, the authors should include a comparative experiment showing GTP-form, GDP-form, and wild-type ARF1 (e.g., Figures 6G and 7C).

      We fully agree with the Reviewer that it would be important to compare how the ARF1:COMMD3 interaction is influenced by the nucleotide-binding state. However, trapping ARF1 in its GDP-bound state remains unfeasible, and nucleotide-free small GTPases are inherently unstable. In addition, WT ARF1 likely exists as a mixture of GTP- and GDP-bound forms, further complicating the analysis. To address the Reviewer’s comment, we used AlphaFold3 predictions. Interestingly, we found that the ipTM score of GTP-ARF1:COMMD3 is significantly higher than that of GDP-ARF1:COMMD3 or apo-ARF1:COMMD3, supporting our conclusion that COMMD3 recognizes and stabilizes the active form of ARF1.

      (6) Validation of NTD Mutation (Figure 8): Co-immunoprecipitation or cellular co-localization experiments should be performed to confirm that the NTD mutation disrupts the interaction between COMMD3 and ARF1, as depicted in Figure 8.

      This is an important question, and the best approach to address it would be to measure the binding affinity of the WT and mutant proteins using ITC or SPR. However, this is currently unfeasible, as we have not yet obtained pure recombinant COMMD3 and GTP-ARF1 proteins. Co-IP, by nature, is a crude assay that often fails to detect changes in binding affinity. A previous study on other proteins showed that mutations in protein-binding interfaces strongly reduced binding affinity as measured by SPR, but these changes would have been missed by co-IP assays (PMID: 25500532). In agreement with this limitation, our co-IP experiments did not yield conclusive results. Instead, we focused on structure-guided genetic experiments, which unequivocally demonstrated the effects of targeted mutations on the Commander-independent function of COMMD3. 

      Reviewer #2 (Public review):

      (1) All existing data suggest that COMMD3 is a subunit of the Commander complex. Is there any evidence that COMMD3 can exist as a monomer?

      The Reviewer raised an intriguing point. Indeed, COMMD proteins, including COMMD3, can exist outside the Commander holo-complex and form homo- or hetero-oligomers, as monomeric COMMD proteins are likely unstable. These observations align well with the Commander-independent function identified in this study. We have revised the Discussion section of the manuscript to further elaborate on this point and thank the Reviewer for the suggestion.

      (2) In Figure 9, the author emphasizes COMMD3-dependent cargo and Commander-dependent cargo. Can the authors speculate what distinguishes these two types of cargo? Do they contain sequence-specific motifs?

      This is another important question. Our data clearly demonstrate that COMMD3 has a Commander-independent function in addition to its canonical role within the Commander holo-complex. Since cargo proteins typically possess multiple sorting signals that operate at different stages of the exocytic and endocytic pathways, identifying COMMD3-dependent sorting signals remains a challenge. ARF4 has been shown to specifically recognize the VXPX motif (PMID: 15728366), suggesting that ARF1 may similarly bind cytosolic sorting signals, with COMMD3 stabilizing this interaction. A key future direction is to systematically identify COMMD3-dependent cargo proteins and elucidate the mechanisms underlying their endosomal sorting. We have revised the Discussion section of the manuscript to explicitly address this point and thank the Reviewer for this important suggestion.

      (3) What could be the possible mechanism underlying the observation that the knockout of COMMD3 results in larger early endosomes? How is the disruption of cargo retrieval related to the increase in endosome size?

      The endosomal retrieval process is critical for recycling membrane proteins and lipids back to the plasma membrane or the trans-Golgi network. When this process is disrupted, cargo that should be recycled accumulates within endosomes, leading to their enlargement. For example, defects in retromer function can cause endosomal swelling due to cargo accumulation (PMID: 33380435). We added this citation to the revised manuscript and thank the Reviewer for the advice. 

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 4: How do the authors define Commander-dependent vs. Commander-independent cargos?

      In Figure 4, the surface expression of ITGA6 is reduced to approximately 0.75 across all knockouts. However, there is a similar level of reduction for GLUT4-SPR in the commd5 knockout and for LAMP1 in the commd5 and commd1 knockouts. Are GLUT4-SPR and LAMP1 Commander-dependent or Commander-independent cargos? Additionally, how does COMMD3 specifically identify/distinguish these cargos?

      This is an excellent point. Our data suggest that TfR is a COMMD3-dependent but Commander-independent cargo, whereas ITGA6 is a Commander-dependent cargo that does not involve COMMD3-specific functions. The other two cargoes we examined—GLUT-SPR and LAMP1—primarily rely on COMMD3, with the Commander complex playing a minor role. Together, these observations clearly demonstrate that COMMD3 has a Commander-independent function in addition to its canonical role within the Commander holo-complex. Since cargo proteins typically possess multiple sorting signals that operate at different stages of the exocytic and endocytic pathways, identifying COMMD3-dependent sorting signals remains a challenge. ARF4 has been shown to specifically recognize the VXPX motif (PMID: 15728366), suggesting that ARF1 may similarly bind cytosolic sorting signals, with COMMD3 stabilizing this interaction. A key future direction is to systematically identify COMMD3-dependent cargo proteins and elucidate the mechanisms underlying their endosomal sorting. We have revised the Discussion section of the manuscript to explicitly address this point. We thank the Reviewer for this important suggestion.

      (2) There is an increase in the surface expression of GLUT4-SPR in the commd1 knockout. Is this increase significant? The figure suggests a significant increase, but the text states it remains unchanged. Clarification is needed.

      We found that surface levels of GLUT-SPR were slightly increased in Commd1 KO cells, in stark contrast to the strong reduction observed in Commd3 KO cells (Fig. 4B). This finding is consistent with our conclusion that COMMD3 has a distinct role from other Commander subunits. We have revised the Results section to more clearly describe these data and thank the Reviewer for the advice.

      (3) Figure 5A: To support the claim that COMMD3 is upregulated in the vps35l KO/Ccdc93 KO, the authors should quantify COMMD3 expression. Also, why is there a Vps35l band present in the Vps35l knockout cells?

      Based on the Reviewer’s suggestion, we quantified the total levels of COMMD3 and included these new data in Figure S2. In this study, gene deletion was achieved through the simultaneous introduction of two independent gRNAs. Based on our previous experience, this strategy typically results in the complete loss of gene expression. We posit that the residual band observed in Vps35l KO cells originates from background signals, such as nonspecific staining by the antibody.

      (4) Figure 7: It is intriguing that COMMD3 stabilizes Arf1-GTP and can compensate for COMMD3 in knockout cells. However, is this stabilization specific to TfR cargo only? The authors should test additional Commander-dependent and Commander-independent cargos to clarify this point.

      Based on our genetic data, we anticipate that all COMMD3-dependent cargoes will be similarly rescued in ARF1-overexpressing cells. In line with the Reviewer's comment, an important direction we are pursuing is the use of surface proteomics to systematically determine how surface protein levels are affected by COMMD3 KO and ARF1 overexpression.

      (5) Is Arf1 interaction specific to COMMD3? The authors should investigate the effects of Arf1 knockout on COMMD3 expression and test its role in regulating Commander-dependent and Commander-independent cargos.

      The Reviewer raised an excellent point. Since ARF1 is involved in cargo sorting at both the endosome and the trans-Golgi network, its KO would interfere with multiple trafficking routes and the data would be difficult to interpret. Thus, in this work, we focused on the function and mechanism of the COMMD3:ARF1 complex on the endosome. Based on the suggestion of the Reviewer, we used AlphaFold3 to predict ARF1 binding to COMMD proteins. Interestingly, the complex with the highest predicted ipTM score is COMMD3:ARF1, while other COMMD proteins have much lower predicted binding scores. These results are consistent with the results of our unbiased CRISPR screens and targeted gene KO, and further support the conclusion that the COMMD3:ARF1 binding is specific and physiologically important in endosomal trafficking.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The selection of inactivated conformations based on AlphaFold modeling seems a bit biased. The authors base their selection of the “most likely” inactivated conformation on the expected flipping of V625 and the constriction at G626 carbonyls. This follows a bit of the “Streetlight effect”. It would be better to have selection criteria that are independent of what they expect to find for the inactivated state conformations. Using cues that favour sampling/modeling of the inactivated conformation, such as the deactivated conformation of the VSD used in the modeling of the closed state, would be more convincing. There may be other conformations that are more accurately representing the inactivated state. I see no objective criteria that justify the non-consideration of conformations from cluster 3 of the inactivated state modeling. I am not sure whether pLDDT is a good selection criterion. It reports on structural confidence, but that may not relate to functional relevance.

      We sincerely thank the reviewer for their perceptive critique highlighting potential bias in selecting the inactivated conformation. We recognize that over-relying on preconceived traits could limit exploration of diverse inactivated states, and we appreciate the opportunity to address this concern.

      Although we selected the model with the flipped V625 in the selectivity filter (SF) from the first round of inactivated-state sampling as the template for the second round, the resulting models still exhibited substantial diversity in their SF conformations. This selection primarily served to steer predictions away from the open-state configuration observed in the PDB 5VA2 SF, and we have clarified this rationale in the Methodology section. To assess conformational variability, we examined backbone dihedral angles (phi φ and psi ψ) at key residues in the selectivity filter (S624 – G628) and in the drug-binding region on the pore-lining S6 segment (Y652, F656) for all 100 models sampled in the subsequent inactivated-state-sampling attempt. By overlaying the φ and ψ dihedral angles from different models, including the open state (PDB 5VA2-based), the closed state, and representative models from AlphaFold inactivated-state-sampling Cluster 2 and Cluster 3, we found that these conformations consistently fall within or near high-probability regions of the dihedral angle distributions. This indicates that these structural states are well represented within the ensemble of conformations sampled by AlphaFold within the scope of this study, particularly at functionally critical positions.
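
      For readers unfamiliar with the quantity being compared, the following is a minimal sketch of how a signed backbone dihedral angle can be computed from four atomic positions. For residue i, φ uses the atoms C(i-1), N(i), CA(i), C(i), and ψ uses N(i), CA(i), C(i), N(i+1). The function name and the toy coordinates are illustrative assumptions and are not taken from the authors' analysis pipeline.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    b0 = _sub(p0, p1)            # bond pointing from p1 back to p0
    b1 = _sub(p2, p1)            # central bond p1 -> p2
    b2 = _sub(p3, p2)            # bond p2 -> p3
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Remove the components of b0 and b2 that lie along the central bond.
    v = tuple(c - _dot(b0, b1) * u for c, u in zip(b0, b1))
    w = tuple(c - _dot(b2, b1) * u for c, u in zip(b2, b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Hypothetical toy coordinates (not real atom positions).
cis = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
quarter = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

      Applying such a function to every residue of interest in each sampled model yields the per-residue (φ, ψ) pairs whose distributions are overlaid in the analysis described above.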

      Following the analysis above and consistent with the reviewer’s suggestion, we evaluated the top representative model from inactivated-state-sampling Cluster 3 (named “AF ic3”), which we had initially excluded. This model demonstrated SF residue G626 carbonyl oxygen flipped away from the conduction pathway, hinting at potential impact on ion conduction, yet its pore region structurally resembled the open state (Figure S9a, b). To test this objectively, we ran molecular dynamics (MD) simulations (2 runs, 1 μs long each, with applied 750 mV voltage) with varied initial ion/water configurations in the SF, finding it consistently open and conducting throughout (Figure S9c, d), consistent with our previous observations in Figure S11 that ion conduction can still occur when the upper SF is dilated. Drug docking (Figure S12) further revealed that the model exhibited binding affinities similar to those for the PDB 5VA2-based open-state structure. These findings combined led us to classify it as a possible alternative open-state conformation.

      Models from Cluster 4 were not tested due to extensive steric clashes, where residues in the SF overlapped with neighboring residues from adjacent subunits. The remaining models displayed SF conformations that combined features from earlier clusters. However, due to subunit-to-subunit variability, where individual subunits adopted differing conformations, they were classified as outliers. This combination of features may be valuable to investigate further in a follow-up study.

      We acknowledge that our approach is just one of many ways to sample different states, and alternative strategies, such as generating more models, varying multiple sequence alignment (MSA) subsampling, or testing different templates, might reveal improved models. Given that hERG channel inactivation likely spans a spectrum of conformations, our resource limitations may have restricted us to exploring and validating only part of this diversity. Nevertheless, the putative inactivated (AlphaFold Cluster 2) model’s non-conductivity and improved affinity for drugs targeting the inactivated state observed in our study suggest that this approach may be capturing relevant features of the inactivated-state conformation. We look forward to investigating other possibilities in greater depth in a future study and are grateful for the reviewer’s feedback.

      (2) The comparison of predicted and experimentally measured binding affinities lacks an appropriate control. Using binding data from open-state conformations only is not the best control. A much better control is the use of alternative structures predicted by AlphaFold for each state (e.g. from the outlier clusters or not considered clusters) in the docking and energy calculations. Using these docking results in the calculations would reveal whether the initially selected conformations (e.g. from cluster 2 for the inactivated state) are truly doing a better job in predicting binding affinities. Such a control would strengthen the overall findings significantly.

      We appreciate the reviewer’s insightful suggestion. To address this, we extended our analysis by incorporating an alternative AlphaFold2-predicted model from inactivated-state-sampling cluster 3 as a structural control. This model was established in a previously discussed analysis to be open and conducting as a follow up to comment #1, so we will call it Open (AF ic3) to differentiate it from Open (PDB 5VA2). We evaluated this new model in single-state and multi-state contexts alongside our original open-state model based on the experimental PDB 5VA2 structure. Additionally, we expanded the drug docking procedure to explore a broader region around the putative drug binding site by increasing the sampling space, and we adopted an improved approach for selecting representative docking poses to better capture relevant binding modes.

      Shown in Figure 7 are comparisons of experimental drug potencies with the binding affinities from the molecular docking calculations under the following conditions:

      (a) Single-state docking using the experimentally derived open-state structure (PDB 5VA2)

      (b) Multi-state docking incorporating open (PDB 5VA2), inactivated, and closed-state conformations weighted by experimentally observed state distributions

      (c) Single-state docking using an alternative AlphaFold-predicted open-state (inactivated-state-sampling cluster 3, AF ic3)

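      (d) Multi-state docking combining the AlphaFold-predicted open-state (inactivated-state-sampling cluster 3, AF ic3) with the inactivated and closed-state conformations, weighted by experimentally observed state distributions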
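
      As a rough illustration of the multi-state conditions (b) and (d), the sketch below combines per-state docking scores into a single value using a population-weighted average. The exact weighting formula is not given in this response, so the linear weighting, the function name, and all numerical values here are assumptions made purely for illustration; other schemes (for example, Boltzmann weighting) are equally conceivable.

```python
def multistate_score(state_scores, state_fractions):
    """Population-weighted average of per-state docking scores.

    state_scores:    dict mapping state name -> docking score for one drug
    state_fractions: dict mapping state name -> fraction of channels in that state
    """
    if set(state_scores) != set(state_fractions):
        raise ValueError("scores and fractions must cover the same states")
    total = sum(state_fractions.values())
    if total <= 0:
        raise ValueError("state fractions must sum to a positive value")
    # Normalise the fractions so that they sum to one before weighting.
    return sum(state_scores[s] * state_fractions[s] / total for s in state_scores)

# Hypothetical values for a single drug (not data from the study).
scores = {"open": -8.0, "inactivated": -10.0, "closed": -6.0}
fractions = {"open": 0.25, "inactivated": 0.5, "closed": 0.25}
combined = multistate_score(scores, fractions)
```

      In this toy case the combined score is pulled toward the inactivated-state value because that state is assigned the largest population.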

      Using only the open-state model (PDB 5VA2) yielded a moderate correlation with experimental data (R² = 0.43, r = 0.66, Figure 7a). Incorporating multi-state binding (weighted by their experimental distributions) improved the correlation substantially (R² = 0.63, r = 0.79, Figure 7b), boosting predictive power by 47% and underscoring the value of multi-state modeling. Importantly, this improvement was achieved without considering potential drug-induced allosteric effects on the hERG channel conformation and gating, which will be addressed in future work.

      Next, we substituted the PDB 5VA2-based open-state model with the AF ic3 open-state model. Docking to this alternative model alone produced similar performance (R² = 0.44, r = 0.66, Figure 7c), and incorporating it into the multi-state ensemble further improved the correlation with experiments (R² = 0.64, r = 0.80, Figure 7d), representing a 45% gain in R² and matching the performance of multi-state docking results based on the PDB 5VA2-derived model.
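
      The quoted percentage gains appear to be relative changes in the coefficient of determination between the single-state and multi-state conditions. A short check of that arithmetic, using only the values reported above and assuming the gains were computed on these rounded figures, is:

```python
def relative_gain_percent(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline

# Values reported for Figure 7a vs 7b and Figure 7c vs 7d.
gain_pdb = relative_gain_percent(0.43, 0.63)  # about 46.5, quoted as 47%
gain_af = relative_gain_percent(0.44, 0.64)   # about 45.5, quoted as 45%
```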

      These findings suggest that the predictive power of computational drug docking is enhanced not merely by the accuracy of individual models, but by the structural diversity and complementarity provided by an ensemble of protein conformations. Rather than relying solely on a single experimentally determined protein structure, the ensemble benefits from incorporating AlphaFold-predicted models that capture alternative conformations identified through our state-specific sampling approach. These diverse protein models reflect different structural features, which together offer a more comprehensive representation of the ion channel’s binding landscape and enhance the predictive performance of computational drug docking. Overall, these results reinforce that multi-state modeling offers a more realistic and predictive framework for understanding drug – ion channel interactions than traditional single-state approaches, emphasizing the value of both individual model evaluation and their collective integration. We are grateful for the reviewer’s suggestion.

      (3) Figures where multiple datapoints are compared across states generally lack assessment of the statistical significance of observed trends (e.g. Figure 3d).

      We appreciate the reviewer’s comment on the statistical significance assessment in Figure 3d. To clarify, the comparisons shown in the subpanels are based on three selected representative models for each state, rather than a broader population sample (similarly for Figure 3b). In the closed-state predicted models, the strong convergence of the voltage-sensing domain (VSD), with an all-atom RMSD of 0.36 Å between cluster 1 and 2 closed-state sampling models and 0.95 Å to the outlier cluster, indicates minimal structural variation. Those RMSD values shown in the manuscript text demonstrate good convergence and by themselves represent a statistical significance assessment of those models. This trend extends to open-state and inactivated-state AlphaFold models, with similarly limited differences in the VSD regions among them. This convergence suggests that population-based statistical analysis may not reveal meaningful deviations, as the low variability among models limits the insights beyond those obtained from comparing representative structures.
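
      For reference, the RMSD values cited here measure the root-mean-square distance between matched atoms of two structures. The sketch below assumes the two coordinate sets are already superposed and list the same atoms in the same order; the function name and toy coordinates are illustrative assumptions, and the authors' actual alignment procedure is not described in this response.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) coordinates.

    Assumes the structures are already superposed and atom-matched.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for xa, xb in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))

# Hypothetical toy coordinates: every atom is displaced by 1 unit along z.
model_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_2 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
shifted = rmsd(model_1, model_2)
identical = rmsd(model_1, model_1)
```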

      Nonetheless, we acknowledge this limitation. In future studies, we plan to explore alternative modeling approaches to introduce greater variability, enabling a more robust statistical evaluation of state-specific trends in the predictions.
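
      For readers who want to reproduce this kind of convergence check on their own models, a minimal sketch of a pairwise RMSD comparison follows. It is an illustration only, not the analysis code used in the study: it assumes the structures have already been superimposed, that atoms are listed in the same order in every model, and the function names `rmsd` and `pairwise_rmsd` are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two lists of (x, y, z) tuples.

    Assumption: both structures are already superimposed and the atoms
    appear in the same order in both lists.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def pairwise_rmsd(models):
    """Return {(name_i, name_j): rmsd} for every unordered pair of named models."""
    names = sorted(models)
    return {
        (names[i], names[j]): rmsd(models[names[i]], models[names[j]])
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }
```

      Passing a dictionary of representative models (for example, one entry per cluster) returns one RMSD per pair, which is the form in which the values above are quoted.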

      (4) Figure 3 and Figures S1-S4 compare structural differences between states. However, these differences are inferred from the initial models. The collection of conformations generated via the MD runs allow for much more robust comparisons of structural differences.

      We have explored these conformational state dynamics through MD simulations for the Open (5VA2-based), Inactivated (AlphaFold Cluster 2), and Closed-state models, as presented in Figures S7, S8, S10, S11. These figures provide detailed insights: Figures S7-S8 analyze SF and pore conformation dynamics, including averaged pore radii with and without voltage and superimposed conformational ensembles; Figure S10 tracks cross-subunit distances between protein backbone carbonyl oxygens, revealing sequential SF dilation steps near residues F627 and G628; and Figure S11 illustrates this SF dilation process over time, highlighting residue F627 carbonyl flipping and SF expansion. We appreciate the opportunity to clarify our approach.
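
      The cross-subunit distance measurement described for Figure S10 can be sketched as below. This is a hedged illustration rather than the authors' script: it assumes a tetrameric channel, that the four carbonyl-oxygen coordinates for a given residue are supplied in order around the pore (so that indices 0/2 and 1/3 are diagonally opposed), and the function name is hypothetical.

```python
import math


def cross_subunit_distances(carbonyl_oxygens):
    """Distances between diagonally opposed subunits for one residue.

    carbonyl_oxygens: four (x, y, z) tuples, one backbone O atom per subunit,
    listed in order around the pore (an assumption of this sketch).
    Returns (distance between subunits 0 and 2, distance between subunits 1 and 3).
    """
    if len(carbonyl_oxygens) != 4:
        raise ValueError("expected exactly four subunits")
    a, b, c, d = carbonyl_oxygens
    return math.dist(a, c), math.dist(b, d)


def distance_time_series(frames):
    """Apply cross_subunit_distances to every frame of a trajectory."""
    return [cross_subunit_distances(frame) for frame in frames]
```

      An increase in these distances along a trajectory would correspond to the kind of selectivity filter dilation discussed above.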

      Reviewer #2 (Recommendations for the authors):

      Major concerns:

      (1) Protein fragments are used to model the closed and inactivated states of hERG, but the choices of fragments are not well justified. For instance, in Figure 1a, helices from 8EP1 (deactivated voltage-sensing domain) and a helix+loop from 5VA2 (selectivity filter) are used. Why just the selectivity filter and not the cytosolic domain, for instance? Why not some parts of the helices attached to the selectivity filter, or the whole membrane inserted domain of 8EP1? Same for the inactivated conformation in Figure 1c: why the cytosolic domain only?

      We thank the reviewer for their thoughtful questions regarding our choice of protein fragments for modeling the closed and inactivated states of hERG in Figures 1a and 1c, and we appreciate the opportunity to justify these selections more clearly. Our approach to template selection was guided by our experience that providing AlphaFold2 with larger templates often leads it to overly constrain predictions to the input structure, reducing its flexibility to explore alternative conformations. In contrast, smaller, targeted fragments increase the likelihood that AlphaFold2 will incorporate the desired structural features while predicting the rest of the protein. We have provided a more detailed discussion of this in the methodology section.

      For the closed state (Figure 1a), we chose the deactivated voltage-sensing domain (VSD) from the rat EAG channel (PDB 8EP1) to inspire AlphaFold2 to predict a similarly deactivated VSD conformation characteristic of hERG channel closure, as this domain’s downward shift is a hallmark of potassium channel closure. We paired this with the selectivity filter (SF) and adjacent residues from the open-state hERG structure (PDB 5VA2) to maintain its conductive conformation, as it is generally understood that K<sup>+</sup> channel closure primarily involves the intracellular gate rather than significant SF distortion. Including additional helices (e.g., S5–S6) or the entire membrane domain from PDB 8EP1 risked biasing the model toward the EAG channel’s pore structure, which differs from hERG’s, while omitting the cytosolic domain ensured focus on the VSD-driven closure without over-constraining cytoplasmic domain interactions.

      For the inactivated state (Figure 1c), we initially used only the cytosolic domain from PDB 5VA2 to anchor the prediction while allowing AlphaFold2 to freely sample transmembrane domain conformations, particularly the SF, where the inactivation occurs via its distortion. Excluding the SF or attached helices at this stage avoided locking the model into the open-state SF, and the cytosolic domain alone provided a minimal scaffold to maintain hERG’s intracellular architecture without dictating pore dynamics. Following the initial prediction, we initiated more extensive sampling by using one of the predicted SFs that differs from the open-state SF (PDB 5VA2) as a structural seed, aiming to guide predictions away from the open-state configuration. The VSD and cytosolic domain were also included in this state to discourage pore closure during prediction. Using larger fragments, like the full membrane-spanning domains or additional cytosolic regions from the open-state structure might reduce AlphaFold2’s ability to deviate from the open-state conformation, undermining our goal of capturing more diverse, state-specific features.

      It is worth noting that multiple strategies could potentially achieve the predicted models in our study, and here we only present examples of the paths we took and validated. It is likely that many of the steps may be unnecessary and could be skipped, and future work building on our approach can further explore and streamline this process. A consistent theme underlies our choices: for the closed state, we know the VSD should adopt a deactivated (“down”) conformation, so we provide AlphaFold2 with a specific fragment to guide this outcome; for the inactivated state, we recognize that the SF must change to a non-conductive conformation, so we grant AlphaFold2 flexibility to explore diverse conformations by minimizing initial constraints on the transmembrane region.
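
      To make the idea of a small, targeted template fragment concrete, the sketch below keeps only the ATOM records of one chain within a residue range of a PDB-format file. It is an illustration under stated assumptions, not the template-preparation code from the study (the authors' scripts are in their GitHub repository): the function name `extract_fragment` is hypothetical, and any chain identifier or residue range passed to it is the user's own choice.

```python
def extract_fragment(pdb_lines, chain_id, first_residue, last_residue):
    """Keep ATOM records of one chain whose residue number is in [first, last].

    Relies on the fixed-column PDB format: chain ID in column 22 and
    residue sequence number in columns 23-26 (1-based).
    """
    kept = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skip HETATM, TER, REMARK, and other record types
        if line[21] != chain_id:
            continue
        residue_number = int(line[22:26])
        if first_residue <= residue_number <= last_residue:
            kept.append(line)
    return kept
```

      Fragments extracted this way from different structures could then be combined into a single template file, mirroring the strategy of constraining only the region whose conformation is known for the target state.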

      With greater sampling and computational resources, it is possible we could identify additional plausible, non-conductive conformations that might better represent an inactivated state, as hERG inactivation may encompass a spectrum of states. In this study, due to resource limitations, we focused on generating and validating a subset of conformations. Still, we acknowledge that broader exploration could further refine these models, which could be pursued in future studies. We updated the Methods and Discussion sections to reflect this perspective, and we are grateful for the reviewer’s input, which encourages us to clarify our rationale and highlight the adaptability of our approach.

      To demonstrate the broader feasibility of this approach, we applied it to another ion channel system, voltage-gated sodium channel Na<sub>V</sub>1.5, as illustrated in Figure S14. In this example, a deactivated VSD II from the cryo-EM structure of a homologous ion channel Na<sub>V</sub>1.7 (PDB 6N4R) (DOI: 10.1016/j.cell.2018.12.018), which was trapped in a deactivated state by a bound toxin, was used as a structural template. This guided AlphaFold to generate a Na<sub>V</sub>1.5 model in which all four voltage sensor domains (VSD I–IV) exhibit S4 helices in varying degrees of deactivation. Compared to the cryo-EM open-state Na<sub>V</sub>1.5 structure (PDB 6LQA) (DOI: 10.1002/anie.202102196), the predicted model displays a visibly narrower pore, representing a plausible closed state. This example underscores the versatility of our strategy in modeling alternative conformational states across diverse ion channels.

      (2) While the authors rely on AF2 (ColabFold) for the closed and inactivated states, they use Rosetta to model loops of the open state. Why not just supply 5VA2 as a template to ColabFold and rebuild the loops that way? Without clear explanations, these sorts of choices give the impression that the authors were looking for specific answers that they knew from their extensive knowledge of the hERG system. While the modeling done in this paper is very nice, its generalizability is not obvious.

      We appreciate the reviewer’s question about our use of Rosetta to model loops in the open-state hERG channel (PDB 5VA2) rather than rebuilding it entirely with ColabFold. In the study, we conducted a control experiment supplying parts of PDB 5VA2 to ColabFold to rebuild the loops, generating 100 models (Figure 2a: predicted open state). The top-ranked model (by pLDDT) differed from our Rosetta-modelled structure by only 0.5 Å RMSD, primarily due to the flexible extracellular loops as expected, with the pore and selectivity filter (our areas of focus) remaining nearly identical. We chose the Rosetta-refined cryo-EM structure as this structure and approach have been widely used as an open-state reference in our other hERG channel studies, such as by Miranda et al. (DOI: 10.1073/pnas.1909196117) and Yang et al. (DOI: 10.1161/CIRCRESAHA.119.316404), to ensure that our results are more directly comparable to prior work in the field. Nonetheless, as both models (with loops modeled by Rosetta or AlphaFold) were virtually identical, we would expect no significant differences if either were used to represent the open state in our study. We have incorporated this clarification into the main text.
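
      As an illustration of ranking models by pLDDT, the sketch below averages the per-residue confidence over C-alpha atoms and sorts the models. It relies on the fact that AlphaFold writes pLDDT into the B-factor column of its PDB output; everything else (the function names, averaging over C-alpha atoms only, and ranking by the mean) is an assumption of this sketch and not necessarily the ranking procedure used by ColabFold or the authors.

```python
def mean_plddt(pdb_lines):
    """Mean pLDDT over C-alpha atoms, read from the B-factor field (columns 61-66)."""
    values = [
        float(line[60:66])
        for line in pdb_lines
        if line.startswith("ATOM") and line[12:16].strip() == "CA"
    ]
    if not values:
        raise ValueError("no C-alpha ATOM records found")
    return sum(values) / len(values)


def rank_models(models):
    """models: dict mapping model name -> list of PDB lines.

    Returns the model names sorted by mean pLDDT, highest first.
    """
    return sorted(models, key=lambda name: mean_plddt(models[name]), reverse=True)
```

      The first name returned by `rank_models` is the top-ranked model under this simple criterion.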

      (3) pLDDT scores were used as a measure of reliable and accurate predictions, but pLDDT is not always reliable for selecting new/alternative conformations (see https://doi.org/10.1038/s41467-024-51507-2 and https://www.nature.com/articles/s41467-024-51801-z).

      We acknowledge that while pLDDT is a valuable indicator of structural confidence in AlphaFold2 predictions, its limitations warrant consideration. In our revision, we mitigated this by not relying solely on pLDDT: we also performed protein backbone dihedral angle analysis of the protein regions of focus in all predicted models to ensure comprehensive coverage of conformational variations. From our AlphaFold modeling results, we tested a model from cluster 3 of the inactivated-state sampling process, which exhibited lower pLDDT scores, and included these results in our revised analysis. We included a note in the revised manuscript’s Discussion section: “As noted in recent studies, pLDDT scores are not reliable indicators for selecting alternative conformations (DOI: 10.1038/s41467-024-51507-2 and DOI: 10.1038/s41467-024-51801-z). To address this, we performed a protein backbone dihedral angle analysis in the regions of interest to ensure that our evaluation captured a representative range of sampled conformations.”
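
      A backbone dihedral angle is the torsion defined by four consecutive atoms (for φ: C of the previous residue, N, Cα, C; for ψ: N, Cα, C, N of the next residue). The sketch below shows one standard way to compute it from coordinates. It is a generic illustration with hypothetical function names, not the authors' analysis code, which is available in their GitHub repository.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_deg(p0, p1, p2, p3):
    """Torsion angle in degrees (range -180 to 180) defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

      Computing φ and ψ for each residue of interest in every predicted model gives a per-model set of angles that can be compared or clustered independently of pLDDT.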

      (4) Extensive work has been done using AF2 to model alternative protein conformations (https://www.biorxiv.org/content/10.1101/2024.05.28.596195v1.abstract, along with some references the authors cite, such as work by McHaourab); another group recently modeled the ion channel GLIC (https://www.biorxiv.org/content/10.1101/2024.09.05.611464v1.abstract). Therefore, this work, though generally solid and thorough, seems more like a variation on a theme than a groundbreaking new methodology, especially because of the generalizability issues mentioned above.

      We sincerely thank the reviewer for acknowledging the solidity of our study and for drawing our attention to the impressive recent efforts using AlphaFold2 to explore alternative protein conformations. These studies are valuable contributions that highlight the versatility of AlphaFold2, and we are grateful for their context in evaluating our work.

      Building on these efforts, our approach not only enhances the prediction of conformational diversity but also introduces a twist by incorporating structural templates to guide AlphaFold2 toward specific functional protein states. More significantly, our study goes beyond structural modeling by rigorously validating these conformations: multiple simulation results, tested against experimental data, show that AlphaFold-predicted conformations can align with distinct physiological ion channel states. A key finding is that drug binding predictions using AlphaFold-derived hERG channel states substantially improve correlation with experimental data. This addresses a longstanding challenge in computational screening of multi-state proteins like the hERG channel, for which previous structural models have been mostly limited to the open state based on the cryo-EM structures. Our approach not only captures this critical state dependence but also reveals potential molecular determinants underlying enhanced drug binding during hERG channel inactivation, a phenomenon observed experimentally but poorly understood. These insights advance drug safety assessment by improving predictive screening for hERG-related cardiotoxicity, a major cause of drug attrition and withdrawal.

      We view our methodology as a natural evolution of the advancements cited by the reviewer, offering an approach that predicts diverse hERG channel conformational states and links them to meaningful functional and pharmacological outcomes. To address the reviewer’s concern about generalizability, we have expanded the methodology section to make it easier to follow and include additional details. As an example, we show how our approach can be applied to model another ion channel system, Na<sub>V</sub>1.5, in Figure S14.

      Furthermore, to enhance the applicability of our methodology, we have uploaded the scripts for analyzing AlphaFold-predicted models to GitHub (https://github.com/k-ngo/AlphaFold_Analysis), ensuring they are adaptable for a wide range of scenarios with extensive documentation. This enables users, even those not focused on ion channels, to effectively apply our tools to analyze AlphaFold predictions for their own projects and produce publication-ready figures.

      While it is likely that multiple modeling approaches could lead AlphaFold to model alternative protein conformations, the key challenge lies in validating the physiological relevance of those predicted states. This study is intended to support other researchers in applying our template-guided approach to different protein systems, and more importantly, in rigorously testing and validating in silico the biological significance of the conformation-specific structural models they generate.

      Minor concerns:

      (1) The authors mention in the Introduction section that capturing conformational states, especially for membrane proteins that may be significant as drug targets, is crucial. It would be helpful to relate their work to the NMR studies of domains of the hERG channel, particularly the N-terminal “eag” domain, which is crucial for channel function and can provide insights into conformational changes associated with different channel states (https://doi.org/10.1016/j.bbrc.2010.10.132).

      We appreciate the reviewer’s insightful comment regarding the PAS domain and the potential influence of other regions, such as the N-linker and distal C-region, on drug binding and state transitions.

      The PAS domain did appear in the starting templates used for initial structural modeling (as shown in Figure 1a, b, c), but it was not included in the final models used for subsequent analyses. The omission was primarily due to hardware-imposed constraints, as including these additional regions would exceed the memory capacity of our current graphics processing unit (GPU) card, leading to failures during the prediction step.

      The PAS domain, even if not serving as a conventional direct drug-binding site, can influence the gating kinetics of hERG channels. By altering the probability and duration with which channels occupy specific states, it can indirectly affect how well drugs bind. For example, if the presence of the PAS domain shifts hERG channel gating so that more channels enter (and remain in) the inactivated state as was shown previously (e.g., DOI: 10.1085/jgp.201210870), drugs with a higher affinity for that state would appear to bind more potently, as observed in previous electrophysiological experiments (e.g., DOI: 10.1111/j.1476-5381.2011.01378.x). It is also plausible that the PAS domain could exert allosteric effects that alter the conformational landscape of the hERG channel during gating transitions, potentially impacting drug accessibility or binding stability. This is an intriguing hypothesis and an important avenue for future research.

      With access to more powerful computational resources, it would be valuable to explore the full-length hERG channel, including the PAS domain and associated regions, to assess their potential contributions to drug binding and gating dynamics. We incorporated a discussion of these points into the main text, acknowledging the limitations of our current models and highlighting the need for future studies to explore these regions in greater detail. The addition reads: “…Our models excluded the N-terminal PAS domain due to GPU memory limitations, despite its inclusion in initial templates. This omission may overlook its potential roles in gating kinetics and allosteric effects on drug binding (e.g., PMID: 21449979, PMID: 23319729, PMID: 29706893, PMID: 30826123, DOI:10.4103/jpp.JPP_158_17). Future research will explore the full-length hERG channel with enhanced computational resources to assess these regions’ contributions to conformational state transitions and pharmacology.”

      (2) In the second-to-last paragraph of the Introduction, the authors describe how AlphaFold2 works. They write, “AlphaFold2 primarily requires the amino acid sequence of a protein as its input, but the method utilizes other key elements: in addition to the amino acid sequence, AlphaFold2 can also utilize multiple sequence alignments (MSAs) of similar sequences from different species, templates of related protein structures when available, and/or homologous proteins (Jumper et al., 2021a). Evolutionarily conserved regions over multiple isoforms and species indicated that the sequence is crucial for structural integrity”. The last sentence is confusing; if the authors mean that all information required to fold the protein into its 3D structure is present in its primary sequence, that has been the paradigm. It is unclear from this paragraph what the authors wanted to convey.

      We apologize for any confusion caused by this phrasing. Our intent was not to restate the well-established paradigm that a protein’s primary sequence contains the information needed for its 3D structure, but rather to emphasize how AlphaFold2 leverages evolutionary conservation, via multiple sequence alignments (MSAs), to infer structural constraints beyond what a single sequence alone might reveal. Specifically, we aimed to highlight that conserved regions across species and isoforms provide additional context that AlphaFold2 uses to enhance the accuracy of its predictions, complementing the use of templates and homologous structures as described in Jumper et al. (2021). To clarify this, we revised the sentence in the manuscript to read: “AlphaFold2 primarily requires a protein's amino acid sequence as input, but it also leverages other critical data sources. In addition to the sequence, it incorporates multiple sequence alignments (MSAs) of related proteins from different species, available structural templates, and information on homologous proteins. While the primary sequence encodes the 3D structure, AlphaFold2 harnesses evolutionary conservation from MSAs to reveal structural insights that extend beyond what a single sequence can provide.” We thank the reviewer for pointing out this ambiguity.

      (3) In the Results section, the authors state that the predictions generated by their method are evaluated by standard accuracy metrics, please elaborate - what standard metrics were used to judge the predictions and why (some references would be a nice addition). Further, on Page 6, the sentence “There are fewer differences between the open- and closed-state models (Figure S2b, d)” is confusing, fewer differences than what? or there are a few differences between the two states/models? Please clarify.

      The original sentence referring to “standard accuracy metrics” is somewhat misplaced, as our intent was not to apply any conventional “benchmarking” to judge the predictions, but rather to evaluate functional and structural relevance in a physiologically meaningful context. Specifically, we assessed drug binding affinities from molecular docking simulations (in Rosetta Energy Units, R.E.U.) against experimental drug potency data (e.g., IC<sub>50</sub> values converted to free energies in kcal/mol, Figure 7), analyzed differences in interaction networks across states in relation to known mutations affecting hERG inactivation (Figure 4, Table 2), validated ion conduction properties through MD simulations with the applied voltage against expected state-dependent hERG channel behavior (Figure 5), and compared predicted structural models to available experimental cryo-EM structures (Figure 3). We clarified in the text that our assessment emphasized the physiological plausibility of the generated conformations, drawing on evidence from existing computational and experimental studies at each step of the analysis above.
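
      The conversion of an experimental IC<sub>50</sub> into a free energy can be illustrated with the commonly used approximation ΔG ≈ RT ln(IC<sub>50</sub>), with IC<sub>50</sub> expressed in mol/L. The sketch below is a hedged example only: the response does not state the exact formula or temperature used in the manuscript, so the relation itself, the default temperature of 298.15 K, and the function name are assumptions.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # gas constant in kcal / (mol * K)


def ic50_to_free_energy(ic50_molar, temperature_k=298.15):
    """Approximate binding free energy (kcal/mol) from an IC50 given in mol/L.

    Assumes delta_G ~= R * T * ln(IC50), i.e. IC50 is treated as a stand-in
    for the dissociation constant; the temperature default is an assumption.
    """
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(ic50_molar)
```

      Under these assumptions, a lower IC<sub>50</sub> (a more potent drug) maps to a more negative free energy, which is the quantity that can be compared against docking scores in R.E.U.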

      As for the sentence on page 6, “There are fewer differences between the open- and closed-state models,” we apologize for the ambiguity; we meant that the hydrogen bond networks in the selectivity filter region exhibit fewer differences between the open and closed states compared to the more pronounced variations seen between the open and inactivated states. We revised this sentence to read: “The open- and closed-state models show fewer differences in their selectivity filter hydrogen bond networks compared to those between the open and inactivated states,” to enhance readability.

      (4) In the Discussion, the authors reiterate that this methodology can be extended to sample multiple protein conformations, and their system of choice was hERG potassium channel. I think this methodology can be applied to a system when there is enough knowledge of static structures, and some information on dynamics (through simulations) and mutagenesis analysis available. A well-studied system can benefit from such a protocol to gauge other conformational states.

      We agree that this approach is well-suited to systems with sufficient static structures, dynamic insights from simulations, and mutagenesis data, as seen with the hERG channel. We appreciate the reviewer’s implicit concern about generalizability to less-characterized systems and addressed this in the Discussion as a limitation, noting that the method’s effectiveness may depend on prior knowledge. Future studies can explore whether the advent of AlphaFold3 and other deep learning approaches can enhance its applicability to systems with more limited data. We have added this comment to the Discussion: “…A limitation of our methodology is its reliance on well-characterized systems with ample static structures, molecular dynamics simulation data, and mutagenesis insights, as demonstrated with the hERG channel, which may limit its applicability to less-studied proteins.”

      (5) The Methods section must be broken down into steps to make it easier to follow for the reader (if they want to implement these steps for themselves on their system of choice).

      a. Is it possible to share example scripts and code used to piece templates together for AF2? Also, since the AF3 code is now available, the authors may comment on how their protocol can be applicable there or have plans to implement their protocol using AF3 (which is designed to work better for binding small molecules). Please see https://github.com/google-deepmind/alphafold3 for the recently released code for AF3.

      We appreciate the reviewer’s suggestion to improve the Methods section and their comments on scripts and AlphaFold3 (AF3). We revised the Methods to separate it into clear steps (e.g., template preparation, AF2 setup, clustering, and refinement) for better readability and reproducibility, and uploaded the sample scripts along with the instructions to GitHub (https://github.com/k-ngo/AlphaFold_Analysis).

      Regarding AF3’s recent code release, we plan to explore the applicability of our methodology to AF3 in a follow-up study, leveraging its advanced features to refine conformational predictions and state-specific drug docking, and added a brief comment to the Discussion to reflect this future direction: “…Following the recent release of AlphaFold3’s source code, we plan to explore the applicability of our template-guided methodology in a follow-up study, leveraging AF3’s advanced diffusion-based architecture to enhance protein conformational state predictions and state-specific drug docking, particularly given its improved capabilities for modeling small molecule – protein interactions…”

      b. The authors modified the hERG protein by removing a segment, the N-terminal PAS domain (residues M1 - R397) because of graphics card memory limitation. Would the removal of the PAS domain affect the structure and function of the channel protein? HERG and other members of the “eag K<sup>+</sup> channel” family contain a PAS domain on their cytoplasmic N terminus. Removal of this domain alters a physiologically important gating transition in HERG, and the addition of the isolated domain to the cytoplasm of cells expressing truncated HERG reconstitutes wild-type gating. (see https://doi.org/10.1371/journal.pone.0059265). Please elaborate on this.

      We thank the reviewer for raising an important point about the removal of the N-terminal PAS domain and for highlighting its physiological role in hERG channel gating transitions. In our study, unlike experimental settings where PAS removal alters gating, we believe this omission has minimal impact on our key analyses.

      The drug docking procedure focuses on optimizing drug binding poses with minor protein structural refinement around the putative drug binding site, which in our case is the hERG channel pore region, where hERG-blocking drugs predominantly bind. The cytoplasmic PAS domain, located distally from this site, remains outside the protein structure refinement zone during drug docking simulations. However, one aspect we have not yet considered is the potential effect of drug modulation of the hERG channel gating and vice versa, particularly given the PAS domain’s role in gating. This interplay could be significant but requires investigation beyond our current drug docking framework. We plan to explore this in future studies using alternative simulation methodologies, such as extended MD simulations or enhanced sampling techniques, to comprehensively capture these dynamic protein-ligand interactions.

      Similarly, in our 1 μs long MD simulations assessing ion conductivity (Figure 4), the timescale is too short for PAS-mediated gating changes to propagate through the protein and meaningfully influence ion conduction and channel activation dynamics, which occur on a millisecond time scale (see e.g., DOI: 10.3389/fphys.2018.00207). To fully address this limitation, we plan to explore the inclusion of the PAS domain in a follow-up study with enhanced computational resources, allowing us to investigate its structural and functional contributions more comprehensively.

      (6) The first paragraph of the Methods reads as though AF2 has layers that recycle structures. We doubt that the authors meant it that way. Please update the language to clarify that recycling is an iterative process in which the pairwise representation, MSA, and predicted structures are passed (“recycled”) through the model multiple times to improve predictions.

      We agree that the phrasing might suggest physical layers recycling structures, which was not our intent. Instead, we meant to describe AlphaFold2’s iterative refinement process, where intermediate outputs, such as the pairwise residue representations, multiple sequence alignments (MSAs), and predicted structures, are iteratively passed (or “recycled”) through the model to enhance prediction accuracy. To clarify this, we revised the relevant sentence to read: “A critical feature of AlphaFold2 is its iterative refinement, where pairwise residue representations, MSAs, and initial structural predictions are recycled through the model multiple times, improving accuracy with each iteration.”
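
      The control flow of recycling can be illustrated with a toy loop in which each pass receives the previous pass’s output as an additional input. This is a conceptual sketch only and does not reproduce AlphaFold2’s network or API: `toy_step` is a hypothetical stand-in for a full model pass, and the convention of one initial pass plus `num_recycles` repeats is an assumption of the sketch.

```python
def run_with_recycling(step, inputs, num_recycles=3):
    """Call step(inputs, previous_output) once, then num_recycles more times,
    feeding each output back in as previous_output."""
    previous = None
    for _ in range(num_recycles + 1):
        previous = step(inputs, previous)
    return previous


def toy_step(target, previous):
    """Toy stand-in for a model pass: move halfway from the previous
    estimate toward the target value (starting from 0.0 on the first pass)."""
    estimate = 0.0 if previous is None else previous
    return estimate + 0.5 * (target - estimate)
```

      In this toy setting each additional pass brings the estimate closer to the target, which mirrors the intent of recycling: later passes refine, rather than restart, the earlier prediction.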

      Reviewer #3 (Recommendations for the authors):

      The authors should integrate the very recently published CryoEM experimental data of hERG inhibition by several drugs (Miyashita et al., Structure, 2024; DOI: 10.1016/j.str.2024.08.021).

      We thank the reviewer for the suggestion. Here, we compare drug binding in our open-state (PDB 5VA2-derived and an additional AlphaFold-predicted model from Cluster 3 of the inactivated-state sampling attempt, named “AF ic3”) and inactivated-state models, using the cationic forms of astemizole and E-4031, with the corresponding experimental structures (Figure S13). Drug binding in the closed state is excluded as the pore architecture deviates too much from those in the cryo-EM structures. Experimental data (DOI: 10.1124/mol.108.049056) indicate that both astemizole and E-4031 bind more potently to the inactivated state of the hERG channel.

      Astemizole (Figure S13a):

      - In the PDB 5VA2-derived open-state model, astemizole binds centrally within the pore cavity, adopting a bent conformation that allows both aromatic ends of the molecule to engage in π–π stacking with the side chains of Y652 from two opposing subunits. Hydrophobic contacts are observed with S649 and F656 residues.

      - In the AF ic3 open-state model, the ligand is stabilized through multiple π–π stacking interactions with Y652 residues from three subunits, forming a tight aromatic cage around its triazine and benzimidazole rings. Hydrophobic interactions are observed with hERG residues T623, S624, Y652, F656, and S660.

      - In the inactivated-state model, astemizole adopts a compact, horizontally oriented pose deeper in the channel pore, forming the most extensive interaction network among all the states. The ligand is tightly stabilized by multiple π–π stacking interactions with Y652 residues across three subunits, and forms hydrogen bonds with residues S624 and Y652. Additional hydrophobic contacts are observed with residues F557, L622, S649, and Y652.

      - Consistent with our findings, an electrophysiology study by Saxena et al. identified hERG residues F557 and Y652 as crucial for astemizole binding, as determined through mutagenesis (DOI: 10.1038/srep24182).

      - In the cryo-EM structure (PDB 8ZYO) (DOI: 10.1016/j.str.2024.08.021), astemizole is stabilized by π–π stacking with Y652 residues. However, no hydrogen bonds are detected, which may reflect limitations in cryo-EM resolution rather than true absence of contacts. Additional hydrophobic interactions are observed with L622 and G648 residues.

      E-4031 (Figure S13b):

      - In the PDB 5VA2-derived open-state model, E-4031 binds within the central cavity primarily through polar interactions. It forms a π–π stacking interaction with residue Y652, anchoring one end of the molecule. Polar interactions are observed with residues A653 and S660. Additional hydrophobic contacts are observed with residues A652 and Y652.

      - In the AF ic3 open-state model, E-4031 adopts a slightly deeper pose within the central cavity stabilized by dual π–π stacking interactions between its aromatic rings and hERG residue Y652. Additional hydrogen bonds are observed with residues S624 and Y652, and hydrophobic contacts are observed with residues T623 and S624.

      - In the inactivated-state model, E-4031 adopts its deepest and most stabilized binding pose, consistent with its experimentally observed preference for this state. The ligand is stabilized by multiple π–π stacking interactions between its aromatic rings and hERG residues Y652 from opposing subunits. The sulfonamide nitrogen engages in hydrogen bonding with residue S649, while the piperidine nitrogen hydrogen bonds with residue Y652. Hydrophobic contacts with residues S624, Y652, and F656 further reinforce the binding, enclosing the ligand in a densely packed aromatic and polar environment.

      - A previous mutagenesis study showed that mutations involving hERG residues F557, T623, S624, Y652, and F656 affect E-4031 binding (DOI: 10.3390/ph16091204).

      - In the cryo-EM structure (PDB 8ZYP) (DOI: 10.1016/j.str.2024.08.021), E-4031 engages in a single π–π stacking interaction with hERG residue Y652, anchoring one end of the molecule. The remainder of the ligand is stabilized predominantly through hydrophobic contacts involving residues S621, L622, T623, S624, M645, G648, S649, and additional Y652 side chains, forming a largely nonpolar environment around the binding pocket.

      In both cryo-EM structures, astemizole and E-4031 adopt binding poses that closely resemble the inactivated-state model in our docking study, consistent with experimental evidence that these drugs preferentially bind to the inactivated state (DOI: 10.1124/mol.108.049056). This raises the possibility that the cryo-EM structures may capture an inactivated-like channel state. However, closer examination of the SF reveals that the cryo-EM conformations more closely resemble the open-state PDB 5VA2 structure (DOI: 10.1016/j.cell.2017.03.048), which has been shown to be conductive here and in previous studies (DOI: 10.1073/pnas.1909196117, 10.1161/CIRCRESAHA.119.316404).

      The conformational differences between the cryo-EM and open-state docking results may reflect limitations of the docking protocol itself, as GALigandDock assumes a rigid protein backbone and cannot account for large ligand-induced conformational changes. In our open-state models, the hydrophobic pocket beneath the selectivity filter is too small to accommodate bulky ligands (Figure 3a, b), whereas the cryo-EM structures show a slight outward shift in the S6 helix that expands this space (Figure S13). These allosteric rearrangements, though small, fall outside the scope of the current docking protocol, which lacks the flexibility to capture such local, ligand-induced adjustments (DOI: 10.3389/fphar.2024.1411428).

      In contrast, docking to the AlphaFold-predicted inactivated-state model reveals a reorganization beneath the selectivity filter that creates a larger cavity, allowing deeper ligand insertion. Notably, neither our inactivated-state docking nor the available cryo-EM structures show strong interactions with F656 residues. However, in the AlphaFold-predicted inactivated model, the more extensive protrusion of F656 into the central cavity may further occlude the drug’s egress pathway, potentially trapping the ligand more effectively. This could explain why mutation of F656 significantly reduces the binding affinity of E-4031 (DOI: 10.3390/ph16091204). These findings suggest that inactivation may trigger a series of modular structural rearrangements that influence drug access and binding affinity, with different aspects potentially captured in various computational and experimental studies, rather than resulting from a single, uniform conformational change.

      Discussion of the original Wang and Mackinnon finding, DOI: 10.1016/j.cell.2017.03.048 regarding C-inactivation, pore mutation S631A and F627 rearrangement is likely warranted. Since hERG inactivation is present at 0 mV in WT channels (the likely voltage for the CryoEM study) please discuss how this might affect interpretations of starting with this structure as a template for models presented here, perhaps as part of Figure S1.

      We sincerely thank the reviewer for bringing up the insightful findings from Wang and MacKinnon regarding hERG C-type inactivation as well as the voltage context of their cryo-EM structure (PDB 5VA2). We recognize that WT hERG exhibits inactivation at 0 mV, likely the condition of the cryo-EM study, raising the possibility that PDB 5VA2, while classified as an open state, might subtly reflect features of inactivation. Notably, PDB 5VA2 has been widely adopted in numerous studies and consistently found to represent a conducting state, such as in Yang et al. (DOI: 10.1161/CIRCRESAHA.119.316404) and Miranda et al. (DOI: 10.1073/pnas.1909196117). Our MD simulations further support this, showing K<sup>+</sup> conduction in the 5VA2-based open-state model (Figure 4a, c), consistent with its selectivity filter conformation (Figure S1a). Although we used PDB 5VA2 as a starting template for predicting inactivated and closed states, our AlphaFold2 predictions did not rigidly adhere to this structure, as evidenced by distinct differences in hydrogen bond networks, drug binding affinities, pore radii, and ion conductivity between our state-specific hERG channel models (Figures S2, 5, 3b, 4). Nevertheless, this does not preclude the possibility that inactivated-like traits potentially present in PDB 5VA2 at 0 mV could subtly influence our predictions elsewhere in the model, which warrants further exploration in future studies. In our revised analysis, we also tested an alternative AlphaFold-predicted conformation, referred to as Open (AlphaFold cluster 3), which, while sharing some similarities with PDB 5VA2, exhibits subtle differences in the selectivity filter and pore conformations. This structure also conducted ions and showed a drug binding profile similar to that of the PDB 5VA2-based open-state model. We greatly appreciate this feedback, which helped us refine and strengthen our analysis.

      Page 8, the significance of 750 and 500 mV in terms of physiological role?

      We appreciate this opportunity to clarify the methodological rationale. Although these voltages significantly exceed typical physiological membrane potentials, their use in MD simulations is a well-established practice to accelerate ion conduction events. This approach helps overcome the inherent timescale limitations of conventional MD simulations, as demonstrated in previous studies of hERG and other ion channels. For instance, Miranda et al. (DOI: 10.1073/pnas.1909196117), Lau et al. (DOI: 10.1038/s41467-024-51208-w), and Yang et al. (DOI: 10.1161/CIRCRESAHA.119.316404) applied similarly high voltages (500–750 mV) to study K<sup>+</sup> conduction through hERG, whose single-channel conductance is notably small under physiological conditions at ~2 pS (DOI: 10.1161/01.CIR.94.10.2572), necessitating amplification to observe meaningful permeation within nanosecond-to-microsecond timescales. Likewise, studies of other K<sup>+</sup> ion channels, such as Woltz et al. (DOI: 10.1073/pnas.2318900121) on the small-conductance calcium-activated K<sup>+</sup> channel SK2 and Wood et al. (DOI: 10.1021/acs.jpcb.6b12639) on the Shaker K<sup>+</sup> channel, have used elevated voltages (250–750 mV) to probe ion conduction mechanisms via MD simulations. In addition, the typical timescale of these simulations (1 μs) is too short to capture major structural effects, such as those leading to inactivation or deactivation, which occur over milliseconds in physiological conditions.
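
      A back-of-the-envelope estimate illustrates the timescale argument. The sketch below assumes, purely for illustration, an ohmic channel with the ~2 pS conductance cited above and converts the resulting current into an expected number of K<sup>+</sup> ions crossing per microsecond. The function name and the 100 mV comparison value are hypothetical choices, and the linear current-voltage relation is a simplification that real hERG channels do not strictly follow.

```python
# Illustrative estimate only: assumes a linear (ohmic) current-voltage relation.
ELEMENTARY_CHARGE_C = 1.602176634e-19  # charge carried by one K+ ion, in coulombs


def expected_ions_per_microsecond(conductance_pS, voltage_mV):
    """Expected number of monovalent ions crossing per microsecond (I = g * V)."""
    current_A = (conductance_pS * 1e-12) * (voltage_mV * 1e-3)
    ions_per_second = current_A / ELEMENTARY_CHARGE_C
    return ions_per_second * 1e-6  # convert ions/s to ions/us


# 100 mV is a hypothetical comparison point; 500 and 750 mV are the simulated voltages.
for voltage_mV in (100, 500, 750):
    rate = expected_ions_per_microsecond(2.0, voltage_mV)
    print(f"{voltage_mV} mV: ~{rate:.1f} ions per microsecond")
```

      Under these assumptions, a 1 μs trajectory would be expected to contain roughly one permeation event at 100 mV and roughly nine at 750 mV, which is why elevated voltages are helpful for gathering conduction statistics.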

      The abstract could be edited a bit to more clearly state the novel findings in this study.

      We thank the reviewer for their suggestion. We have revised the abstract to read: “To design safe, selective, and effective new therapies, there must be a deep understanding of the structure and function of the drug target. One of the most difficult problems to solve has been resolution of discrete conformational states of transmembrane ion channel proteins. An example is K<sub>V</sub>11.1 (hERG), comprising the primary cardiac repolarizing current, I<sub>Kr</sub>. hERG is a notorious drug antitarget against which all promising drugs are screened to determine potential for arrhythmia. Drug interactions with the hERG inactivated state are linked to elevated arrhythmia risk, and drugs may become trapped during channel closure. While prior studies have applied AlphaFold to predict alternative protein conformations, we show that the inclusion of carefully chosen structural templates can guide these predictions toward distinct functional states. This targeted modeling approach is validated through comparisons with experimental data, including proposed state-dependent structural features, drug interactions from molecular docking, and ion conduction properties from molecular dynamics simulations. Remarkably, AlphaFold not only predicts inactivation mechanisms of the hERG channel that prevent ion conduction but also uncovers novel molecular features explaining enhanced drug binding observed during inactivation, offering a deeper understanding of hERG channel function and pharmacology. Furthermore, leveraging AlphaFold-derived states enhances computational screening by significantly improving agreement with experimental drug affinities, an important advance for hERG as a key drug safety target where traditional single-state models miss critical state-dependent effects. By mapping protein residue interaction networks across closed, open, and inactivated states, we identified critical residues driving state transitions validated by prior mutagenesis studies. 
This innovative methodology sets a new benchmark for integrating deep learning-based protein structure prediction with experimental validation. It also offers a broadly applicable approach using AlphaFold to predict discrete protein conformations, reconcile disparate data, and uncover novel structure-function relationships, ultimately advancing drug safety screening and enabling the design of safer therapeutics.”

      Many of the Supplemental figures would fit in better in the main text, if possible, in my opinion. For instance, the network analysis (Fig. S2) appears to be novel and is mentioned in the abstract so may fit better in the main text. The discussion section could be focused a bit more, perhaps with headers to highlight the key points.

      Yes, we agree with the reviewer and have made the suggested changes. We moved Figure S2 into the main text as a new figure.

      Additionally, we revised the Discussion section to improve focus and clarity.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study significantly advances our understanding of how exosomes regulate filopodia formation. Filopodia play crucial roles in cell movement, polarization, directional sensing, and neuronal synapse formation. McAtee et al. demonstrated that exosomes, particularly those enriched with the protein THSD7A, play a pivotal role in promoting filopodia formation through Cdc42 in cancer cells and neurons. This discovery unveils a new extracellular mechanism through which cells can control their cytoskeletal dynamics and interaction with their surroundings. The study employs a combination of rescue experiments, live-cell imaging, cell culture, and proteomic analyses to thoroughly investigate the role of exosomes and THSD7A in filopodia formation in cancer cells and neurons. These findings offer valuable insights into fundamental biological processes of cell movement and communication and have potential implications for understanding cancer metastasis and neuronal development.

      Weaknesses:

      The conclusions of this study are in most cases supported by data, but some aspects of data analysis need to be better clarified and elaborated. Some conclusions need to be stated more clearly and in line with the data observed.

      We appreciate the reviewer's recognition of the impact of our study. We will address the concerns about data analysis and the statement of our conclusions in our full response to reviewers.

      Reviewer #2 (Public review):

      Summary:

      The authors show that small EVs trigger the formation of filopodia in both cancer cells and neurons. They go on to show that two cargo proteins, endoglin, and THSD7A, are important for this process. This possibly occurs by activating the Rho-family GTPase CDC42.

      Strengths:

      The EV work is quite strong and convincing. The proteomics work is well executed and carefully analyzed. I was particularly impressed with the chick metastasis assay that added strong evidence of in vivo relevance.

      Weaknesses:

      The weakest part of the paper is the Cdc42 work at the end of the paper. It is incomplete and not terribly convincing. This part of the paper needs to be improved significantly.

      We appreciate the reviewer's recognition of the impact of our study. Indeed, more work needs to be done to clarify the role of Cdc42 in the induction of filopodia by exosome-associated THSD7A. We anticipate that this will be a separate manuscript, delving in-depth into how exosome-associated THSD7A interacts with recipient cells to activate Cdc42 and carrying out a variety of assays for Cdc42 activation.

      Reviewer #3 (Public review):

      Summary:

      The authors identify a novel relationship between exosome secretion and filopodia formation in cancer cells and neurons. They observe that multivesicular endosomes (MVE)-plasma membrane (PM) fusion is associated with filopodia formation in HT1080 cells and that MVEs are present in filopodia in primary neurons. Using overexpression and knockdown (KD) of Rab27/HRS in HT1080 cells, melanoma cells, and/or primary rat neurons, they found that decreasing exosome secretion reduces filopodia formation, while Rab27 overexpression leads to the opposite result. Furthermore, the decreased filopodia formation is rescued in the Rab27a/HRS KD melanoma cells by the addition of small extracellular vesicles (EVs) but not large EVs purified from control cells. The authors identify endoglin as a protein unique to small EVs secreted by cancer cells when compared to large EVs. KD of endoglin reduces filopodia formation and this is rescued by the addition of small EVs from control cells and not by small EVs from endoglin KD cells. Based on the role of filopodia in cancer metastasis, the authors then investigate the role of endoglin in cancer cell metastasis using a chick embryo model. They find that injection of endoglin KD HT1080 cells into chick embryos gives rise to less metastasis compared to control cells - a phenotype that is rescued by the co-injection of small EVs from control cells. Using quantitative mass spectrometry analysis, they find that thrombospondin type 1 domain containing 7a protein (THSD7A) is downregulated in small EVs from endoglin KD melanoma cells compared to those from control cells. They also report that THSD7A is more abundant in endoglin KD cell lysate compared to control HT1080 cells and less abundant in small EVs from endoglin KD cells compared to control cells, indicating a trafficking defect. 
Indeed, using immunofluorescence microscopy, the authors observe THSD7A-mScarlet accumulation in CD63-positive structures in endoglin KD HT1080 cells, compared to control cells. Finally, the authors determine that exosome-secreted THSD7A induces filopodia formation in a Cdc42-dependent mechanism.

      Strengths:

      (1) While exosomes are known to play a role in cell migration and autocrine signaling, the relationship between exosome secretion and the formation of filopodia is novel.

      (2) The authors identify an exosomal cargo protein, THSD7A, which is essential for regulating this function.

      (3) The data presented provide strong evidence of a role for endoglin in the trafficking of THSD7A in exosomes.

      (4) The authors associate this process with functional significance in cancer cell metastasis and neurological synapse formation, both of which involve the formation of filopodia.

      (5) The data are presented clearly, and their interpretation appropriately explains the context and significance of the findings.

      Weaknesses:

      (1) A better characterization of the nature of the small EV population is missing:

      It is unclear why the authors chose to proceed to quantitative mass spectrometry with the bands in the Coomassie from size-separated EV samples, as there are other bands present in the small EV lane but not the large EV lane. This is important to clarify because it underlies how they were able to identify THSD7A as a unique regulator of exosome-mediated filopodia formation. Is there a reason why the total sample fractions were not compared? This would provide valuable information on the nature of the small and large EV populations.

      We would like to clarify that there are two sets of proteomics data in the manuscript. The first compared bands from a colloidal Coomassie-stained gel for two samples: small EVs and large EVs from B16F1 cells. In this proteomics experiment, we identified endoglin as present in small EVs, but not large EVs. For this experiment, we only sent four bands from the small EV lane, chosen based on their obvious banding pattern difference on the Coomassie gel.

      In the second proteomics experiment, we used quantitative iTRAQ proteomics to compare small EVs purified from B16F1 control (shScr) and endoglin KD (shEng1 and shEng2) cell lines. In this experiment, we sent total protein extracted from small EV samples for analysis. So, these samples included the entire EV content, not just selected bands from a gel. In this experiment, we identified THSD7A as reduced in the shEng small EVs.

      (2) Data analysis and quantification should be performed with increased rigor:

      a) Figure 1C - The optical and temporal resolution are insufficient to conclusively characterize the association between exosome secretion and filopodia. Specifically, the 10-second interval used in the image acquisitions is too close to the reported 20-second median time between exosome secretion and filopodia formation. Two- to 5-second intervals should be used to validate this. It would also be important to determine the percentage of filopodia events that co-occur with exosome secretion. Is this a phenomenon that occurs with most or only a small number of filopodia? Additionally, resolution with typical confocal microscopy is subpar for these analyses. TIRF microscopy would offer increased resolution to parse out secretion events. As the TIRF objective is listed in the Methods section, figure legends should mention which images were acquired using TIRF microscopy.

      We acknowledge that the frame rate naturally limits our estimates of the timing of filopodia formation after exosome secretion. We set out to show a relationship between exosome secretion and filopodia formation, based on their proximity in timing. While our data set shows a median time interval of 20 seconds, the true median could be between 10-30 seconds, based on our frame rate. Regardless of the exact timing, our data show that exosome secretion is rapidly followed by filopodia formation events.
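
      The 10–30 second range follows from simple interval arithmetic, sketched below. The sketch assumes that an event first seen in a given frame actually occurred at some point since the previous frame, so both the secretion time and the filopodia formation time are uncertain by up to one frame interval. The function name is hypothetical and is used only for illustration.

```python
def delay_bounds(observed_delay_s, frame_interval_s):
    """Bounds on the true delay between two events, each detected only at frame times.

    Each event is assumed to have occurred within one frame interval before the
    frame in which it was first seen, so the true delay can differ from the
    observed delay by up to one frame interval in either direction.
    """
    lower = max(0.0, float(observed_delay_s - frame_interval_s))  # delay cannot be negative
    upper = float(observed_delay_s + frame_interval_s)
    return lower, upper


# Observed median delay of 20 s at one frame every 10 s
print(delay_bounds(20, 10))
```

      With a shorter frame interval, the same calculation gives a correspondingly narrower range around the observed delay.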

      To address the question of the percentage of filopodia events that are preceded by exosome secretion, the reviewer is correct in stating that we might need TIRF microscopy and a faster frame rate to observe all the MVB fusion events and get an accurate calculation of this number. The timing of the acquisition was based on the typical timing of filopodia formation, which is slow relative to MVB fusion. Thus, with the current dataset, we could miss secretion events taking place between the 10-second time intervals. Therefore, to address this question, we would need to acquire a new dataset with a much more rapid frame acquisition (multiple frames per second rather than one frame every ten seconds). Regardless, for the secretion events that we visualized with the current dataset, we always observed subsequent filopodia formation.

      No TIRF imaging was used in this manuscript. A TIRF objective was used for selected neuron imaging (see methods); however, it was used for spinning disk confocal microscopy, not for TIRF imaging. This is stated in the methods.

      b) Figure 2 - It would be important to perform further analysis to concretely determine the relationship between exosome secretion and filopodia stability. Are secretion events correlated with the stability of filopodia? Is there a positive feedback loop that causes further filopodia stability and length with increased secretion? Furthermore, is there an association between the proximity of secretion with stability? Quantification of filopodia more objectively (# of filopodia/cell) would be helpful.

      Our data show that manipulation of general exosome secretion, via Hrs knockdown, affects both de novo filopodia formation and filopodia stability (Fig 2g,h). Interestingly, knockdown of endoglin only affects de novo filopodia formation, while filopodia stability is unaffected (Fig 4g,h). These results suggest that filopodia stability is dependent upon exosome cargoes besides endoglin/THSD7A. Such cargoes might include other extracellular matrix molecules, such as fibronectin. We previously showed that exosomes promote nascent cell adhesion and rapid cell migration, through exosome-bound fibronectin (Sung et al., Nature Communications, 6:7164, 2015). We also previously found that inhibition of exosome secretion affects the persistence of invadopodia, which are filopodia-dependent structures (Hoshino et al., Cell Reports, 5:1159-1168, 2013). We agree that this is an interesting research direction, and perhaps future work could focus on exosomal factors that are responsible for filopodia persistence. This would possibly involve more proteomics analysis to identify candidate exosomal cargoes involved in this process.

      With regard to the way we plotted the filopodia data, we plotted the cancer cell data as filopodia per cell area so that it matched the neuron data, which was plotted as filopodia per 100 µm of dendrite distance. Since the neurons cannot be imaged as a whole cell, the quantification is based on the length of the dendrite in the image. We found that graphing the cancer cell data as filopodia per cell gave similar results as filopodia per cell area. To demonstrate that this quantification gives similar results, we have now plotted the filopodia per cell area data from Fig 2 as filopodia per cell and placed these new plots in Supp Fig 2.

      c) Figure 6 - Why use different gel conditions to detect THSD7A in small EVs from B16F1 cells vs HT1080 and neurons? Why are there two bands for THSD7A in panels C and E? It is difficult to appreciate the KD efficiency in E. The absence of a signal for THSD7A in the HT1080 shEng small EVs that show a signal for endoglin is surprising. The authors should provide rigorous quantification of the westerns from several independent experimental repeats.

      Detection of THSD7A via Western blot was, unfortunately, not straightforward. Due to the large size (~260 kDa) of THSD7A, its low level of expression in cancer cells, as well as the inconsistency of commercially available THSD7A antibodies, we had to troubleshoot multiple conditions. We found that it was much easier to detect THSD7A in the human fibrosarcoma cell line HT1080 than in the mouse B16F1 cells, both in the cell lysates and in the small EVs. We were unable to detect THSD7A using the same (reducing) conditions for the mouse melanoma B16F1 samples but were successful using native gel conditions. We also detected THSD7A in rat primary neuron samples. All these samples were from different source organisms (human, mouse, rat) and from either cell lysates or extracellular vesicles, further complicating the analyses. Expression and maturation of THSD7A in these different cell types and compartments could involve different post-translational modifications, such as glycosylation, thus requiring different methods to detect THSD7A on Western blots and leading to different banding patterns.

      With regard to the level of knockdown of THSD7A in the Western blot shown in Figure 6E, the normalized level is quantitated below the bands. If you compare that quantitation to the filopodia phenotypes in the same panel, they are quite concordant. Figures 7B and 7C show quantification of triplicate Western blots, highlighting the significant accumulation of THSD7A in shEng cell lysates, as well as significant small EV secretion of THSD7A in control and WT rescued conditions.

      (3) The study lacks data on the cellular distribution of endoglin and THSD7A:

      a) Figure 6 - Is THSD7A expected to be present in the nucleus as shown in panel D (label D is missing in the Figure)? It is not clear if this is observed in neurons. A Western of endogenous THSD7A on cell fractions would clarify this. The authors should further characterize the cellular distribution of THSD7A in both cell types. Similarly, the cellular distribution of endoglin in the cancer cells should be provided. This would help validate the proposed model in Figure 8.

      The image in figure 6D shows an HT1080 cell stained with phalloidin-Alexa Fluor 488 to visualize F-actin with or without expression of THSD7A-mScarlet. In order to fully visualize the thin filopodia protrusions, the cellular plane of focus of the images for this panel was purposely taken at the bottom of the cell, where the cell is attached to the coverslip glass. Thus, we interpret the red signal across the cell body as THSD7A-mScarlet expression on the plasma membrane underneath the cell, not in the nucleus. The neuron images only include the dendrite portion of the neurons; therefore, there is no nucleus present in the neuronal images. For the cellular distribution of endoglin, we agree that this is an important future direction to understand how endoglin regulates THSD7A trafficking. We have added the lack of these data to the “Limitations” section at the end of the manuscript.

      b) Figure 7 - Although the western blot provides convincing evidence for the role of endoglin in THSD7A trafficking, the microscopy data lack resolution as well as key analyses. While differences between shSCR and shEng cells are clear visually, the insets appear to be zoomed digitally, which decreases resolution and interferes with interpretation. It would be crucial to show the colocalization of endoglin and THSD7A within CD63-positive MVE structures. What are the structures in Figure 7E shSCR zoom1? It would be important to rule out that these are migrasomes using TSPAN4 staining. More information on how the analysis was conducted is needed (i.e. how extracellular areas were chosen and whether the images are representative of the larger population). A widefield image of shSCR and shEng cells and DAPI or HOECHST staining in the higher magnification images should be provided. Additionally, the authors should quantify the colocalization of external CD63 and mScarlet signals from many independently acquired images (as they did for the internal signals in panel F). Is there no external THSD7A signal in the shEng cells?

      The images for Figure 7E were taken with high resolution on a confocal microscope. Insets for Figure 7E were digitally zoomed so that readers could see the tiny structures. Zoom 1 in Figure 7E shows areas of extracellular deposition, whereas Zoom 2 shows THSD7A colocalization with CD63 in MVE. In the extracellular areas (Zoom 1), we observe small punctate depositions that are positive for CD63 and/or THSD7A-mScarlet. Our interpretation of this staining is that the cells are secreting heterogeneous small EVs that are then attached to the glass coverslip. The images and zooms in Fig 7E were chosen to be representative and indeed reveal that there is more extracellular deposition of THSD7A-mScarlet outside the control shScr cells compared to the shEng cells, consistent with more secretion of THSD7A in small EVs from shScr cells when compared to those of shEng cells (Fig 7A,B). However, we did not quantify this difference, as these experiments were conducted with transient transfection of THSD7A-mScarlet, and it is challenging to determine which cell the extracellular THSD7A-mScarlet came from, complicating any quantitative analysis on a per-cell basis.

      Quantification of internal THSD7A localization is much more straightforward in this experimental regime. Indeed, in Figure 7F, we quantitated internal colocalization of THSD7A-mScarlet and CD63, which we obtained by choosing only cells that were visually positive for THSD7A-mScarlet in each transient transfection and omitting all extracellular signals. Quantifying the extracellular colocalization of THSD7A and CD63 could certainly be a future direction for this project and would require establishing cells that stably express THSD7A-mScarlet.

      With regard to whether the extracellular deposits are migrasomes, we have no reason to believe that they would be migrasomes. The preponderance of our evidence points to exosomes as carrying THSD7A and inducing filopodia. Furthermore, CD63 is an exosome marker (Sung et al., Nat Comm, 2020) and does not induce migrasomes, unlike many other tetraspanins (Huang et al., Nat Cell Bio, 2019).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors need to clarify the figure labeling and descriptions, and the conclusions should be drawn more directly from the findings. Some figures need to be clearer; e.g., Figure 1E needs information on what the green and red fluorescent proteins are. Do all images in Figure 1A have the same scale bar? Figure 3A lacks a scale bar. In Figure 3, the GFP signal is shown in yellow; does it represent a merge, or GFP alone? Figure 6D is missing its “D” label. Figure 4D needs to be better explained. Additionally, since Figures 8B and 8C present a model based on all the findings of the study, they would be better as a separate figure from Figure 8A.

      The figure legend for figure 1E notes that green corresponds to GFP-Rab27b and the red corresponds to mCherry filler. In addition, the labels are marked to the right of the figure. For Figure 1A, we have now indicated in the legend that all scale bars = 10 µm. In figure 3, neurons were co-transfected with GFP or GFP-Rab27b. Thus, the yellow signal in these images is the merge of the mCherry filler with either GFP (expression throughout the neuron body and dendrites) or GFP-Rab27b (punctate colocalization). We have added a scale bar to Fig 3A. Figure 6D has been corrected, with a “D” label added. Figure 4D shows representative images of cells with filopodia under the various conditions, including add-back of control or endoglin-KD EVs. We have clarified the conditions in the figure legend for 4D. For Figure 8, we have now split it into 2 figures: one with data (Fig 8) and one with the model (Fig 9).

      Reviewer #2 (Recommendations for the authors):

      For the most part, this story is strong and well-presented. The findings are interesting and will significantly advance our understanding of how EVs affect various processes such as cancer metastasis. However, the Cdc42 work is not great. They only indirectly implicate Cdc42 with a somewhat iffy inhibitor (ML141) and a constitutively active form transfected into cells. Both approaches have drawbacks such as off-target effects in the case of the inhibitor and possible cross-talk to other GTPases in the case of the active mutant. The activation of Cdc42 should be demonstrated by an activity assay. Several commercial kits are available. Inhibition of Cdc42 should be tested by knockdown in addition to the inhibitor.

      We appreciate the reviewer’s recognition of our work. To address the limitations of our study, particularly the Cdc42 mechanistic work, we have now added a “Limitations of the study” section at the end of the text. Here, we address our experimental limitations and future directions.

      Reviewer #3 (Recommendations for the authors):

      (1) Since the purified small EVs contain canonical exosomal markers and originate from MVEs, the authors should consider a more consistent use of the term "exosome" to avoid confusion.

      We acknowledge that the usage of both “exosomes” and “small extracellular vesicles” can seem confusing to many readers. Typically in the EV field, we use the term “exosome” when we can reliably determine that the EVs originate from the endocytic pathway. Thus, we use this term when we have specifically perturbed this pathway by targeting Hrs or Rab27. We use the term “small extracellular vesicles” or SEVs when referring to a purified heterogeneous population of SEVs from unknown or a variety of origins. Thus, when referring to vesicles isolated from the conditioned media, we call them SEVs because we cannot determine their origin. Clarification of this terminology has been added to the introduction of the paper.

      (2) 1st results section - expressing mCherry as a "filler" is confusing, clarify that this is meant to identify cellular background.

      This has now been clarified in the paper.

      (3) Figure 3 - Although Rab27a and Rab27b play a role in exosome secretion, Rab27b does not have redundant functions with Rab27a in every cellular context. The authors should mention the specific roles of Rab27a and Rab27b in promoting MVE fusion with the PM and in regulating the anterograde movement of MVEs to the PM, respectively (Ostrowski et al. 2010, Citation 52 in the ms). Although Rab27a is not highly expressed in neurons, it is not currently clear whether Rab27b has a redundant function with Rab27a or whether there is another unknown factor that plays this role. As neurons also do not express endoglin, the mechanisms that mediate how EVs regulate filopodia formation in these cells are most probably different than in cancer cells. This should be highlighted in the discussion.

      We have now added a couple of clarifying sentences about the roles of Rab27a and Rab27b to the results section, including the Ostrowski reference and another reference suggesting possible redundancy of Rab27a and Rab27b. With regard to endoglin not being expressed by neurons, that is one reason why we carried out the proteomics with control and endoglin-KD EVs to find a universal cargo that would directly induce filopodia formation. Indeed, THSD7A seems to be such a universal cargo, expressed in both cancer cell and neuron EVs and inducing filopodia in both cell types. This point, along with the requirement for regulation of THSD7A by other molecules in neurons, is discussed in the results and discussion sections.

      (4) As the authors note, the mechanistic link between endoglin-sorted, exosomal THSD7A and Cdc42-mediated filopodia formation remains unclear. While the findings on Cdc42 are clear, they are not surprising. What is the role of mDia/ENA/VASP or BAR proteins in this? The authors should also consider an assay to determine whether exosomal THSD7A binds to the PM to cause the signaling or if the cargo is first internalized before performing its function. Since this process is both autocrine and paracrine, the authors could co-culture THSD7A-mScarlet cells with vector control cells and observe how THSD7A-mScarlet is localized in the non-expressing cells.

      As other reviewers also noted, the Cdc42 mechanistic data at the end of the paper has clear limitations that are now addressed within the manuscript in a “Limitations of the Study” section. Here we discuss our experimental troubleshooting and approach to assaying Cdc42 involvement in this process. We acknowledge there are many rigorous experiments that could be pursued in the future to strengthen our mechanism and proposed model.

      We also agree that elucidating how THSD7A specifically interacts with target cells would be very informative and insightful. This would be most effectively assayed using a cell line that is stably expressing THSD7A-mScarlet and could be a future direction of this project. However, it is out of the scope of this current publication.

    1. Author Response:

      We appreciate the reviewers’ thoughtful assessments and constructive feedback on our manuscript. The central goal of our study was to propose a simple and biologically inspired model-based reinforcement learning (MBRL) framework that draws on mechanisms observed in episodic memory systems. Unlike model-free approaches that require processing at each state transition, our model uses sequential activity (= transition model) to predict environmental changes in the long term by leveraging episode-like representations.

      While many prior studies have focused on optimizing task performance in MBRL, our primary aim is to explore how flexible, context-dependent behavior—reminiscent of that observed in biological systems—can be instantiated using simple, neurally plausible mechanisms. In particular, we emphasize the use of an Amari-Hopfield network for the context selection module. This network, governed by Hebbian learning, forms attractors that can correct for sensory noise and facilitate associative recall, allowing dynamic separation of prediction errors due to sensory noise versus those due to contextual mismatches. However, we acknowledge that our explanation of these mechanisms, especially in relation to sensory noise, was not sufficiently developed in the current manuscript. We plan to revise the text to clarify this limitation and to expand on the implications of these mechanisms in the context of psychiatric disorder-like behaviors, as illustrated in Figure 5. Several reviewers raised concerns about the clarity of our model. Our implementation is intentionally algorithmic rather than formal, designed to provide an accessible proof-of-concept model. We will revise the manuscript to better describe the core logic of the model—namely, the bidirectional interaction between the Hopfield network (X) and the hippocampal sequence module (H), where X sends the information on estimated current context to H, and H returns a future prediction based on the episode to X. This interaction forms a loop enabling the current context estimation and its reselection.
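
      The noise-correcting role of the attractor network described above can be illustrated with a minimal sketch. It assumes binary (+1/-1) context patterns, a Hebbian outer-product rule, and asynchronous sign updates; the function names, the 16-unit size, and the two example patterns are hypothetical choices for illustration and are not taken from the manuscript's implementation.

```python
import random

def train_hebbian(patterns):
    """Build a Hopfield weight matrix from +1/-1 patterns with the Hebbian rule."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    w[i][j] += p[i] * p[j] / n
    return w

def recall(w, state, max_sweeps=10):
    """Update units one at a time until the state stops changing (an attractor)."""
    state = list(state)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            field = sum(w[i][j] * state[j] for j in range(n))
            new_value = 1 if field >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state

def flip_bits(pattern, k, rng):
    """Return a copy of pattern with k randomly chosen units sign-flipped (sensory noise)."""
    noisy = list(pattern)
    for i in rng.sample(range(len(pattern)), k):
        noisy[i] = -noisy[i]
    return noisy

# Two hypothetical, mutually orthogonal context patterns (illustrative only).
context_a = [1] * 8 + [-1] * 8
context_b = [1, -1] * 8

weights = train_hebbian([context_a, context_b])
noisy_a = flip_bits(context_a, 3, random.Random(0))
recovered = recall(weights, noisy_a)  # settles back onto the stored context_a
```

      In this toy setting a mildly corrupted input falls back into the stored attractor, so a mismatch that persists after recall would point to a change of context rather than to sensory noise.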

      The key advantage of this architecture is its ability to flexibly adjust the temporal span of episodes used for inference and control, providing a potential solution to the challenge of credit assignment over variable time scales in MBRL. Because our model forms and stores episodes of variable length depending on the context, it can handle both short-horizon and long-horizon tasks simultaneously. Moreover, because each episode is organized by context, reselecting contexts enables rapid switching between these timescales without requiring explicit optimization. To better illustrate this important feature, we plan to include additional experiments in the revised manuscript that demonstrate how context-dependent modulation of episode length enhances behavioral flexibility and task performance.

      Finally, we will address the comments on the presentation and the biological grounding of our model. To improve clarity and biological relevance, we will revise the Methods section to explicitly describe how the model is grounded in mechanisms observed in real neural systems. Also, we will clarify which parts of our figures represent computational results versus schematic illustrations and more clearly explain how each model component relates to known neural mechanisms. These revisions aim to improve both clarity and accessibility for a broad audience, while reinforcing the biological relevance of our approach.

      We thank the reviewers again for their insightful comments, which will help us substantially improve the manuscript. We look forward to submitting a revised version that more clearly conveys the contributions and implications of our work.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      In this study, Hama et al. explored the molecular regulatory mechanisms underlying the formation of the ULK1 complex. By employing the AlphaFold structural prediction tool, they showed notable differences in the complex formation mechanisms between ULK1 in mammalian cells and Atg1 in yeast cells. Their findings revealed that in mammalian cells, ULK1, ATG13, and FIP200 form a complex with a stoichiometry of 1:1:2. These predicted interaction regions were validated through both in vivo and in vitro assays, enhancing our understanding of the molecular mechanisms governing ULK1 complex formation in mammalian cells. Importantly, they identified a direct interaction between ULK1 and FIP200, which is crucial for autophagy. However, some aspects of this manuscript require further clarification, validation, and correction by the authors.

      Thank you for your thorough evaluation of our manuscript. We have carefully revised the manuscript to address your concerns by performing extra experiments and providing additional clarifications, validations, and corrections as written below.

      Reviewer #2 (Public review):

      Summary:

      This is important work that helps to uncover how the process of autophagy is initiated - via structural analyses of the initiating ULK1 complex. High-resolution structural details and a mechanistic insight of this complex have been lacking and understanding how it assembles and functions is a major goal of a field that impacts many aspects of cell and disease biology. While we know components of the ULK1 complex are essential for autophagy, how they physically interact is far from clear. The work presented makes use of AlphaFold2 to structurally predict interaction sites between the different subunits of the ULK1 complex (namely ULK1, ATG13, and FIP200). Importantly, the authors go on to experimentally validate that these predicted sites are critical for complex formation by using site-directed mutagenesis and then go on to show that the three-way interaction between these components is necessary to induce autophagy in cells.

      Strengths:

      The data are very clear. Each binding interface of ATG13 (ATG13 with FIP200/ATG13 with ULK1) is confirmed biochemically with ITC and IP experiments from cells. Likewise, IP experiments with ULK1 and FIP200 also validate interaction domains. A real strength of the work is in their analyses of the consequences of disrupting ATG13's interactions in cells. The authors make CRISPR KI mutations of the binding interface point mutants. This is not a trivial task and is the best approach as everything is monitored under endogenous conditions. Using these cells the authors show that ATG13's ability to interact with both ULK1 and FIP200 is essential for a full autophagy response.

      Thank you for your thoughtful review and for highlighting the importance of our approach.

      Weaknesses:

      I think a main weakness here is the failure to acknowledge and compare results with an earlier preprint that shows essentially the same thing (https://doi.org/10.1101/2023.06.01.543278). Arguably this earlier work is much stronger from a structural point of view as it relies not only on AlphaFold2 but also actual experimental structural determinations (and takes the mechanisms of autophagy activation further by providing evidence for a super complex between the ULK1 and VPS34 complexes). That is not to say that this work is not important, as at the very least it independently helps to build a consensus for ULK1 complex structure. Another weakness is that the downstream "functional" consequences of disrupting the ULK1 complex are only minimally addressed. The authors perform a Halotag-LC3 autophagy assay, which essentially monitors the endpoint of the process. There are a lot of steps in between, knowledge of which could help with mechanistic understanding. Not least is the kinase activity of ULK1 - how is this altered by disrupting its interactions with ATG13 and/or FIP200?

      Thank you for this valuable feedback. In response, we performed a detailed structural comparison between the cryo-EM structure reported in the referenced preprint and our AlphaFold-based model. We have summarized both the similarities and differences in newly included figures (revised Figure 2A, B, 3B, S1F) and provided an in-depth discussion in the main text. Furthermore, to address the downstream consequences of ULK1 complex disruption, we have investigated the impact on ULK1 kinase activity, specifically examining how mutations affecting ATG13 or FIP200 interaction alter ULK1’s phosphorylation of a key substrate ATG14. In addition, we analyzed the effect on ATG9 vesicle recruitment. We provide the corresponding data as Figure S3C-E and detailed discussions in the revised manuscript.

      Reviewer #3 (Public review):

      In this study, the authors employed the protein complex structure prediction tool AlphaFold-Multimer to obtain a predicted structure of the protein complex composed of ULK1-ATG13-FIP200 and validated the structure using mutational analysis. This complex plays a central role in the initiation of autophagy in mammals. Previous attempts at resolving its structure have failed to obtain high-resolution structures that can reveal atomic details of the interactions within the complex. The results obtained in this study reveal extensive binary interactions between ULK1 and ATG13, between ULK1 and FIP200, and between ATG13 and FIP200, and pinpoint the critical residues at each interaction interface. Mutating these critical residues led to the loss of binary interactions. Interestingly, the authors showed that the ATG13-ULK1 interaction and the ATG13-FIP200 interaction are partially redundant for maintaining the complex.

      We are grateful for your positive evaluation of our work.

      The experimental data presented by the authors are of high quality and convincing. However, given the core importance of the AlphaFold-Multimer prediction for this study, I recommend the authors improve the presentation and documentation related to the prediction, including the following:

      (1) I suggest the authors consider depositing the predicted structure to a database (e.g. ModelArchive) so that it can be accessed by the readers.

      We have deposited the AlphaFold model to ModelArchive with the accession code ma-jz53c, which is indicated in the revised manuscript.

      (2) I suggest the authors provide more details on the prediction, including explaining why they chose to use the 1:1:2 stoichiometry for ULK1-ATG13-FIP200 and whether they have tried other stoichiometries, and explaining why they chose to use the specific fragments of the three proteins and whether they have used other fragments.

      We appreciate your suggestion. As we noted in the original manuscript, previous studies have shown that the C-terminal region of ULK1 and the C-terminal intrinsically disordered region of ATG13 bind to the N-terminal region of the FIP200 homodimer (Alers, Loffler et al., 2011; Ganley, Lam du et al., 2009; Hieke, Loffler et al., 2015; Hosokawa, Hara et al., 2009; Jung, Jun et al., 2009; Papinski and Kraft, 2016; Wallot-Hieke, Verma et al., 2018). We relied on these findings when determining the specific regions to include in our complex prediction and when selecting a 1:1:2 stoichiometry for ULK1–ATG13–FIP200 which was reported previously (Shi et al., 2020). We also used AlphaFold2 to predict the structures of the full-length ULK1–ATG13 complex and the complex of the FIP200N dimer with full-length ATG13, confirming that there were no issues with our choice of regions (revised Figure S1A-C). In the revised manuscript, we have provided a more detailed explanation of our rationale based on the previous reports and additional AlphaFold predictions.

      (3) I suggest the authors present the PAE plot generated by AlphaFold-Multimer in Figure S1. The PAE plot provides valuable information on the prediction.

      We provided the PAE plot in the revised Figure S1C.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 1D, the labels for the input and IP of ATG13-FLAG should be corrected to ATG13-FLAG FIP3A.

      We thank the reviewer for pointing out these labeling mistakes. We revised the labels based on the suggestions.

      (2) In the discussion section, the authors should address why ATG13-FLAG ULK1 2A in Fig. 2D leads to a significantly lower expression of ULK1 and provide possible explanations for this observation.

      ATG13 and ATG101, both core components of the ULK1 complex, are known to stabilize each other through their mutual interaction. Loss or reduction of one protein typically leads to the destabilization of the other. In this context, ULK1 is similarly stabilized by binding to ATG13. Therefore, the ATG13-FLAG ULK2A mutant, which has reduced binding to ULK1, likely loses this stabilizing activity, so that ULK1 becomes destabilized, resulting in lower expression levels of ULK1. We have added this discussion to the revised manuscript.

      (3) In Figure 4B, the authors should explain why Atg13-FLAG KI significantly affects the expression of endogenous ULK1. Could Atg13-FLAG KI be interfering with its binding to ULK1? Experimental evidence should be provided to support this. Additionally, does Atg13-FLAG KI affect autophagy? Wild-type HeLa cells should be included as a control in Figure 4C and 4D to address this question.

      Thank you for your constructive suggestion. We found a technical error in the ULK1 blot of Figure 4B. Therefore, we repeated the experiment. The results show that ULK1 expression did not significantly change in the ATG13-FLAG KI. These findings are consistent with Figure S3A. We have replaced Figure 4B with this new data.

      We agree that including wild-type HeLa cells as a control is essential to determine whether ATG13-FLAG KI affects autophagy. We performed the same experiments in wild-type HeLa cells and found that ATG13-FLAG KI does not significantly impact autophagic flux. Accordingly, we have replaced Figures 4D and 4E with these new data.

      (4) In Figure 3C, the authors used an in vitro GST pulldown assay to detect a direct interaction between ULK1 and FIP200, which was also confirmed in Figure 3E. However, since FLAG-ULK1 FIP2A affects its binding with ATG13 (Fig. 3E), it is possible that ULK1 FIP2A inhibits autophagy by disrupting this interaction. The authors should therefore use an in vitro GST pulldown assay to determine whether GST-ULK1 FIP2A affects its binding with ATG13. Additionally, the authors should investigate whether the interaction between ULK1 and FIP200 in cells requires the involvement of ATG13 by using ATG13 knockout cells to confirm if the ULK1-FIP200 interaction is affected in the absence of ATG13.

      Thank you for the valuable suggestion. We examined the effect of the FIP2A mutation on the ULK1–ATG13 interaction using isothermal titration calorimetry (ITC) to obtain quantitative binding data. The results showed that the FIP2A mutation does not markedly alter the affinity between ULK1 and ATG13 (revised Figure S2B), suggesting that FIP2A mainly weakens the ULK1–FIP200 interaction. Regarding experiments in ATG13 knockout cells, ULK1 becomes destabilized in the absence of ATG13, making it technically difficult to assess how the ULK1–FIP200 interaction is affected under those conditions.

      Reviewer #2 (Recommendations for the authors):

      I feel the manuscript would benefit from a more detailed comparison with the Hurley lab paper - are the structural binding interfaces the same, or are there differences?

      We appreciate the suggestion to compare our results more closely with the work from the Hurley lab. We performed a detailed structural comparison between the cryo-EM structure reported in the referenced preprint and our AlphaFold-based model (revised Figure 2A, B, 3B, S1F) and provided an in-depth discussion in the main text.

      As mentioned, what happens downstream of disrupting the ULK1 complex? How is ULK1 activity changed, both in vitro and in cells? Does disruption of the ULK1 complex binding sites impair VPS34 activity in cells (for example by looking at PtdIns3P levels/staining)?

      Thank you for your insightful comments. We focused on elucidating how disrupting the ULK1 complex leads to impaired autophagy. To assess ULK1 activity, we measured ULK1-dependent phosphorylation of ATG14 at Ser29 (PMID: 27046250; PMID: 27938392). In FIP3A and FU5A knock-in cells, ATG14 phosphorylation was significantly reduced, indicating decreased ULK1 activity (revised Figure S3D, E). This observation is consistent with previous work showing that FIP200 recruits the PI3K complex. Notably, in ATG13 knockout cells, ATG14 phosphorylation became almost undetectable, though the underlying mechanism remains to be fully investigated. Altogether, these data point to reduced ULK1 activity as a key factor explaining the autophagy deficiency observed in FU5A knock-in cells.

      We also explored possible downstream mechanisms. One well-established function of ATG13 is to recruit ATG9 vesicles (PMID: 36791199). These vesicles serve as an upstream platform for the PI3K complex, providing the substrate for phosphoinositide generation (PMID: 38342428). To clarify how our mutations impact this step, we starved ATG13-FLAG knock-in cells and observed ATG9 localization. Unexpectedly, even in FU5A knock-in cells where ATG13 is almost completely dissociated from the ULK1 complex, ATG9A still colocalized with FIP200 (revised Figure S3C). These puncta also overlapped with p62, likely because p62 bodies recruit both FIP200 and ATG9 vesicles. Although we suspect that ATG9 recruitment is nonetheless impaired under these conditions, we were unable to definitively demonstrate this experimentally and consider it an important avenue for future study.

      Reviewer #3 (Recommendations for the authors):

      Here are some additional minor suggestions:

      (1) The UBL domains are only mentioned in the abstract but not anywhere else in the manuscript. I suggest the authors add descriptions related to the UBL domains in the Results section.

      We thank the reviewer for pointing out the lack of description of UBL domains, which we added in Results in the revised manuscript.

      (2) The authors may want to consider adding a diagram in Figure 1A to show the domain organization of the three full-length proteins and the ranges of the three fragments in the predicted structure.

      We have added a proposed diagram as Figure 1A.

      (3) I suggest the authors consider highlighting in Figure 1A the positions of the binding sites shown in Figure 1B, for example, by adding arrows in Figure 1A.

      We have added arrows in the revised Figure 1B (which was Figure 1A in the original submission).

      (4) In Figure 1D, "Atg13-FLAG" should be "Atg13-FLAG FIP3A".

      We have revised the labeling in Figure 1D.

      (5) "the binding of ATG13 and ULK1 to the FIP200 dimer one by one" may need to be re-phrased. "One by one" conveys a meaning of "sequential", which is probably not what the authors meant to say.

      We have revised the sentence as “the binding of one molecule each of ATG13 and ULK1 to the FIP200 dimer”.

      (6) In "Wide interactions were predicted between the four molecules", I suggest changing "wide" to "extensive".

      We have changed “wide” to “extensive” in the revised manuscript.

      (7) In "which revealed that the tandem two microtubule-interacting and transport (MIT) domains in Atg1 bind to the tandem two MIT interacting motifs (MIMs) of ATG13", I suggest changing the two occurrences of "tandem two" to "two tandem" or simply "tandem".

      We simply used "tandem" in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Recent work has demonstrated that the hummingbird hawkmoth, Macroglossum stellatarum, like many other flying insects, uses ventrolateral optic flow cues for flight control. However, unlike other flying insects, the same stimulus presented in the dorsal visual field elicits a directional response. Bigge et al. use behavioral flight experiments to set these two pathways in conflict in order to understand whether these two pathways (ventrolateral and dorsal) work together to direct flight and if so, how. The authors characterize the visual environment (the amount of contrast and translational optic flow) of the hawkmoth and find that different regions of the visual field are matched to relevant visual cues in their natural environment and that the integration of the two pathways reflects a prioritization for generating behavior that supports hawkmoth safety rather than the prevalence of a particular visual cue in the environment.

      Strengths:

      This study creatively utilizes previous findings that the hawkmoth partitions their visual field as a way to examine parallel processing. The behavioral assay is well-established and the authors take the extra steps to characterize the visual ecology of the hawkmoth habitat to draw exciting conclusions about the hierarchy of each pathway as it contributes to flight control.

      Weaknesses:

      The work would be further clarified and strengthened by additional explanation included in the main text, figure legends, and methods that would permit the reader to draw their own conclusions more feasibly. It would be helpful to have all figure panels referenced in the text and referenced in order, as they are currently not. In addition, it seems that sometimes the incorrect figure panel is referenced in the text, Figure S2 is mislabeled with D-E instead of A-C and Table S1 is not referenced in the main text at all. Table S1 is extremely important for understanding the figures in the main text and eliminating acronyms here would support reader comprehension, especially as there is no legend provided for Table S1. For example, a reader that does not specialize in vision may not know that OF stands for optic flow. Further detail in figure legends would also support the reader in drawing their own conclusions. For example, dashed red lines in Figures 3 and 4 A and B are not described and the letters representing statistical significance could be further explained either in the figure legend or materials to help the reader draw their own conclusions.

      We appreciate the suggestions to improve the clarity of the manuscript. We have extensively restructured the entire manuscript. Among other changes, we have referenced all figure panels in the text in the order they appear. To do so, we combined the optic flow and contrast measurements of our setup with the methods description of the behavioural experiments (formerly Figs. 5 and 2, respectively). This new Fig. 2 now introduces the methods of the study, while the remainder of the former Fig. 2, which presented the experiments that investigated the ventrolateral and dorsal response in more detail, is now a separate figure (Fig. 3). This arrangement also better balances the amount of information contained in each figure.

      Reviewer #2 (Public review):

      Summary:

      Bigge and colleagues use a sophisticated free-flight setup to study visuo-motor responses elicited in different parts of the visual field in the hummingbird hawkmoth. Hawkmoths have been previously shown to rely on translational optic flow information for flight control exclusively in the ventral and lateral parts of their visual field. Dorsally presented patterns elicit a formerly completely unknown response - instead of using dorsal patterns to maintain straight flight paths, hawkmoths fly, more often, in a direction aligned with the main axis of the pattern presented (Bigge et al, 2021). Here, the authors go further and put ventral/lateral and dorsal visual cues into conflict. They found that the different visuomotor pathways act in parallel, and they identified a 'hierarchy': the avoidance of dorsal patterns had the strongest weight and optic flow-based speed regulation the lowest weight.

      Strengths:

      The data are very interesting, unique, and compelling. The manuscript provides a thorough analysis of free-flight behavior in a non-model organism that is extremely interesting for comparative reasons (and on its own). These data are both difficult to obtain and very valuable to the field.

      Weaknesses:

      While the present manuscript clearly goes beyond Bigge et al, 2021, the advance could have perhaps been even stronger with a more fine-grained investigation of the visual responses in the dorsal visual field. Do hawkmoths, for example, show optomotor responses to rotational optic flow in the dorsal visual field?

      We thank the reviewer for the feedback, and the suggestions for improvement of the manuscript (our implementations are detailed below). We fully agree that this study raises several intriguing questions regarding the dorsal visual response, including how the animals perceive and respond to rotational optic flow in their dorsal visual field, particularly since rotational optic flow may be processed separately from translational optic flow.

      In our free-flight setup, it was not possible to generate rotational optic flow in a controlled manner. To explore this aspect more systematically, a tethered-flight setup would be ideal, or alternatively, a free-flight setup integrated with virtual reality. This would be a compelling direction for a follow-up study.

      Reviewer #3 (Public review):

      The central goal of this paper as I understand it is to extract the "integration hierarchy" of stimulus in the dorsal and ventrolateral visual fields. The segregation of these responses is different from what is thought to occur in bees and flies and was established in the authors' prior work. Showing how the stimuli combine and are prioritized goes beyond the authors' prior conclusions that separated the response into two visual regions. The data presented do indeed support the hierarchy reported in Figure 5 and that is a nice summary of the authors' work. The moths respond to combinations of dorsal and lateral cues in a mixed way but also seem to strongly prioritize avoiding dorsal optic flow which the authors interpret as a closed and potentially dangerous ecological context for these animals. The authors use clever combinations of stimuli to put cues into conflict to reveal the response hierarchy.

      My most significant concern is that this hierarchy of stimulus responses might be limited to the specific parameters chosen in this study. Presumably, there are parameters of these stimuli that modulate the response (spatial frequency, different amounts of optic flow, contrast, color, etc). While I agree that the hierarchy in Figure 5 is consistent for the particular stimuli given, this may not extend to other parameter combinations of the same cues. For example, as the contrast of the dorsal stimuli is reduced, the inequality may shift. This does not preclude the authors' conclusions but it does mean that they may not generalize, even within this species. For example, other cue conflict studies have quantified the responses to ranges of the parameters (e.g. frequency) and shown that one cue might be prioritized or up-weighted in one frequency band but not in others. I could imagine ecological signatures of dorsal clutter and translational positioning cues could depend on the dynamic range of the optic flow, or even having spatial-temporal frequency-dependent integration independent of net optic flow.

      We absolutely agree that, in principle, an observed integration hierarchy is only valid for the stimuli tested. Yet, we do believe that we provide good evidence that our key observations are also robust for stimuli related to the ones tested:

      Most importantly, we found that both pathways act in parallel (and are not mutually exclusive, or winner-takes-all, for example), when the animals can enact the locomotion induced by the dorsal and ventrolateral pathway. We tested this with the same dorsal cue (the line switching direction), but different behavioural paradigms (centring vs unilateral avoidance), and different ventrolateral stimuli (red gratings of one spatial frequency, and 100% nominal contrast black-and-white checkerboard stimuli which comprised a range of spatial frequencies) – and found the same integration strategy.

      Certainly, if the contrast of the visual cues was reduced to the point that the dorsal or ventrolateral responses became weaker, we would expect this to be visible in the combined responses, with the respective reduction in response strength for either pathway, to the same degree as they would be reduced when stimuli were shown independently in the dorsal and ventrolateral visual field.

      For testing whether the animals would show a weighting of responses when it was not possible to enact locomotion to both pathways, we felt it was important to use similar external stimuli, so that we could compare the responses and confidently interpret them in terms of integration. Indeed, how this is translated to responses in the two pathways depends a) on the spatiotemporal tuning, contrast sensitivity and exact receptive fields of the two systems, b) on the geometry of the setup and stimulus coverage, and therefore the ability of the animals to enact responses to both pathways independently, and c) on the integration weights.

      It would indeed be fascinating to obtain this tuning and the receptive fields, and having these, test a large array of combinations of stimuli and presentation geometries, so that one could extract integration weights for different presentation scenarios from the resulting flight responses in a future study.

      We also expanded the respective discussion section to reflect these points: l. 391-417. We also updated the former Fig. 5, now Fig. 6 to reflect this discussion.

      The second part of this concern is that there seems to be a missed opportunity to quantify the integration, especially when the optic flow magnitude is already calculated. The discussion even highlights that an advantage of the conflict paradigm is that the weights of the integration hierarchy can be compared. But these weights, which I would interpret as stimulus-responses gains, are not reported. What is the ratio of moth response to optic flow in the different regions? When the moth balances responses in the dorsal and ventrolateral region, is it a simple weighted average of the two? When it prioritizes one over the other is the response gain unchanged? This plays into the first concern because such gain responses could strongly depend on the specific stimulus parameters rather than being constant.

      Indeed, we set up stimuli that are comparable, as they are all in the visual domain, and since we can calculate their external optic flow and contrast magnitudes, to control for imbalances in stimulus presentation, which is important for the interpretation of the resulting data.

      As we discussed above, we are confident that we are observing general principles of the integration of the two parallel pathways. However, we refrained from calculating integration weights, because these might be misleading for several reasons:

      (1) In situations where the animals can enact responses to both pathways, we show that they do so at the full original magnitudes. So there are no “weights” of the hierarchy in this case.

      (2) Only when responses to both systems are not possible in parallel, do we see a hierarchy. However, combined with point (1), this hierarchy likely depends on the geometry of the moths’ environment: it will be more pronounced the less both systems can be enacted in parallel.

      (3) The hierarchy also does not affect all features of the dorsal or ventrolateral pathway equally. The hawkmoths still regulate their perpendicular distance to ventral gratings with dorsal gratings present, to the same degree as with only ventral gratings, because perpendicular distance regulation is not a feature of the dorsal response. And while the hawkmoths show a significant reduction in their position adjustment to dorsal contrast when it is in conflict with lateral gratings (Fig. 4C), they show exactly the same amount of lateral movement and speed adjustment as for dorsal gratings alone, when not combined with lateral ones (Fig. 4D and Fig. S3A). So even for one particular setup geometry and stimulus combination, there clearly is not one integration weight for all features of the responses.

      We extended the discussion section to clarify these points: “The benefit of our study system is that the same cues activate different control pathways in different regions of the visual field, so that the resulting behaviour can directly be interpreted in terms of integration weights” (l. 448-451; see also l. 391-417). We also updated the former Fig. 5, now Fig. 6, to reflect this discussion.

      The authors do explain the choice of specific stimuli in the context of their very nice natural scene analysis in Fig. 1 and there is an excellent discussion of the ecological context for the behaviors. However, I struggled to directly map the results from the natural scenes to the conclusions of the paper. How do they directly inform the methods and conclusions for the laboratory experiments? Most important is the discussion in the middle paragraph of page 12, which suggests a relationship with Figure 1B, but seems provocative but lacking a quantification with respect to the laboratory stimuli.

      We show that contrast cues and translational optic flow are not homogeneously distributed in the natural environments of hawkmoths. This directly relates to our laboratory findings on the responses to these stimuli in different parts of the visual field. In order to interpret the results of these behavioural experiments with respect to the visual stimuli, we measured translational optic flow and contrast cues in the laboratory setup. As a result, we make several predictions about the animals’ use of translational optic flow and contrast cues in natural settings:

      a) Hawkmoths in the lab responded strongest to ventral optic flow, even though it was not stronger in magnitude, given our measurements, than lateral optic flow. Thus, we propose that the stronger response to ventral optic flow might be an evolutionary adaptation to the natural distribution of translational optic flow cues.

      b) In the natural habitats of hawkmoths, dorsal coverage is much less frequent than ventrolateral structures generating translational optic flow, yet the hawkmoths responded with a much higher weight to the former. Moreover, in our flight tunnel experiments, the animals responded with the same or higher weights to dorsal cues, which had a lower magnitude of translational optic flow and contrast than the same cues in the ventrolateral visual field. So, by combining behavioural experiments and stimulus measurements in the lab, we showed that the weighting of dorsal and ventrolateral cues did not follow their stimulus magnitude in the lab. Moreover, comparing to the natural cue distributions, we suggest that the integration weights also did not evolve to match the prevalence of these cues in natural habitats.

      We integrated the measurements of natural visual scene statistics in the new Fig. 6, to relate the behavioural findings to the natural context also in the figure structure, and sequence logic of the text, as they are discussed here.

      The central conclusion of the first section of the results is that there are likely two different pathways mediating the dorsal and the ventrolateral response. This seems reasonable given the data, however, this was also the message that I got from the authors' prior paper (ref 11). There are certainly more comparisons being done here than in that paper and it is perfectly reasonable to reinforce the conclusion from that study but I think what is new about these results needs to be highlighted in this section and differentiated from prior results. Perhaps one way to help would be to be more explicit with the open hypotheses that remain from that prior paper.

      We appreciate the suggestion to highlight more clearly what the open questions that are addressed in this study are. As a result, we have entirely restructured the introduction, added sections to the discussion and fundamentally changed the graphical result summary in Fig. 6, to reflect the following new findings (and differences to the previous paper):

      The previous paper demonstrated that there are two different pathways in hummingbird hawkmoths that mediate visual flight guidance, and newly described one of them, the dorsal response. This established flight guidance in hummingbird hawkmoths as a model for the questions asked in the current study, which are very different in nature from the previous paper.  

      The main question addressed in the current study is how these two flight guidance pathways interact to generate consistent behaviour. Throughout the literature on parallel sensory and motor pathways guiding behaviour, there are different solutions, from winner-takes-all to equally mixed responses. We tested this fundamental question using the hummingbird hawkmoth flight guidance systems as a model.

      This is the main question addressed in the various conflict experiments in this study, and we show that indeed, the two systems operate in parallel. As long as the animals can enact both dorsal and optic-flow responses, they do so at the original strengths of the responses. Only when this is not possible, hierarchies become visible. We carefully measured the optic flow and contrast cues generated by the different stimuli to ensure that the hierarchies we observed were not generated by imbalances of the external stimuli.
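
      The candidate integration schemes named above (winner-takes-all, mixed responses, and parallel operation at full strength) can be contrasted in a toy sketch. This is purely illustrative and is not the authors' model: the function names, the two-component response vector (lateral shift, vertical shift) and all numbers are assumptions chosen for the example.

```python
# Toy comparison of three ways two pathway responses could be combined.
# Each response is a hypothetical (lateral_shift, vertical_shift) pair
# in arbitrary units; none of these values come from the study.

def parallel(dorsal, ventrolateral):
    """Both pathways are enacted at full strength (component-wise sum)."""
    return tuple(d + v for d, v in zip(dorsal, ventrolateral))


def winner_takes_all(dorsal, ventrolateral):
    """Only the pathway with the larger overall response is enacted."""
    def size(response):
        return sum(c * c for c in response)
    return dorsal if size(dorsal) >= size(ventrolateral) else ventrolateral


def weighted_mix(dorsal, ventrolateral, w_dorsal=0.5):
    """Responses are averaged with a single weight for all components."""
    return tuple(w_dorsal * d + (1.0 - w_dorsal) * v
                 for d, v in zip(dorsal, ventrolateral))


# Hypothetical case in which the two pathways act on different components.
dorsal_response = (2.0, 0.0)          # lateral avoidance only
ventrolateral_response = (0.0, -1.0)  # distance regulation only

combined_parallel = parallel(dorsal_response, ventrolateral_response)
combined_wta = winner_takes_all(dorsal_response, ventrolateral_response)
combined_mix = weighted_mix(dorsal_response, ventrolateral_response)
```

      In this sketch only the parallel rule preserves both responses at their original magnitudes, which is the pattern described for situations where the animals can enact both responses.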

      A second question is whether the interaction hierarchy of the two pathways follows the statistics of natural environments. We previously showed qualitatively how optic flow and contrast cues are distributed across the visual field in natural habitats of the hummingbird hawkmoth. In this study, we quantitatively analysed the natural image data, including a new analysis for the contrast edges, and statistically compared the results across conditions. This quantitative analysis supported the previous qualitative assessment that the prevalence of translational optic flow was highest in the ventral and lowest in the dorsal visual field in all natural habitat types. The distribution of contrast edges across the visual field depended on habitat type much more strongly than was visible in the qualitative analysis in the previous paper. When compared to the magnitude of the behavioural responses, and considering that the hummingbird hawkmoth is predominantly found in open and semi-open habitats, the natural distributions of optic flow and contrast edges did not align with the response hierarchy observed in our laboratory experiments. Dorsal cues elicited much stronger responses relative to ventrolateral optic flow responses than would be expected.

      To provide a more complete picture of the dorsal pathway, which will be important for understanding its nature and for comparison with other species, we conducted additional experiments that were specifically set up to test for response features known from the translational optic flow response, in order to compare and contrast the two systems. These experiments allowed us to show that the dorsal response is not simply a translational optic flow reduction response that creates much stronger output than the ventrolateral optic flow response. In particular, we show that the dorsal response lacked the perpendicular distance regulation of the optic flow response, while it did provide alignment with prominent contrasts (possibly to reduce the perceived translational optic flow), which is not observed in the ventrolateral optic flow response. The strong avoidance of any dorsal contrast cues, not just those inducing translational optic flow, is another feature not found in the ventrolateral pathway.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Many comparisons between visual conditions are made and it was confusing at times to know which conditions the authors were comparing. Thinking of a way to label each condition with a letter or number so that the authors could specify which conditions are specifically being compared would greatly enhance comprehension and readability.

      We appreciate this concern. To be able to refer to the individual stimulus conditions in the analysis and results description, we gave each stimulus a unique identifier (see table S1), and provided these identifiers in the respective figures and throughout the text. We hope that this makes the identification of the individual stimuli easier.

      Consider adding in descriptive words to the y-axis labels for the position graphs that would help the reader quickly understand what a positive or negative value means with respect to the visual condition.

      We did now change the viewpoint on the example tracks in Figs. 2-5, to take a virtual viewpoint from the top, not as the camera recorded from below, which requires some mental rotation to reconcile the left and right sides. Moreover, we noticed that the example track axes were labelled in mm, while the axes for the plots showing median position in the tunnel were labelled in cm. We reconciled the units as well. This will make it easier to see the direct equivalent of the axis (as well as positive and negative values) in the example tracks in those figures, and the median positions, as well as the cross-index.

      There are no line numbers provided so it is a bit challenging to provide feedback on specific sentences but there are a handful of typos in the manuscript, a few examples:

      (1) Cue conflict section, first paragraph: "When both cues were presented to in combination, ..." (remove to)

      (2) The ecological relevance section, first paragraph, first sentence: "would is not to fly"

      (3) Figure S3 legend: explanation for C is labeled as B and B is not included with A

      We apologise for the missing line numbers. We added these and resolved the issues 1-3.

      Reviewer #2 (Recommendations for the authors):

      - The pictograms in Fig. 1a were at first glance not clear to me, maybe adding l, r, d, v to the first pictogram could make the figure more immediately accessible.

      We added these labels to make it more accessible.

      - I would suggest noting in the main text that the red patterns were chosen for technical reasons (see Methods), if this is correct.

      We added this information and a reference to the methods in the main text (lines 100-102).

      - "Thus, hawkmoths are currently the only insect species for which a partitioning of the visual field has been demonstrated in terms of optic-flow-based flight control [33-35]." I think that is a bit too strong and maybe it would be more interesting to connect the current data to connected data in other insects to perhaps discuss important similarities. Ref 32 for example shows that fruit flies weigh ventral translational optic flow considerably more than dorsal translational optic flow. Reichardt 1983 (Naturwissenschaften) showed that stripe fixation in large flies (a behaviour relying in part on the motion pathway) is confined to the ventral visual field, etc...

      We have changed this sentence to acknowledge partitioning in other insects, and motivating the use of our model species for this study: While fruit flies weight ventral translational optic flow stronger than dorsal optic flow, the most extreme partitioning of the visual field in terms of  optic-flow-based flight control has been observed in hawkmoths [33-35]. (lines 60-62)

      - I think the statistical differences group mean differences could be described in more detail at least in Fig. 2 (to me the description was not immediately clear, in particular with the double letters).

      We added an explanation of the letter nomenclature to all respective figure legends:

      Black letters show statistically significant differences in group means or median, depending on the normality of the test residuals (see Methods, confidence level: 5%). The red letters represent statistically significant differences in group variance from pairwise Brown–Forsythe tests (significance level 5%). Conditions with different letters were significantly different from each other. The white boxplots depict the median and 25% to 75% range, the whiskers represent the data exceeding the box by more than 1.5 interquartile ranges, and the violin plots indicate the distribution of the individual data points shown in black.

      - "When translational optic flow was presented laterally" I would use a more wordy description, since it is the hawkmoth that is controlling the optic flow and in addition to translational optic flow, there might also be rotational components, retinal expansion etc.

      We extended the description to explain that the moths were generating the optic flow percept based on stationary gratings in different orientations, by way of their flight through the tunnel. Lines 127-129

      - While it is clearly stated that the measure of the perpendicular distance from the ventral and dorsal pattern via the size of the insect as seen by the camera is indirect, I would suggest to determine the measurement uncertainty of distance estimate.

      - Connected to above - is the hawkmoth area averaged over the entire flight and is the variance across frames similar in all the stimuli conditions? Is it, in principle, conceivable that the hawkmoths' pitch (up or down) is different across conditions, e.g. with moths rising and falling more frequently in a certain condition, which could influence the area in addition to distance?

      There are a number of sources that generate variance in the distance estimate (which was based on the size of the moth in each video frame, after background subtraction): the size of the animal, the contrast with which the animal was filmed (which also depended on the type of pattern in the tunnel – it was lower with ventral or dorsal patterns as a background than with lateral ones), and the speed of the animal, as motion blur could impact the moth’s image on the video. The latter is hard to calibrate, but the uncertainty related to animal size and pattern types could theoretically be estimated. However, since we moved between finishing the data acquisition for this study and publishing the paper, the original setup has been dismantled. We could attempt to recreate it as faithfully as possible, but would worry about introducing further noise. We therefore decided not to attempt to characterise the uncertainty, so as not to give a false impression of the quantifiability of this measure. For the purpose of this study, it will have to remain a qualitative, rather than a quantitative, measure. If we use a similar measure again, we will make sure to quantify all sources of uncertainty that we have access to.

      The variance in area is different between conditions. Most likely, the animals vary their flight height differently for different dorsal and ventral patterns, as they vary their lateral flight straightness with different lateral visual input. For the reasons mentioned above, we cannot disentangle the effects of variations in flight height and other sources of uncertainty relating to animal size in the video frames. We therefore averaged the extracted area across the entire flight, to obtain a coarse measure of their flight height. Future studies focusing specifically on the vertical component or filming in 3D will be required to determine the exact amount of vertical flight variation.

      - Results second paragraph, suggestion: pattern wavelength or spatial frequency instead of spatial resolution.

      - Same paragraph, suggestion: For an optimal wavelength/spatial frequency of XX

      We corrected these to spatial frequency.

      - Above Fig 3- "this strongly suggests a different visual pathway". In my opinion it would be better to say sensory-motor /visuomotor pathway or to more clearly define visual pathway? Could one in principle imagine a uniform set of local motion sensitive neurons across the entire visual field that connect differentially to descending/motor neurons.

      We appreciate this point and changed this, and further instances in the manuscript to visuomotor pathway.

      - If I understood correctly, you calculated the magnitude of optic flow in the different tunnel conditions based on the image of a fisheye camera moving centrally in the tunnel, equidistant from all walls. I did not understand why the magnitude of optic flow should differ between the four quadrants showing the same squarewave patterns. Apologies if I missed something, but maybe it is worth explaining this in more detail in the manuscript.

      We recognize that this point may not have been immediately clear and have therefore provided additional clarification in the Methods and results section (lines 106-111, 543-549). We anticipated differences in the magnitude of optic flow due to potential contrast variations arising from the way the stimuli were generated—being mounted on the inner surfaces of different tunnel walls while the light source was positioned above. On the dorsal wall, light from the overhead lamps passed through the red material. For laterally mounted patterns, the animals perceived mainly reflected light, as these tunnel walls were not transparent.

      A similar principle applied to the background, which consisted of a white diffuser allowing light to pass through dorsally, but white non-transmissive paper laterally, with a 5% contrast random checkerboard pattern. The ventral side presented a more complex scenario, as it needed to be partially transparent for the ventrally mounted camera. Consequently, the animals perceived a combination of light reflections from the red patterns and the white gauze covering the ventral tunnel side, against the much darker background of the surrounding room.

      To ensure that the observed flight responses were not artifacts of deviations in visual stimulation from an ideal homogeneous environment, we used the camera to quantify the magnitude of optic flow and contrast patterns under these real experimental conditions. This approach also allowed us to directly relate the optic flow measurements taken indoors to those recorded outdoors, as we employed the same camera and analytical procedures for both datasets.

      Reviewer #3 (Recommendations for the authors):

      In addition to the considerations above I had a few minor points:

      There are so many different directions of stimuli and response that it is quite challenging to parse the results. Can this be made a little easier for the reader?

      We appreciate this concern. To be able to refer to the individual stimulus conditions in the analysis and results description, we gave each stimulus a unique identifier (see table S1), and provided these identifiers in the respective figures and throughout the text. We hope that this makes the identification of the individual stimuli easier.

      One suggestion (only a suggestion): I found myself continuously rotating the violin plots in my head so that the lateral position axis lined up with the lateral position of the tunnel icons below. Consider if rotating the plots 90 degs would help interpretability. It was challenging to keep track of which side was side.

      We discussed this with a number of test readers and tried multiple configurations. They all have advantages and drawbacks, but the majority of test readers preferred the current configuration, so we decided to keep it. To help the mental transformations from the example flight tracks in the figures, we now present the example flight tracks in Figs. 2-5 in the same reference frame as the figures showing median position (so positive and negative values on those axes correspond directly), and changed the view from a below-the-tunnel to an above-the-tunnel view, as this is the more typical depiction. We hope that this enhances readability.

      Are height measurements sensitive to the roll and pitch of the animal? I suspect this is likely small but worth acknowledging.

      They are indeed. These effects are likely small but contribute to the overall inaccuracy, which we could not quantify in this particular setup (see also response to reviewer 2 on that point), which is why the height measurements have to be considered a qualitative approximation rather than a quantification of flight height. We added text to acknowledge the effects of roll and pitch specifically (lines 657-658)

      The Brown-Forsythe test was reported as paired but this seems odd because the same moths were not used in each condition. Maybe the authors meant something different by "paired" than a paired statistical design?

      Indeed, the data were not paired in the sense that we could attribute individual datapoints to individual moths across conditions. We applied the Brown–Forsythe test in a pairwise manner, comparing the variances of two conditions at a time, to test whether the variance in position differed across conditions. We did phrase this misleadingly, and have corrected it to “The variance in the median lateral position (in other words, the spread of the median flight position) was statistically compared between the groups using the pairwise Brown–Forsythe tests” (l. 187-188).
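
      To make the distinction between "pairwise" and "paired" concrete, the following minimal sketch computes the Brown–Forsythe F statistic for every pair of conditions, treating each condition as an independent sample. It is not the authors' analysis code: the condition names and values are hypothetical, and only the test statistic is computed. A p-value would come from the F distribution with (k - 1, N - k) degrees of freedom, for example via `scipy.stats.levene(..., center="median")`, which implements the same test.

```python
from itertools import combinations
from statistics import mean, median


def brown_forsythe_f(*groups):
    """Brown-Forsythe F statistic: a one-way ANOVA on the absolute
    deviations of each value from its own group's median."""
    deviations = []
    for group in groups:
        centre = median(group)
        deviations.append([abs(x - centre) for x in group])

    k = len(groups)
    n_total = sum(len(z) for z in deviations)
    group_means = [mean(z) for z in deviations]
    grand_mean = mean(v for z in deviations for v in z)

    between = sum(len(z) * (m - grand_mean) ** 2
                  for z, m in zip(deviations, group_means))
    within = sum((v - m) ** 2
                 for z, m in zip(deviations, group_means) for v in z)
    # Note: 'within' is zero if every group has constant deviations.
    return (n_total - k) / (k - 1) * between / within


def pairwise_brown_forsythe(conditions):
    """Apply the test to every pair of conditions (independent samples)."""
    return {(a, b): brown_forsythe_f(conditions[a], conditions[b])
            for a, b in combinations(conditions, 2)}


# Hypothetical median lateral positions for three made-up conditions.
conditions = {
    "A": [1, 2, 3, 4, 5],
    "B": [11, 12, 13, 14, 15],   # same spread as A, shifted position
    "C": [0, 5, 10, 15, 20],     # larger spread than A
}
results = pairwise_brown_forsythe(conditions)
```

      Because the statistic is built on deviations from each group's own median, conditions A and B (same spread, different position) give F = 0, whereas A and C differ in spread and give a positive F.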

      There is some concern about individual moth preferences and bias due to repeated measures. I appreciate that the individual moth's identity was not likely known in most cases, but can the authors provide an approximate breakdown of how many individual moths provided the N sample trajectories?

      This is a very valid concern, and indeed one we did investigate in a previous study with this setup. We confirmed that the majority of animals (70%, 68% and 53% out of 40 hawkmoths, measured on three consecutive days) crossed the tunnel within a randomly picked window of 3h (Stöckl et al. 2019). We now state this explicitly in the methods section (lines 594-597). Thus, for the sample sizes in our study, statistically, each moth would have contributed a small number of tracks compared to the overall number of tracks sampled.

      The statistics section of the methods said that both Tukey-Kramer (post-hoc corrected means) and Kruskal-Wallis (non-parametric medians) were done. It is sometimes not clear which test was done for which figure, and where the Kruskal-Wallis test was done there does not seem to be a corrected statistical significance threshold for the many multiple comparisons (Fig. 2). It is quite possible I am just missing the details and they need to be clarified. I think there also needs to be a correction for the Brown-Forsythe tests but I don't know this method well.

      We first performed an ANOVA, and if the test residuals were not normally distributed, we used a Kruskal-Wallis test instead. For the post-hoc tests of both we used Tukey-Kramer to correct for multiple comparisons. The figure legends did indeed miss this information. We added it to clarify our statistical analysis strategy and refer to the methods section for more details (i.e. l. 185-186). All statistical results, including the type of statistical test used, have been uploaded to the data repository as well.
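
      As a rough illustration of the non-parametric branch of this strategy, the sketch below computes the Kruskal–Wallis H statistic with the standard tie correction using only the Python standard library. It does not reproduce the authors' pipeline: the residual normality check, the ANOVA and the Tukey–Kramer post-hoc step are omitted, and the function names and example data are assumptions. For a p-value, H is compared against a chi-square distribution with k - 1 degrees of freedom.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of independent samples."""
    pooled = list(chain.from_iterable(groups))
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum)^2 / n over groups, in pooled order.
    acc = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        acc += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n_total * (n_total + 1)) * acc - 3.0 * (n_total + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    correction = 1.0 - ties / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


# Hypothetical, fully separated groups.
h_separated = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Hypothetical identical groups (with ties).
h_identical = kruskal_wallis_h([[1, 2], [1, 2]])
```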

      The connection to stimulus reliability in the discussion seems to conflate reliability with prevalence or magnitude.

      We have rephrased the respective discussion sections to clearly separate the prevalence and magnitude of stimuli, which was measured, from an implied or hypothesized reliability (lines 510-511).

      Line numbers would be helpful for future review.

      We apologize for missing the line numbers and have added them to the revised manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      In a previous work Prut and colleagues had shown that during reaching, high frequency stimulation of the cerebellar outputs resulted in reduced reach velocity. Moreover, they showed that the stimulation produced reaches that deviated from a straight line, with the shoulder and elbow movements becoming less coordinated. In this report they extend their previous work by addition of modeling results that investigate the relationship between the kinematic changes and torques produced at the joints. The results show that the slowing is not due to reductions in interaction torques alone, as the reductions in velocity occur even for movements that are single joint. More interestingly, the experiment revealed evidence for decomposition of the reaching movement, as well as an increase in the variance of the trajectory.

      Strengths:

      This is a rare experiment in a non-human primate that assessed the importance of cerebellar input to the motor cortex during reaching.

      Weaknesses:

      None

      Reviewer #1 (Recommendations for the authors):

      The authors have answered my questions adequately and I have no further comments.

      Reviewer #2 (Public review):

      This manuscript asks an interesting and important question: what part of 'cerebellar' motor dysfunction is an acute control problem vs a compensatory strategy to the acute control issue? The authors use a cerebellar 'blockade' protocol, consisting of high frequency stimuli applied to the cerebellar peduncle which is thought to interfere with outflow signals. This protocol was applied in monkeys performing center out reaching movements and has been published from this laboratory in several preceding studies. I found the takehome-message broadly convincing and clarifying - that cerebellar block reduces muscle activation acutely particularly in movements that involve multiple joints and therefore invoke interaction torques, and that movements progressively slow down to in effect 'compensate' for these acute tone deficits. The manuscript was generally well written, data were clear, convincing and novel. The key strengths are differentiating acute from subacute (within session but not immediate) kinematic consequences of cerebellar block.

      Reviewer #2 (Recommendations for the authors):

      I think the manuscript is good as is. That said, it would have been nice to see more of the behavioral outcomes in Figure 5 (e.g. decomposition and trajectory variability) analyzed longitudinally like the velocity measurements in Fig. 4. This would clearly strengthen the insight into acute and compensatory components of cerebellar motor deficits.

      The two behavioral measures of motor noise used in our study are movement decomposition and trajectory variability (Figure 5). Since trajectory variability is measured across trials we could not analyze this measure longitudinally as a function of trial number. However, following the reviewer’s advice, we examined movement decomposition for successive trials in control vs. cerebellar block for movements to targets 2-4 similar to the analysis of hand velocity in figure 4. We found no interaction effect between trial sequence x cerebellar block on movement decomposition. This result is consistent with our conclusion that noisy joint activation occurs independently of adaptive slowing of multi-joint movements. We have updated our main text (lines 293-299) and supplementary information (supplementary figure S5 and supplementary table S8) to include this result.

      Reviewer #3 (Public review):

      Summary:

      In their revised manuscript, Sinha and colleagues aim to identify distinct causes of motor impairments seen when perturbing cerebellar circuits. This goal is an important one, given the diversity of movement related phenotypes in patients with cerebellar lesion or injury, which are especially difficult to dissect given the chronic nature of the circuit damage. To address this goal, the authors use high-frequency stimulation (HFS) of the superior cerebellar peduncle in monkeys performing reaching movements. HFS provides an attractive approach for transiently disrupting cerebellar function previously published by this group. First, they find a reduction in hand velocities during reaching, which was more pronounced for outward versus inward movements. By modeling inverse dynamics, they find evidence that shoulder muscle torques are especially affected. Next, the authors examine the temporal evolution of movement phenotypes over successive blocks of HFS trials. Using this analysis, they find that in addition to the acute, specific effects on torques in early HFS trials, there was an additional progressive reduction in velocity during later trials, which they interpret as an adaptive response to the inability to effectively compensate for interaction torques during cerebellar block. Finally, the authors examine movement decomposition and trajectory, finding that even when low velocity reaches are matched to controls, HFS produces abnormally decomposed movements and higher than expected variability in trajectory.

      Strengths:

      Overall, this work provides important insight into how perturbation of cerebellar circuits can elicit diverse effects on movement across multiple timescales.

      The HFS approach provides temporal resolution and enables analysis that would be hard to perform in the context of chronic lesions or slow pharmacological interventions. Thus, this study describes an important advance over prior methods of circuit disruption in the monkey, and their approach can be used as a framework for future studies that delve deeper into how additional aspects of sensorimotor control are disrupted (e.g., response to limb perturbations).

      In addition, the authors use well-designed behavioral approaches and analysis methods to distinguish immediate from longer-term adaptive effects of HFS on behavior. Moreover, inverse dynamics modeling provides important insight into how movements with different kinematics and muscle dynamics might be differentially disrupted by cerebellar perturbation.

      In this revised version of the manuscript, the authors have provided additional analyses and clarification that address several of the comments from the original submission.

      Remaining comments:

      The argument that there are acute and adaptive effects to perturbing cerebellar circuits is compelling, but there seems to be a lost opportunity to leverage the fast and reversible nature of the perturbations to further test this idea and strengthen the interpretation. Specifically, the authors could have bolstered this argument by looking at the effects of terminating HFS - one might hypothesize that the acute impacts on joint torques would quickly return to baseline in the absence of HFS, whereas the longer-term adaptive component would persist in the form of aftereffects during the 'washout' period. As is, the reversible nature of the perturbation seems underutilized in testing the authors' ideas. While this experimental design was not implemented here, it seems like a good opportunity for future work using these approaches.

      We agree with the reviewer that examining the effect of the cerebellar block on immediate post-block washout trials in future studies will be insightful.    

      The analysis showing that there is a gradual reduction in velocity during what the authors call an adaptive phase is convincing. While it is still not entirely clear why disruption of movement during the adaptive phase is not seen for inward targets, despite the fact that many of the inward movements also exhibit large interaction torques, the authors do raise potential explanations in the Discussion.

      The text in the Introduction and in the prior work developing the HFS approach overstates the selectivity of the perturbations. First, there is an emphasis on signals transmitted to the neocortex. As the authors state several times in the Discussion, there are many subcortical targets of the cerebellar nuclei as well, and thus it is difficult to disentangle target-specific behavioral effects using this approach. Second, the superior cerebellar peduncle contains both cerebellar outputs and inputs (e.g., spinocerebellar). Therefore, the selectivity in perturbing cerebellar output feels overstated. Readers would benefit from a more agnostic claim that HFS affects cerebellar communication with the rest of the nervous system, which would not affect the major findings of the study. In the revised manuscript, the authors do provide additional anatomical and evolutionary context and discuss potential limitations in the selectivity of HFS in the Materials and Methods. However, I feel that at least a brief mention of these caveats in the Introduction, where it is stated, "we then reversibly blocked cerebellar output to the motor cortex", would benefit the reader.

      Following the advice of the reviewer, we have now revised the introduction section of our manuscript in the following way (lines 61-67):

      “…We then reversibly disrupted cerebellar communication with other neural structures using high-frequency stimulation (HFS) of the superior cerebellar peduncle, assessing the impact of this perturbation on subsequent movements. Although our approach primarily affects cerebellar output to the motor cortex, it also disrupts fibers carrying input signals (e.g., spinocerebellar) and pathways to various subcortical targets (e.g., cerebellorubrospinal). Thus, our manipulation broadly interferes with cerebellar communication…”

      Reviewer #3 (Recommendations for the authors):

      Typo on line 102; "subs-sessions"

      We have corrected this typographical error in our revised manuscript (line 106).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Flowers et al. describe an improved version of qFit-ligand, an extension of qFit. qFit and qFit-ligand seek to model conformational heterogeneity of proteins and ligands, respectively, in cryo-EM and X-ray (electron) density maps using multi-conformer models, which are essentially extensions of the traditional alternate conformer approach in which substantial parts of the protein or ligand are kept in place. By contrast, ensemble approaches represent conformational heterogeneity through a superposition of independent molecular conformations.

      The authors provide a clear and systematic description of the improvements made to the code, most notably the implementation of a different conformer generator algorithm centered around RDKit. This approach yields modest improvements in the strain of the proposed conformers (meaning that more physically reasonable conformations are generated than with the "old" qFit-ligand) and real space correlation of the model with the experimental electron density maps, indicating that the generated conformers also better explain the experimental data than before. In addition, the authors expand the scope of ligands that can be treated, most notably allowing for multi-conformer modeling of macrocyclic compounds.

      Strengths:

      The manuscript is well written, provides a thorough analysis, and represents a needed improvement of our collective ability to model small-molecule binding to macromolecules based on cryo-EM and X-ray crystallography, and can therefore have a positive impact on both drug discovery and general biological research.

      Weaknesses:

      There are several points where the manuscript needs clarification in order to better understand the merits of the described work. Overall, the demonstrated performance gains are modest (although the theoretical ceiling on gains in model fit and strain energy is not clear!).

      We thank the reviewer for their thoughtful review. To address these comments, we have added clarifying statements and discussion points around the extent of performance gains, our choice of benchmarking metrics, and the “standards” in the field for significance. We expanded our analysis to highlight how to use qFit-ligand in “discovery” mode, which is aimed at supporting individual modeling efforts. As we now write in the discussion:

      “It is advisable to employ qFit-ligand selectively, focusing on cases with a moderate correlation between your input model and the experimental data, strong visual density in the binding pocket, high map resolution, or when your single-conformer ligand model is strained.”

      Additionally, we note in the discussion:

      “qFit-ligand primarily serves as a “thought partner” for manual modeling. Modelers still must resolve many ambiguities, including initial ligand placement, to fully take advantage of qFit capabilities. In active modeling workflows or large scale analyses, the workflow would only accept the output of qFit-ligand when it improves model quality. In cases where qFit-ligand degrades map-to-model fit and/or strain, we can simply revert to the input model. In practice, users can easily remove poorly fitting conformations using molecular modeling software such as COOT, while keeping the well modeled conformations, which is an advantage of the multiconformer approach over ensemble refinement methods.”
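      The accept-or-revert rule described in this passage can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name `choose_model` and the dictionary keys `rscc` and `strain` are hypothetical and are not part of qFit-ligand.

```python
def choose_model(input_metrics, qfit_metrics):
    """Return "qfit" only when the multi-conformer model does not degrade quality.

    Both arguments are dicts with hypothetical keys:
      "rscc"   - real-space correlation coefficient (higher is better)
      "strain" - ligand strain energy (lower is better)
    """
    degraded_fit = qfit_metrics["rscc"] < input_metrics["rscc"]
    degraded_strain = qfit_metrics["strain"] > input_metrics["strain"]
    # Revert to the input model if fit and/or strain got worse.
    if degraded_fit or degraded_strain:
        return "input"
    return "qfit"
```

      In this sketch a model is reverted as soon as either metric worsens; a real workflow might weigh the two metrics differently.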

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Flowers et al. aimed to enhance the accuracy of automated ligand model building by refining the qFit-ligand algorithm. Recognizing that ligands can exhibit conformational flexibility even when bound to receptors, the authors developed a bioinformatic pipeline to model alternate ligand conformations while improving the fit to density and yielding more energetically favorable conformations.

      Strengths:

      The authors present a computational pipeline designed to automatically model and fit ligands into electron density maps, identifying potential alternative conformations within the structures.

      Weaknesses:

      Ligand modeling, particularly in cases of poorly defined electron density, remains a challenging task. The procedure presented in this manuscript exhibits clear limitations in low-resolution electron density maps (resolution > 2.0 Å) and low-occupancy scenarios, significantly restricting its applicability. Considering that the maps used to establish the operational bounds of qFit-ligand were synthetically generated, it's likely that the resolution cutoff will be even stricter when applied to real-world data.

      We thank Reviewer #2 for their comments on the role of conformational flexibility and how our tool addresses the complexity involved in modeling alternative conformations. We agree that there are limitations at low resolution, limiting the application of our algorithm. That is the case with all structural biology tools. Automatically finding alternative conformations of ligands in high-resolution structures is an enhancement to the toolbox of ligand fitting. Expanding the algorithm to work with fragment screening data is important in this realm, as almost all of this data fits in the high-resolution range where qFit-ligand works best.

      The reported changes in real-space correlation coefficients (RSCC) are not substantial, especially considering a cutoff of 0.1. Furthermore, the significance of improvements in the strain metric remains unclear. A comprehensive analysis of the distribution of this metric across the Protein Data Bank (PDB) would provide valuable insights.

      We agree that the changes are small, partially because the baseline (manually modeled ligands) is very high. To provide additional evidence, we added evaluations using EDIAm, which is a more sensitive metric. In Figure 2 (page 10), representing the development dataset, we see more improvements above 0.1. With this being said, it is unclear what constitutes a ‘substantial’ improvement for either of these metrics, especially considering alternative conformations may only change the coordinates of a subset of ligands, just slightly improving the fit to density.

      We agree that looking across the PDB on strain would provide valuable insight. To explore this, we looked to see how qFit-ligand could improve the fitting of deposited ligands with high strain (see section: Evaluating qFit-ligand on a set of structures known to be highly strained, Page 15). While only a subset of these structures had alternative conformers placed (24.6%), we observed that in this subset, the ligands often improved the RSCC and strain. This figure also demonstrates that while RSCC may not change much numerically, the alternative conformers explain previously unexplained density with lower energy conformers than what is currently deposited.

      To mitigate the risk of introducing bias by avoiding real strained ligand conformations, the authors should demonstrate the effectiveness of the new procedure by testing it on known examples of strained ligand-substrate complexes.

      See above.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      A - Specific comments:

      (1) It appears necessary to provide qFit-ligand with an initial model with the ligand already placed. This is not clear from the start of the introduction on page 3. It appears that ligand position is only weakly adjusted fairly late in the process, in step F of Figure 1. It seems, therefore, that the accuracy of initial placement is rather critical (see the example discussed on page 21). At the same time, in my experience, ambiguous cases are quite common, for example with flat ligands with a few substituents sticking out or with ligands with highly mobile tails. It would be helpful for the authors to comment on the sensitivity to initial ligand placement, either in the discussion or, better yet, in the form of an analysis in which the starting model position is randomly perturbed.

      In our revised version, we have modified the introduction to clarify the necessity of including an initial ligand model (page 4).

      “The qFit-ligand algorithm takes as input a crystal or cryo-EM structure of an initial protein-ligand complex with a single conformer ligand in PDBx/mmCIF format, a density map or structure factors (encoded by a ccp4 formatted map or an MTZ), and a SMILES string for the ligand.”

      We also describe our sampling algorithm more clearly (see: Biasing Conformer Generation, page 6). Steps A-E generate many conformations (using RDKit), which are then selected/fit into experimental density (using quadratic programming). To help with additional shifting issues in the input ligand, after the first selection, we do additional rotation/translation of the generated conformers that are kept. We then do another round of fitting to the density (quadratic programming followed by mixed integer quadratic programming).

      Given this sampling, we have not elected to do an additional computational experiment to test the “radius of convergence” or dependence on initial conditions. However, we outline the fundamental procedure here so that someone can build on the work and test the idea:

      - Create single conformer models as we currently do

      - Randomly perturb the coordinates of the ligand by 0.1-0.3 Å

      - Refine to convergence, creating a series of “perturbed, modified true positives” for each dataset

      - Run qFit-ligand

      - Evaluate the variability in the resulting multi-conformer models
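
      The perturbation step in the procedure above could be sketched as follows. This is a minimal illustration, not code from qFit-ligand: it assumes each atom is displaced independently in a random direction by a distance drawn uniformly from 0.1-0.3 Å (the response does not say whether the shift is per atom or rigid-body), and the function name `perturb_coordinates` is hypothetical.

```python
import numpy as np

def perturb_coordinates(coords, min_shift=0.1, max_shift=0.3, seed=None):
    """Displace each atom by a random vector of length min_shift..max_shift (in Å).

    coords: array-like of shape (n_atoms, 3) with Cartesian coordinates in Å.
    """
    rng = np.random.default_rng(seed)
    coords = np.asarray(coords, dtype=float)
    # Random unit vectors: normalise isotropic Gaussian samples.
    directions = rng.normal(size=coords.shape)
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # One displacement magnitude per atom.
    magnitudes = rng.uniform(min_shift, max_shift, size=(coords.shape[0], 1))
    return coords + directions * magnitudes
```

      The perturbed coordinates would then be refined and passed to qFit-ligand, repeating with different seeds to gauge how much the resulting multi-conformer models vary.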

      (2) Top of page 6 ("Biasing Conformer Generation"): the authors say "as we only want to generate ligands that physically fit within the protein binding pocket, we bias conformation generation towards structures more likely to fit well within the receptor's binding site". Apart from the odd redundancy of this sentence, I am confused: at the stage that seems to be referred to here (A-C in Figure 1) is the fit to the electron density already taken into account, or does this only happen later (after step E)?

      Thank you for pointing this out. We have edited the statement to clarify it:

      “To guide the conformation generation from the Chem.rdDistGeom based on the ligand type and protein pocket, we developed a suite of specialized sampling functions to bias the conformational search towards structures more likely to fit well into the receptor’s binding site.”

      We do not consider the electron density during conformer generation (only selection from the generated conformers). The sampling is additionally biased by the type of ligand and the size of the binding pocket.

      (3) qFit-ligand appears to be quite slow. Are there prospects for speedup? Can the code take advantage of GPUs or multi-CPU environments?

      We agree with this. We have made some algorithmic improvements, most notably removing duplicate conformers based on root mean squared distance. This, along with parallelization, decreased the average runtime from ~19 minutes to ~8 minutes (see additional details: qFit-ligand runtime, page 8). We do not currently take advantage of GPU specific code.
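
      As an illustration of duplicate removal by root mean squared distance, the sketch below keeps a conformer only if it differs from every conformer already kept by more than a threshold. It is written under assumptions: the conformers share atom ordering and the same coordinate frame (no superposition is performed), the 0.2 Å threshold is an arbitrary placeholder, and the function names are hypothetical rather than taken from the qFit-ligand code.

```python
import numpy as np

def rmsd(a, b):
    """Root mean squared distance between two (n_atoms, 3) coordinate arrays."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def remove_duplicate_conformers(conformers, threshold=0.2):
    """Greedy filter: drop conformers within `threshold` Å of one already kept."""
    kept = []
    for conf in conformers:
        if all(rmsd(conf, other) > threshold for other in kept):
            kept.append(conf)
    return kept
```

      Fewer near-identical conformers means fewer candidates entering the density-fitting step, which is one plausible route to the shorter runtime described above.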

      (4) Section: Detection of experimental true positive multi-conformer ligands:

      a) Why are carbohydrate ligands excluded? This seems like an important class of ligands that one would like qFit to be able to treat! Which brings me to a related question: can covalently attached groups (e.g., glycosylation sites!) be modeled using qFit-ligand, or is qFit-ligand restricted to non-covalently bound groups?

      Currently, qFit-ligand does not support covalently bound ligands, but this is an area of interest we are hoping to expand into. In the revised version, we added the non-covalently attached carbohydrates back into the true positive dataset. In Figure 4 (page 14), we show that qFit-ligand is able to improve fit to the experimental density in around 80% of structures, while also often reducing torsion strain (see additional details: qFit-ligand applied to unbiased dataset of experimental true positives, page 14).

      b) "as well as 758 cases where the ligand model's deposited alternate conformations (altlocs) were not bound in the same chain and residue number" - I do not understand what this means, or why it leads to the exclusion of so many structures. Likewise, a number of additional exclusions are described in Figure S3. Some more background on why these all happened would be helpful. Are you just left with the "easy" cases?

      Sometimes modelers will list the multiple conformations of a bound ligand as separate residues within the PDB file, rather than as a single multiconformer model. For example, rather than writing a multiconformer LIG bound at A, 201 with altlocs ‘A’ and ‘B’, a modeler might write this instead as LIG, A, 201 and LIG, A, 301. We initially excluded these kinds of structures. However, we agree that this choice resulted in the removal of many potentially valid true positives. We have since updated our data processing pipeline to include these cases, and they are examined in the updated manuscript.

      c) I do not follow the argument made at the end of this section (last two paragraphs on page 9): "when using a single average conformation to describe density from multiple conformations, the true low-energy states may be ignored". I get that, but the conformations in the "modified true positives" dataset derive directly from models in which two conformations were modeled, so this cannot be the explanation for why qFit-ligand models result in somewhat lower average strain. It would seem that the paper could be served by providing examples where single conformations were modeled in deposited structures, but qFit detects multiple conformations.

      We agree with this comment that the strain obtained from the modified true positives is likely higher than that of the deposited models. However, the modified structure is refined with a single conformation and has therefore changed from the deposited “A” conformation. Thus, the reduced strain observed in our qFit-ligand models relative to the modified true positives is not unexpected.

      To expand our dataset, we also looked at deposited structures with high strain, all of which were modeled as single conformers. Here, we saw a decrease in strain when alternative conformers were placed (see section: Evaluating qFit-ligand on a set of structures known to be highly strained, page 15). Further, we provide an example from the XGen macrocycle dataset where a ligand initially modeled as a single conformer exhibited relatively high strain. After qFit‐ligand modeled a second conformation, the overall strain was reduced (Figure 6C, page 19; Figure 6—figure supplement 1C, page 59).

      (5) Section: qFit-ligand applied to an unbiased dataset of experimental true positives Bottom of page 14: The paragraph starting with "qFit-ligand shows particular strength in scenarios with strong evidence..." is enigmatic: there's no illustration (unless it directly relates to the findings in Figure 4, in which case this should be more explicit). Since this points out when the reader will and will not benefit from using qFit-ligand, it should be clear what the authors are talking about.

      This claim considers all the evidence presented in the manuscript, not necessarily one particular aspect of it. We advise using qFit-ligand when there is a moderate correlation between the input model and the experimental data, strong visual density in the binding pocket, high map resolution, and/or when your single conformer ligand model is strained. We have made all of these points clearer in the updated manuscript.

      B  - Section: qFit-ligand can automatically detect and model multiple conformations of macrocycles:

      This is an exciting extension of qFit-ligand, but some aspects of the analysis strike me as worrisome. Of the initial dataset of 150 structures, fewer than half make it all the way through analysis. It's hard to believe that this is a fully representative subset. Why, for example, could 29 structures not be refined against the deposited structure factors? Why does strain calculation (in RDKit?) fail on 30 ligands? What about the other 18 cases--why did these fail (in PHENIX?).

      We agree that this is a striking number of failures. However, we note that they are not specific shortcomings of qFit-ligand (in fact, most occur because standard structural biology and/or cheminformatics software fails on many PDB depositions). Therefore, these failures reflect broader limitations in standard bioinformatics and refinement restraint files when handling macrocycles. The strain calculator we used was not built for macrocycles, and after consulting with many experts in the field, the consensus was that no method works well with macrocycles. We discuss these issues in additional detail in the discussion (page 27):

      “Additionally, our algorithm’s placement within the larger refinement and ligand modeling ecosystem highlighted other areas that need improvement. We note that macrocycles, due to their complicated and interconnected degrees of freedom, suffer acutely from the refinement issues, as demonstrated by the failure of approximately one-third of datasets in our standard preparation or post-refinement pipelines due to ligand parameterization issues. Many of these stemmed from problematic ligand restraint files, highlighting the difficulty of encoding the geometric constraints of macrocycles using standard restraint libraries. Improved force-field or restraints for macrocycles are desperately needed to improve their modeling.”

      C  - Minor issues:

      (1) "Fragment-soaked event maps" - this is a semantically strange section title!

      We have updated the section title in our revised manuscript. The new title is ‘qFit-ligand recovers heterogeneity in fragment-soaked event maps’.

      (2) Too many digits! All over the manuscript, percentages are displayed with 0.01% precision, while these mostly refer to datasets with ~150 structures. Shifting just one structure from one category to another changes these percentages by nearly 1%.

      We have updated the significant figures in our revised manuscript.

      (3) The authors are keen to classify decreases in RSCC as significant only when these changes exceed 0.1, but do not apply the same standard for increases. For instance, in Figure 4B, if we were to classify improvements as significant if ΔRSCC > 0.1, there would be fewer significant improvements than decreases in performance (although it is visually clear that for most datasets things get better). Similarly, in Figure 5A, if we were to classify improvements as significant if ΔRSCC > 0.1, qFit-ligand would only yield significant improvements for two out of 73 cases, which is not a lot.

      We agree with the reviewer that there needs to be more consistency in our analysis of improvements/deteriorations. However, we note that operationally, when the decreases in model quality are observed, the modeler would simply reject the new model in favor of the input model. We have added to the discussion:

      “In active modeling workflows or large scale analyses, the workflow would only accept the output of qFit-ligand when it improves model quality. In cases where qFit-ligand degrades map-to-model fit and/or strain, we can simply revert to the input model. In practice, users can easily remove poorly fitting conformations using molecular modeling software such as COOT, while keeping the well modeled conformations, which is an advantage of the multiconformer approach over ensemble refinement methods.”

      There is generally no consensus in the field as to what might indicate a ‘significant’ change in RSCC, and any threshold we choose would be arbitrary. We note that in our manuscript, we had previously characterized a decrease in RSCC to be ‘significant’ if it exceeded 0.1. However, as there is no real scientific justification for this cutoff, or any cutoff, we moved away from this framing in the revised manuscript. Therefore, we just classify if we improve RSCC. For example, on page 9:

      “qFit-ligand modeled an alternative conformation in 72.5% (n=98) of structures. Compared with the modified true positive models, 83.7% (n=113) of qFit-ligand models have a better RSCC and 77.0% (n=104) structures saw an improvement in EDIAm, representing an improved fit to experimental data in the vast majority of structures.”

      In addition, we have conducted additional experiments using more sensitive metrics (EDIAm) to further illustrate qFit-ligand’s performance.

      (4) Small peptides are not discussed as a class of ligands, although these are quite common.

      Canonical peptides can be modeled with standard qFit. Non-canonical peptides present failure modes similar to the macrocycles discussed above, with a mix of ATOM and HETATM records and the need for custom cif definitions and link records. For these reasons we have not included an analysis outside of the macrocycle section. We have noted this caveat in the discussion:

      “We note that even linear non-canonical peptides present similar failure modes to macrocycles, with a mix of ATOM and HETATM records and the need for custom cif definitions and link records. For these reasons, we did not include analysis on small peptide ligands; however, canonical peptides can be modeled with standard qFit [8].”

      (5) Top of page 10: "while refinement improves": what kind of refinement does this refer to?

      This refers to refinement with Phenix. We have updated this language to reflect this (page 8). “We refer to these altered structures as our ‘modified true positives’, which we use as input to qFit-ligand, and subsequent refinement using Phenix.”

      (6) Bottom of page 11: "they often did" -> "it often did"

      We have made this change in the revised version.

      (7) Top of page 14: RMSDs and B factors do have units.

      We have added the units in our revision.

      (8) Top of page 24. In the generation of a composite omit map, why are new Rfree flags being generated? Did I misunderstand that?

      r_free_flags.generate=True only creates R-free flags if they are not present in the input file, as is the case for many (especially older) PDB depositions.

      (9) Bottom of page 27: how large is the mask? Presumably when alt confs of the ligand are possible, it would be helpful for the mask to cover those?

      We agree that this mask should be updated. In our revision, we define the mask around the coordinates of the full qFit-ligand ensemble. The same mask is used to calculate the RSCC of the input (single conformer) model versus the qFit-ligand model.
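
      The idea of scoring both models inside one mask built from the full ensemble can be sketched as below. This is an assumption-laden illustration rather than the authors' implementation: the mask is taken to be all grid points within a fixed radius of any atom in any conformer, the 1.5 Å radius is a placeholder, RSCC is computed as a plain Pearson correlation, and the function names are hypothetical.

```python
import numpy as np

def ensemble_mask(grid_points, conformers, radius=1.5):
    """Boolean mask of grid points within `radius` Å of any atom of any conformer.

    grid_points: (n_points, 3) array of grid coordinates in Å.
    conformers:  list of (n_atoms, 3) coordinate arrays (the full ensemble).
    """
    atoms = np.concatenate([np.asarray(c, dtype=float) for c in conformers])
    points = np.asarray(grid_points, dtype=float)
    distances = np.linalg.norm(points[:, None, :] - atoms[None, :, :], axis=2)
    return (distances <= radius).any(axis=1)

def rscc(observed, calculated, mask):
    """Pearson correlation between observed and calculated density inside the mask."""
    return float(np.corrcoef(observed[mask], calculated[mask])[0, 1])
```

      Because the same mask is applied to the single-conformer and multi-conformer models, any difference in the resulting RSCC values reflects the models rather than the region being compared.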

      (10) Middle of page 29: "These structure factors are then used to compute synthetic electron density maps." - It is not clear whether the following three sentences are an explanation of the details of that statement or rather things that are done afterwards.

      We clarify this in the manuscript (page 36).

      “These structure factors are then used to compute synthetic electron density maps. To each of these maps, we generate and add random Gaussian noise values scaled proportionally to the resolution. This scaling reflects the escalation of experimental noise as resolution deteriorates, a common occurrence in real-life crystallographic data.”
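
      A minimal sketch of adding resolution-scaled Gaussian noise to a synthetic map is shown below. It assumes the noise standard deviation grows linearly with resolution (in Å); the proportionality constant `noise_per_angstrom` and the function name are hypothetical, since the response does not give the scaling factor.

```python
import numpy as np

def add_resolution_scaled_noise(density, resolution, noise_per_angstrom=0.05, seed=None):
    """Return the map plus zero-mean Gaussian noise whose sigma scales with resolution.

    density:    ndarray holding the synthetic electron density map.
    resolution: map resolution in Å (larger value = worse resolution = more noise).
    """
    rng = np.random.default_rng(seed)
    sigma = noise_per_angstrom * resolution
    noise = rng.normal(loc=0.0, scale=sigma, size=density.shape)
    return density + noise
```

      With this scaling, a 2.0 Å map receives noise with twice the standard deviation of a 1.0 Å map, mimicking the growth of experimental noise as resolution deteriorates.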

      (11) Chemical synthesis: I am not qualified to assess this and am surprised to see so much detail here rather than in some other manuscript. Are the corresponding structures deposited anywhere?

      All of the structures we discuss in this manuscript are deposited in the PDB and listed in Supplementary Table 5.

      Reviewer #2 (Recommendations for the authors):

      The data should consistently present the number of structures that exhibit improvements or deterioration in particular metrics, like RSCC and strain, using a cutoff that should be significant. For instance, stating that "85.93% (n=116) of structures having a better RSCC in the qFit-ligand models compared to the modified true positive models" without clarifying the magnitude of improvement (e.g., a marginal increase of 0.01 in RSCC) lacks meaningful context. The figures should clearly indicate the specific cutoff values used for each metric. The accompanying text should provide a detailed explanation for the selection of these cutoff values, justifying their significance in the context of the study.

      Currently, there is no established consensus within the field on what constitutes a 'significant' improvement in RSCC or strain values. As such, we chose not to impose an arbitrary cutoff and just look at which structures improve RSCC. We also removed all language stating significance, as there isn’t a good standard in the field to assess significance. This is especially important as only improvements would be considered in an active modeling project. In cases where qFit-ligand degrades the RSCC (or strain) to a large extent, the modeler would simply revert to the input model.

      In the first section of Results: "First, for all ligands, we perform an unconstrained search function allowing the generated conformers to only be constrained from the bounds matrix (Figure 1A). This is particularly advantageous for small ligands that benefit from less restriction to fully explore their conformational space. We then perform a fixed terminal atoms search function (Figure 1B)." It is unclear whether a fixed terminal atom search was conducted for each conformer generated in the initial step to further explore the conformational space. This aspect should be clarified to provide a more comprehensive understanding of the methodology.

      Each independent conformer generation function (A-E) is initialized with only the input ligand model and runs in parallel with the other functions. These functions do not build on each other, but rather perturb the input molecule independently of one another. In our updated manuscript, we have clarified the methodology (page 6).

      “First, in all cases, we perform an unconstrained search function (Figure 1A), a fixed terminal atoms search function (Figure 1B), and a blob search function (Figure 1C).”

      Phrase: "We randomly sampled 150 structures and, after manual inspection of the fit of alternative conformations, chose 135 crystal structures as a development set for improving qFit-ligand." The authors should explain why they filtered 10% of the structures.

      To develop qFit-ligand, we wanted to use a very high-quality dataset. We needed to know with some degree of certainty that if qFit-ligand failed to produce an alternate conformation (or generated conformations low in RSCC or high in strain), the failure was due to an algorithmic limitation rather than poor-quality input data. Therefore, after selection based on numerical metrics, we manually examined each ligand in Coot to observe if we believed the alternative conformers fit well into the density.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1:

      Reviewer #1 (Recommendations For The Authors):

      (1) At several places in the reply to reviewers and the manuscript, when discussing the new simulations conducted, the authors mention they break the 180 trials into a train/test split of 108/108 - is this value correct? If so, how? (pg 19 of updated manuscript)  

      Thank you for pointing this out; it was not clearly explained. We have now added the explanation to the Methods section: 

      “For each iteration, we randomly selected 108 responses from the full set of 180 for training, and then independently sampled another 108 from the same full set for testing. This ensured that the same orientation could appear in both sets, consistent with the structure of the original experiment.”
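
      The sampling scheme quoted above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the seed handling, and the assumption that each set is drawn without replacement are ours. Because the two draws are independent and 108 + 108 exceeds 180, the training and test sets always share at least 36 trials.

```python
import random

def sample_train_test(n_trials=180, n_per_set=108, seed=None):
    """Draw a training set and a test set independently from the same trials."""
    rng = random.Random(seed)
    trials = range(n_trials)
    # Assumption: no trial is repeated within a single set.
    train = rng.sample(trials, n_per_set)
    # The test set is drawn from the full set again, so it may overlap with train.
    test = rng.sample(trials, n_per_set)
    return train, test

train, test = sample_train_test(seed=0)
overlap = set(train) & set(test)
print(len(train), len(test), len(overlap))
```

      Repeating the call with a fresh seed on each iteration gives a new pair of overlapping sets.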

      (2) I appreciate the authors have added the variance explained of principal components to the axes of Fig. 3, though it took me a while to notice this, and this isn't described in the figure caption at all. It would likely help readers to directly explain what the % means on each axis of Fig. 3.

      Thank you, we have now added a description in both Fig. 2 and 3:

      “The axes represent the first two principal components, with labels indicating the percent of total explained variance.”

      (3) I believe there is a typo/missing word in the new paragraph on pg 15: "neural visual WM representations in the early visual cortices are [[biased]] towards distractors" (I think the bracketed word may be omitted as a typo)

      Thank you - fixed.

    1. Author response:

      We would like to thank the editors and reviewers for their time and their helpful feedback. We largely agree with the reviewer recommendations and comments, which we will address for the next Version on Record of this manuscript. We plan to address reviewer comments in the following ways.

      Reviewers requested a more comprehensive analysis of our RNA-seq experiment comparing vehicle treatment to enoxolone treatment over time. We will improve our analysis by providing clear, accessible, and organized tables defining differentially expressed genes at each time point, gene set lists that comprise our gene ontology analysis, and the lists of shared differentially expressed genes from enoxolone treatment and HNF4⍺ knockout. While some of this data was provided in the supplementary files, we recognize that it should be more accessible for the reader. Furthermore, as suggested by the Reviewer, we will enhance our transcriptomic analysis by utilizing bioinformatic tools such as Enrichr.

      The Reviewers noted that we identified a number of lipoprotein-lowering compounds through our drug screen, but limited the impact of our manuscript by focusing on enoxolone, a known inhibitor of HNF4⍺ and modulator of lipid metabolism. While we understand the sentiment that other novel compounds would be interesting to study, we aimed to demonstrate proof of concept in this manuscript. We view the characterization of novel compounds as beyond the direct scope of this manuscript. We did not perform LipoGlo imaging and electrophoresis experiments on each drug because these experiments are low-throughput given the number of drugs and doses we examined. In light of the Reviewer’s comments, we will add some additional characterizations of our validated hits with LipoGlo imaging and electrophoresis studies.

      The reviewers also identified a number of typos in text and figures that will be addressed in the next Version on Record. We believe that the recommended changes will strengthen our manuscript and broaden its appeal. We are grateful for the opportunity to improve our work based on the reviewers’ valuable suggestions.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      We thank the reviewer for highlighting the strength of our manuscript, as quoted: “Overall, this work not only deepens our understanding of PRMT1's role in leukemia progression but also opens new avenues for targeting metabolic pathways in cancer therapy.”

      Weaknesses:

      (1) The findings rely heavily on a single AMKL cell line, with no validation in patient-derived samples to confirm clinical relevance or even another type of leukemia line. Adding the discussion of PRMT1's role in other leukemia types will increase the impact of this work.

      We mentioned in the introduction that PRMT1 is known to be the driver for leukemia with diverse types of mutations. In a related paper published in Cell Reports (Su et al. 2021), we demonstrated that PRMT1 is upregulated in myelodysplastic syndrome (MDS) patient samples and that inhibition of PRMT1 promotes megakaryocytic differentiation of a few MDS samples. AMKL is very rare. Via the Children’s Oncology Group consortium, we have obtained five AMKL samples with Down’s syndrome and AMKL with RBM15-MKL1 translocation out of 32 samples in the bank over the last 20 years. Interestingly, these patient samples also contain trisomy 19. As PRMT1 is located on chromosome 19, we speculate that PRMT1 is the significant driver for AMKL, although we have very limited genetic evidence. However, these frozen human samples derived from peripheral blood cannot be grown in a cell culture system. Although we did not perform metabolic analysis for other AMKL cell lines, we did validate in our unpublished studies that PRMT1 drives down CPT1A expression in normal bone marrow cells and platelets in mice and in a human leukemia cell line called MEG-01, which can be differentiated into megakaryocytes upon PMA (phorbol 12-myristate 13-acetate) treatment. Therefore, we expect that the PRMT1-mediated metabolic reprogramming we described here should apply to other types of hematological malignancies.

      (2) The observed heterogeneity in Prmt1 expression is noted but not further investigated, leaving gaps in understanding its broader implications.

      The expression level of PRMT1 is heterogeneous within leukemia cell populations, making it intriguing to study. We can sort the cells based on high versus low PRMT1 expression using a fluorescent dye called E84. However, we have not conducted transcriptome analysis on these two populations, mainly due to resource constraints. Theoretically, the E84 high-expression population may transiently utilize glucose more efficiently, as these cells do not ectopically express PRMT1. Therefore, when nutrient levels decline, these cells might switch to the low PRMT1 expression population. It will be interesting to see whether endogenous leukemia cells transiently expressing high levels of PRMT1 take advantage of their efficient usage of glucose and thus adapt to the niche environment successfully, as we observed in Figure 1. We agree that this would be an interesting direction to pursue in the future.

      (3) Some figures and figure legends didn't include important details or had not matching information.

      We would like to thank the reviewer for pointing out these mistakes. We have now corrected them.

      (4) Some wording is not accurate, such as line 80 "the elevated level of PRMT1 maintains the leukemic stem cells", the study is using the cell line, not leukemia stem cells.

      Leukemic stem cells are often referred to as cells that can initiate leukemia when transplanted into recipient mice, a concept first proposed by John Dick. In this study, we found that even the 6133 cell line displays heterogeneity in terms of PRMT1 expression levels. We identified a subgroup of 6133 cells as leukemia stem cells due to their ability to initiate leukemia.

      (5) In the disease model, histopathology of blood, spleen, and BM should be shown.

      We did not conduct histopathology analysis. Histopathology associated with 6133 cells has been published in Mercher et al., JCI 2009, and in a recent preprint by Diane Krause’s group.

      (6) Can MS023 treatment reverse the metabolic changes in PRMT1 overexpression AMKL cells?

      Yes. We demonstrated in the Seahorse assays in Figure 4 that the PRMT1 inhibitor can increase oxygen consumption.

      It would be helpful to provide a summary graph at the end of the manuscript.

      Yes, we now provide a graphical abstract.

      Reviewer #2 (Public review):

      We would like to thank the reviewer for finding the manuscript novel and important.

      Weaknesses:

      (1) The manuscript lacks detailed molecular mechanisms underlying PRMT1 overexpression, particularly its role in enhancing survival and metabolic reprogramming via upregulated glycolysis and diminished oxidative phosphorylation (OxPhos). The findings primarily report phenomena without exploring the reasons behind these changes.

      In the introduction, we highlighted that numerous studies have demonstrated how PRMT1 directly interacts with several key enzymes involved in glycolysis. These studies provide a mechanism for the observed upregulation of PRMT1 in leukemia. Additionally, our previous research published in eLife (Zhang, 2015) demonstrated that PRMT1 methylates the RNA-binding protein RBM15, which can bind to the 3' UTR of mRNAs encoding various metabolic enzymes. Therefore, we propose that PRMT1 may also regulate metabolism indirectly through the RBM15 protein.

      (2) The article shows that PRMT1 overexpression leads to augmented glycolysis and low reliance on the OxPhos. However, the manuscript also shows that PRMT1 overexpression leads to increased mitochondrial number and mitochondrial DNA content and has an elevated NADPH/NAD+ ratio. Further, these overexpressing cells have the ability to better survive on alternative energy sources in the absence of glucose compared to low PRMT1-expressing parental cells. Surprisingly, the Seahorse assay in PRMT1-overexpressing cells showed no further enhancement in the ECAR after adding the mitochondrial uncoupler FCCP, indicating truncated mitochondrial energetics. These results are contradictory and need a more detailed explanation in the discussion.

      We have now explained the metabolic changes in more detail. Increasing mitochondria number is not equivalent to increasing fatty acid oxidation and oxygen consumption, as the mitochondria have many other functions. PRMT1 only downregulates CPT1A, which is a rate-limiting step for long-chain fatty acid oxidation. The data suggest that PRMT1 promotes the biogenesis of mitochondria, possibly via PGC1alpha, as published by Stallcup’s group. The Seahorse assays were performed in a high concentration of glucose instead of alternative carbon sources. FCCP treatment under high glucose conditions did not increase the ECAR and OCR, which is normal for leukemia cells as shown in other publications (Sriskanthadevan, 2015; Kreitz, 2019). PRMT1 could dampen the activities of the TCA cycle and the electron transport chain, as suggested by our unpublished proteomic data and by published data (Fong, 2019). The elevated NADPH/NAD+ ratio is another indication that glycolysis and anabolism are enhanced by PRMT1.

      (3) How was disease penetrance established following the 6133/PRMT1 transplant before MS023 treatment?

      The data are shown in Figure 1f, demonstrating that the penetrance is 100%.

      (4) The 6133/PRMT1 cells show elevated glycolysis compared to parental 6133; why did the author choose the 6133 cells for treatment with the MS023 and ECAR assay (Fig.3 b)? The same is confusing with OCR after inhibitor treatment in 6133 cells; the figure legend and results section description are inconsistent.

      We apologize for the mistakes made while preparing the manuscript. We used 6133/PRMT1 cells for the MS023 treatment in Figure 4.

      (5) The discussion is too brief and incoherent and does not adequately address key findings. A comprehensive rewrite is necessary to improve coherence and depth.

      We agree with the reviewer. We have now added a comprehensive review of PRMT1-mediated metabolism. The PRMT1 homolog in yeast is called hmt1. In yeast, hmt1 is upregulated by glucose and enhances glycolysis. PRMT1-enhanced glycolysis is therefore a conserved pathway in eukaryotic cells.

      (6) The materials and methods section lacks a description of statistical analysis, and significance is not indicated in several figures (e.g., Figures 1C, D, F; Figures 2D, E, F, I). Statistical significance must be consistently indicated. The methods section requires more detailed descriptions to enable replication of the study's findings.

      We have added extra details on the methods and statistical analysis for the figures.

      (7) Figures are hazy and unclear. They should be replaced with high-resolution images, ensuring legible text and data.

      We have prepared separate figure files with high resolution.

      (8) Correct the labeling in Figure 2I by removing the redundant "D."

      We thank the reviewer and have fixed the figure.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Strengths:

      The genetic approaches here for visualizing the recombination status of an endogenous allele are very clever, and by comparing the turnover of wildtype and mutant cells in the same animal the authors can make very convincing arguments about the effect of chronic loss of pu.1. Likely this phenotype would be either very subtle or nonexistent without the point of comparison and competition with the wildtype cells.

      Using multiple species allows for more generalizable results, and shows conservation of the phenomena at play.

      The demonstration of changes to proliferation and cell death in concert with higher expression of tp53 is compelling evidence for the authors' argument.

      Weaknesses:

      This paper is very strong. It would benefit from further investigating the specific relationship between pu.1 and tp53 specifically. Does pu.1 interact with the tp53 locus? Specific molecular analysis of this interaction would strengthen the mechanistic findings.

      We agree with the reviewer’s assessment regarding the significance of the relationship between PU.1 and TP53. To investigate the potential interaction between Pu.1 and Tp53 in zebrafish, we analyzed the promoter region of zebrafish tp53. Indeed, we found three PU.1 binding sites (GAGGAA) in the tp53 promoter, located on the antisense strand at positions -1047 to -1042, -1098 to -1093 and -1423 to -1418 relative to the transcriptional start site (Fig. S10). These potential Pu.1 binding sites indicate a direct interaction between Pu.1 and the tp53 locus. Furthermore, a previous study by Tschan et al. (2008) elucidated the mechanism by which PU.1 attenuates the transcriptional activity of the P53 tumor suppressor family through direct binding to the DNA-binding and/or oligomerization domains of p53/p73 proteins. We have also cited this study (Line 399-401) and included all of the above information in the discussion of the revised manuscript (Line 399-405).
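
      An in silico motif scan of this kind can be sketched as follows. This is an illustrative example only, not the authors' pipeline: the function names and the toy sequence are hypothetical, and we assume the promoter is supplied 5' to 3' on the sense strand with its last base at position -1 relative to the transcriptional start site. Antisense-strand sites are found by searching the sense strand for the reverse complement of the motif.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(pairs[base] for base in reversed(seq.upper()))

def find_motif_sites(promoter, motif="GAGGAA"):
    """List (start, end, strand) hits, with positions relative to the TSS.

    Assumption: the last base of `promoter` is position -1.
    """
    promoter = promoter.upper()
    n = len(promoter)
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = promoter.find(pattern)
        while start != -1:
            # Convert 0-based indices to inclusive TSS-relative coordinates.
            hits.append((start - n, start + len(motif) - 1 - n, strand))
            start = promoter.find(pattern, start + 1)
    return sorted(hits)

# Toy sequence (hypothetical, not the zebrafish tp53 promoter).
toy_promoter = "ACGT" + "TTCCTC" + "ACGTACGT" + "GAGGAA" + "AC"
sites = find_motif_sites(toy_promoter)
print(sites)
```

      A sequence match of this kind only flags candidate sites; it does not by itself show that the factor binds them.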

      Reviewer #2 (Public review):

      Strengths:

      Generation of an elegantly designed conditional pu.1 allele in zebrafish that allows for the visual detection of expression of the knockout allele.

      The combination of analysis of pu.1 function in two model systems, zebrafish and mouse, strengthens the conclusions of the paper.

      Confirmation of the functional significance of the observed upregulation of tp53 in mutant microglia through double mutant analysis provides some mechanistic insight.

      Weaknesses:

      (1) The presented RNA-Seq analysis of mutant microglia is underpowered and details on how the data was analyzed are missing. Only 9-15 cells were analyzed in total (3 pools of 3-5 cells each). Further, the variability in relative gene expression of ccl34b.1, which was used as a quality control and inclusion criterion to define pools consisting of microglia, is extremely high (between ~4 and ~1600, Figure S7A).

      We apologize for the lack of clarity in the RNA-seq procedures and have accordingly added details about the RNA-seq data analysis to the “Material and methods” section (Line 491-501). Briefly, reads were aligned to the zebrafish genome using the STAR package. Raw counts were calculated with the featureCounts package. Differentially expressed genes (DEGs) were identified with the DESeq2 package. Owing to the technical challenge of unambiguously distinguishing microglia from dendritic cells (DCs) in brain cell suspensions, we employed a strategy of isolating 3-5 cells per pool and quantifying the relative expression of the microglia-specific marker ccl34b.1 normalized to the DC-specific marker ccl19a.1. This approach aimed to reduce DC contamination in downstream analyses. Across all experimental groups subjected to RNA-seq analysis, the ccl34b.1/ccl19a.1 expression ratios exceeded 5, confirming microglia as the dominant cell population. Nonetheless, residual DC contamination in the RNA-seq data cannot be entirely ruled out. We have discussed this technical constraint in the revised manuscript to ensure methodological transparency (Line 498-501).
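
      The marker-ratio check described above can be sketched as follows. This is a hedged illustration rather than the authors' procedure: the function name, the handling of an undetected DC marker, and the expression values are all hypothetical; only the threshold of 5 comes from the response.

```python
def passes_marker_ratio(microglia_marker, dc_marker, min_ratio=5.0):
    """Return True when microglia_marker / dc_marker exceeds min_ratio."""
    if dc_marker <= 0:
        # Assumption: with no detectable DC marker, a pool passes only if
        # the microglia marker itself was detected.
        return microglia_marker > 0
    return microglia_marker / dc_marker > min_ratio

# Hypothetical relative expression per pool: (ccl34b.1, ccl19a.1).
pools = {
    "pool_1": (40.0, 2.0),   # ratio 20, passes
    "pool_2": (8.0, 4.0),    # ratio 2, fails
    "pool_3": (12.0, 0.0),   # DC marker undetected, passes by assumption
}
kept = [name for name, (mg, dc) in pools.items() if passes_marker_ratio(mg, dc)]
print(kept)
```

      A ratio threshold lowers the chance that a pool is dominated by DCs, but, as the response notes, it cannot exclude residual contamination.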

      (2) The authors conclude that the reduction of microglia observed in the adult brain after cKO of pu.1 in the spi-b mutant background is due to apoptosis (Lines 213-215). However, they only provide evidence of apoptosis in 3-5 dpf embryos, a stage at which loss of pu.1 alone does lead to a complete loss of microglia (Figure 2E). A control of pu.1 KI/d839 mutants treated with 4-OHT should be added to show that this effect is indeed dependent on the loss of spi-b. In addition, experiments should be performed to show apoptosis in the adult brain after cKO of pu.1 in spi-b mutants as there seems to be a difference in the requirement of pu.1 in embryonic and adult stages.

      We apologize for the omission of data regarding conditional pu.1 knockout alone in the embryos in our manuscript, which may have led to ambiguity. We would like to clarify that conditional pu.1 knockout alone at the embryonic stage does not induce microglial death (Fig. S2). Microglial death occurs, in both embryonic and adult brains, only when Pu.1 is disrupted in the spi-b mutant background. The blebbing morphology of some microglia after pu.1 conditional knockout in adult spi-b mutants indicates that microglia undergo apoptosis at both embryonic and adult stages (Fig. S4 and Fig. S5). The reviewer’s concern likely arises from the distinct outcomes of global pu.1 knockout (Fig. 2) versus conditional pu.1 ablation (Fig. S2). Global knockout eliminates microglia during early development due to Pu.1’s essential role in myeloid lineage specification. We have included this clarification in the revised manuscript (Line 208-211).

      (3) The number of microglia after pu.1 knockout in zebrafish did only show a significant decrease 3 months after 4-OHT injection, whereas microglia were almost completely depleted already 7 days after injection in mice. This major difference is not discussed in the paper.

      We propose that zebrafish Pu.1 and Spi-b function cooperatively to regulate microglial maintenance, analogous to the role of PU.1 alone in mice. This cooperative mechanism likely explains the observed difference in microglial depletion kinetics between zebrafish and mice following pu.1 conditional knockout. Specifically, the compensatory activity of Spi-b in zebrafish may buffer the immediate loss of Pu.1, whereas in mice, the absence of Spi-b expression in microglia eliminates this redundancy, resulting in rapid microglial depletion. Furthermore, during evolution, SPI-B appears to have acquired lineage-specific roles, becoming absent in microglia. We have included the clarification in the revised manuscript (Line 302-305).

      (4) Data is represented as mean +/- SEM. Instead of SEM, standard deviation should be shown in all graphs to show the variability of the data. This is especially important for all graphs where individual data points are not shown. It should also be stated in the figure legend if SEM or SD is shown.

      We have represented our data as mean ± SD in the revised manuscript.
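
      The distinction the reviewer raises can be illustrated with a short calculation. This is a generic sketch with made-up numbers, not data from the manuscript: the SD describes the spread of individual measurements, whereas the SEM is the SD divided by the square root of the sample size and therefore shrinks as more samples are added.

```python
import math
import statistics

def sd_and_sem(values):
    """Return the sample standard deviation and the standard error of the mean."""
    sd = statistics.stdev(values)       # sample SD (n - 1 in the denominator)
    sem = sd / math.sqrt(len(values))   # SEM = SD / sqrt(n)
    return sd, sem

# Hypothetical measurements, for illustration only.
measurements = [2, 4, 4, 4, 5, 5, 7, 9]
sd, sem = sd_and_sem(measurements)
print(f"mean={statistics.mean(measurements)}, SD={sd:.3f}, SEM={sem:.3f}")
```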

      Recommendations for the authors:

      Reviewing Editor:

      To further strengthen the manuscript, we ask the authors to address the reviewers' comments through additional experiments where necessary. In cases where certain experiments may be challenging, we encourage the authors to address these concerns within the text, such as by referencing any prior evidence of pu.1 and tp53 interactions or incorporating in silico analyses that support such interaction.

      As suggested, we have performed an in silico analysis of Pu.1 binding sites in the zebrafish tp53 promoter and have also cited a previous paper showing how PU.1 attenuates the transcriptional activity of the P53 tumor suppressor family (Line 399-405).

      Reviewer #1 (Recommendations for the authors):

      It would be useful to investigate the relationship between pu.1 and tp53. The data presented here show that pu.1 deficient cells have higher expression of tp53, but this could be an indirect effect. However, since pu.1 has known DNA binding motifs, it would be worthwhile to investigate if there are any direct interactions between pu.1 and the tp53 locus -- does pu.1 directly bind and repress tp53 expression? This could be directly investigated with Cut & Run or an EMSA.

      The interaction between Pu.1 and Tp53 has been discussed in the public review section.

      The paper would likely also benefit from a more in-depth discussion of the relationship of the zebrafish alleles and their relationship to mammalian Pu.1 -- as presented here, the authors are implicitly arguing that zebrafish pu.1 and spi-b are both more closely related to mammalian Pu.1 than to mammalian Spi-b. A clear argument, perhaps backed up by sequence alignment and homology matching, would help readers, especially those less familiar with zebrafish genome duplications.

      We have conducted detailed sequence alignment in our previous work (Yu et al., 2017, Blood) and found zebrafish Spi-b shares the highest similarity with the mammalian SPI-B among Ets family transcription factors in zebrafish. A unique P/S/T-rich region known to be essential for mammalian SPI-B transactivation activity is present in zebrafish Spi-b. Our data do not support the interpretation that Spi-b is more closely related to mammalian Pu.1 than to Spi-b. Instead, functional compensation between pu.1 and spi-b in microglia maintenance likely reflects their shared role as Ets-family transcriptional regulators, rather than ortholog-driven redundancy.

      Reviewer #2 (Recommendations for the authors):

      (1) The nomenclature of the genes in the SPI family in zebrafish is somewhat confusing as genes were renamed several times. It would make it easier for the reader to understand if in the abstract and the main text, spi-b would be referred to as the zebrafish orthologue of mouse SPI-B (as determined by the authors in previous work) rather than the paralogue of zebrafish pu.1. To clarify which genes were analyzed in both zebrafish and mouse, Gene accession numbers should be added.

      Thanks for the recommendations. We have changed “the paralogue of zebrafish pu.1” to “the orthologue of mouse Spi-b” in the abstract (Line 22) and added gene accession numbers for both zebrafish and mouse genes (Line 105-106 and 301-302).

      (2) Methods RNA-seq: Details on how the aligned reads were analyzed to detect differentially expressed genes are missing and should be added. In addition, a table with read counts, fold changes and adjusted p values should be added.

      We have added details of the RNA-seq analysis to the Material and Methods section (Line 491-501). A table generated by DESeq2 has been included as a supplemental file to show read counts, fold changes and adjusted p values (Supplemental file 2).

      (3) Figure 2H: It would be helpful to the reader if the KO splicing would be shown in comparison to WT splicing.

      Thank you for your suggestion. We have added the sequence result between exon 3 and exon 4 of pu.1 from wildtype cDNA to show WT splicing in Figure 2H.

      (4) Legend Figure 5C. Relative expression should be replaced with transcripts per million (TPM).

      We have corrected it in the figure legend of Figure 5C (Line 786-787).

      (5) In Figure S3. the label on the y-axis in panel B is not visible.

      We apologize for the mistake made during figure assembly. We have corrected it and the y-axis label is now visible.

      (6) In Figure S7B an explanation for the colors in the heat map is missing and should be added.

      Colors represent scaled TPM values. The red color represents high expression while the blue color represents low expression. We have added the information in the figure legend.

      (7) A justification for the use of male mice only should be added or additional experiments in female mice should be performed.

      Female mice were excluded to avoid variability associated with estrous cycle-dependent hormonal changes, which are known to influence microglial behavior (Habib P et al., 2015). We have added a justification in the revised manuscript (Line 547-548).

      (8) The manuscript would benefit from some language editing. A few examples are listed below:

      a) line 97: the rostral blood (RBI) should read the rostral blood island.

      b) line 373 typo: nucleus translocation should read nuclear translocation.

      c) line 393 typo: pu.1-dificent should read pu.1-deficient.

      We apologize for the typos or grammar mistakes in the manuscript. We have checked the manuscript thoroughly and revised those typos or grammar mistakes.

      Reference:

      Tschan MP, Reddy VA, Ress A, Arvidsson G, Fey MF, Torbett BE (2008) PU.1 binding to the p53 family of tumor suppressors impairs their transcriptional activity. Oncogene 27: 3489-93

      Yu T, Guo W, Tian Y, Xu J, Chen J, Li L, Wen Z (2017) Distinct regulatory networks control the development of macrophages of different origins in zebrafish. Blood 129: 509-519

      Habib P, Beyer C (2015) Regulation of brain microglia by female gonadal steroids. J Steroid Biochem Mol Biol 146: 3-14

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Tubert C. et al. investigated the role of dopamine D5 receptors (D5R) and their downstream potassium channel, Kv1, in the striatal cholinergic neuron pause response induced by thalamic excitatory input. Using slice electrophysiological analysis combined with pharmacological approaches, the authors tested which receptors and channels contribute to the cholinergic interneuron pause response in both control and dyskinetic mice (in the L-DOPA off state). They found that activation of Kv1 was necessary for the pause response, while activation of D5R blocked the pause response in control mice. Furthermore, in the L-DOPA off state of dyskinetic mice, the absence of the pause response was restored by the application of clozapine. The authors claimed that 1) the D5R-Kv1 pathway contributes to the cholinergic interneuron pause response in a phasic dopamine concentration-dependent manner, and 2) clozapine inhibits D5R in the L-DOPA off state, which restores the pause response.

      Strengths

      The electrophysiological and pharmacological approaches used in this study are powerful tools for testing channel properties and functions. The authors' group has well-established these methodologies and analysis pipelines. Indeed, the data presented were robust and reliable.

      Weaknesses:

      Although the paper has strengths in its methodological approaches, there is a significant gap between the presented data and the authors' claims.

      The authors addressed most of the concerns I raised. However, the critical issue remains unresolved.

      I am still not convinced by the results presented in Fig. 6 and their interpretation. Since Clozapine acts as an agonist in the absence of an endogenous agonist, it may stimulate the D5R-cAMP-Kv1 pathway. Stimulation of this pathway should abolish the pause response mediated by thalamic stimulation in SCINs, rather than restoring the pause response. Clarification is needed regarding how Clozapine reduces D5R-ligand-independent activity in the absence of dopamine (the endogenous agonist). In addition, the authors argued that a D5R antagonist does not work in the absence of dopamine, and that therefore the D5R antagonist alone didn't restore the pause response. However, if the D5R-cAMP-Kv1 pathway is already active in the L-DOPA off state, why didn't the D5R antagonist contribute to inhibition of the D5R pathway? Since Clozapine is not D5 specific and the Clozapine experiments were not concrete, I recommend testing whether other receptors, such as the D2 receptor, contribute to the Clozapine-induced pause response in the L-DOPA-off state.

      Thank you for the opportunity to clarify this point. It seems there may have been a misunderstanding regarding our proposal about clozapine's mechanism of action. We are not suggesting that clozapine acts as an agonist, but rather as an “inverse agonist”. Unlike classical agonists, inverse agonists produce a pharmacological effect opposite to that of an agonist. Although clozapine is best known for its antagonistic effects on dopamine and serotonin receptors, under conditions where no endogenous agonist is present, it has been shown to reduce the constitutive activity of D1 and D5 receptors (PMID: 24931197). This is explained in lines 240-254 in the Results section.

      In contrast, the prototypical and selective D1/D5 receptor antagonist SCH23390 does not exhibit inverse agonist properties and would not be expected to produce effects in the absence of an agonist (PMID: 7525564). The observation that SCH23390 blocks the effects of clozapine in dopamine-depleted animals strongly supports the idea that clozapine acts through D1/D5 receptors. This is now clarified in lines 257-264.

      To further address your comments, we now include a new figure (Figure 6) presenting experiments that show D2-type receptor agonists do not restore the pause response in dyskinetic mice in the off-L-DOPA condition. These results are described in a new subsection of the Results section and discussed in a newly added paragraph in the Discussion (lines 369-380).

      Finally, to exclude a potential contribution of serotonin receptors to clozapine’s effects, we have expanded what is now Figure 7 (formerly Figure 6) to show that clozapine continues to restore the pause response even in the presence of a serotonin receptor antagonist in the bath.

      All these results are further discussed in lines 342-360.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Tubert et al. presents the role of D5 receptors (D5R) in regulating the striatal cholinergic interneuron (CIN) pause response through D5R-cAMP-Kv1 inhibitory signaling. Their findings provide a compelling model explaining the "on/off" switch of the CIN pause, driven by the distinct dopamine affinities of D2R and D5R. This mechanism, coupled with varying dopamine states, is likely critical for modulating synaptic plasticity in cortico-striatal circuits during motor learning and execution. Furthermore, the study bridges their previous finding of CIN hyperexcitability (Paz et al., Movement Disorders 2022) with the loss of the pause response in LID mice and demonstrates the restoration of the pause through D1/D5 inverse agonism.

      Strengths:

      The study presents solid findings, and the writing is logically structured and easy to follow. The experiments are well-designed, properly combining ex vivo electrophysiology recording, optogenetics, and pharmacological treatment to dissect / rule out most, if not all, alternative mechanisms in their model.

      Weaknesses:

      While the manuscript is overall satisfying, one conceptual gap needs to be further addressed or discussed: the potential "imbalance" between D2R and D5R signaling due to the ligand-independent activity of D5R in LID. Given that D2R and D5R oppositely regulate CIN pause responses through cAMP signaling, investigating the role of D2R under LID off L-DOPA (e.g., by applying D2 agonists or antagonists, even together with intracellular cAMP analogs or inhibitors) could provide critical insights. Addressing this aspect would strengthen the manuscript in understanding CIN pause loss under pathological conditions.

      Thank you for your comments. Although our primary focus is on the role of D5 receptors, we have also investigated the effects of two D2-type receptor agonists in dyskinetic mice in the off-L-DOPA condition. We found that neither quinpirole nor sumanirole was able to restore the pause response. These results are presented in Figure 6 and related text in the Results and Discussion sections.

      Understanding why D2 agonists fail to restore the pause response—despite their expected effect of reducing cAMP levels—is an important question that warrants further investigation. Interestingly, previous studies have reported paradoxical effects of D2 receptor stimulation in SCINs in animal models of dystonia (PMID: 16934985, PMID: 21912682), as well as under conditions where the SCIN’s constitutively active integrated stress response is diminished (PMID: 33888613). This is now discussed in lines 369-380.

      Reviewer #3 (Public review):

      Summary:

      Tubert et al. investigate the mechanisms underlying the pause response in striatal cholinergic interneurons (SCINs). The authors demonstrate that optogenetic activation of thalamic axons in the striatum induces burst activity in SCINs, followed by a brief pause in firing. They show that the duration of this pause correlates with the number of elicited action potentials, suggesting a burst-dependent pause mechanism. The authors demonstrated that this burst-dependent pause relied on Kv1 channels. The pause is blocked by SKF81297 and partially by sulpiride and mecamylamine, implicating D1/D5 receptor involvement. The study also shows that ZD7288 does not reduce the duration of the pause, and that lesioning dopamine neurons abolishes this response, which can be restored by clozapine.

      Weaknesses:

      While this study presents an interesting mechanism for SCIN pausing after burst activity, there are several major concerns that should be addressed:

      (1) Scope of the Mechanism: It is important to clarify that the proposed mechanism may apply specifically to the pause in SCINs following burst activity. The manuscript does not provide clear evidence that this mechanism contributes to the pause response observed in behavioral animals. While the thalamus is crucial for SCIN pauses in behavioral contexts, the exact mechanism remains unclear. Activating thalamic input triggers burst activity in SCINs, leading to a subsequent pause, but this mechanism may not be generalizable across different scenarios. For instance, approximately half of TANs do not exhibit initial excitation but still pause during behavior, suggesting that the burst-dependent pause mechanism is unlikely to explain this phenomenon. Furthermore, in behavioral animals, the duration of the pause seems consistent, whereas the proposed mechanism suggests it depends on the prior burst, which is not aligned with in vivo observations. Additionally, many in vivo recordings show that the pause response is a reduction in firing rate, not complete silence, which the mechanism described here does not explain. Please address these in the manuscript.

      Thank you for the opportunity to clarify these points. We acknowledge that the response of SCINs to optogenetic stimulation of thalamic afferents in brain slices represents a model system that may not capture all aspects of TAN responses to behaviorally salient events. Nevertheless, this approach allows us to test mechanistic hypotheses that are difficult to address in behaving animals with current technologies. This is now stated in lines 311-314.

      Importantly, our ex vivo preparation reproduces, for the first time, the loss of TAN responses observed in non-human primates with parkinsonism, enabling investigation of the underlying mechanisms. In line with your suggestion, we have expanded the Discussion (third and fourth paragraphs) to address the sources of variability in pause responses.

      (2) Terminology: The use of "pause response" throughout the manuscript is misleading. The pause induced by thalamic input in brain slices is distinct from the pause observed in behavioral animals. Given the lack of a clear link between these two phenomena in the manuscript, it is essential to use more precise terminology throughout, including in the title, bullet points, and body of the manuscript.

      Thank you for raising this important point. We agree that it is essential to be precise in describing the nature of the pause observed in our ex vivo model. While we believe that readers would recognize from the abstract and methods that our study focuses on a model of the pause response, we understand your concern about potential confusion. In response, we have revised the terminology in the abstract, bullet points, and throughout the manuscript to more clearly reflect that we are describing an ex vivo model of the pause observed in behaving animals.

      (3) Kv1 Blocker Specificity: It is unclear how the authors ruled out the possibility that the Kv1 blocker did not act directly on SCINs. Could there be an indirect effect contributing to the burst-dependent pause?

      Clarification on this point would strengthen the interpretation of the results.

      This issue is addressed in lines 147-150.

      (4) Role of D1 Receptors: While it is well-established that activating thalamic input to SCINs triggers dopamine release, contributing to SCIN pausing (as shown in Figure 3), it would be helpful to assess the extent to which D1 receptors contribute to this burst-dependent pause. This could be achieved by applying the D1 agonist SKF81297 after blocking nAChRs and D2 receptors.

      Figure 3C shows that the D1/D5 receptor antagonist SCH23390 does not modify the pause, while the full D1/D5 agonist SKF81297 abolishes it, indicating that in our slice preparation, baseline dopamine levels are not contributing to the pause through D1/D5 receptor stimulation.

      (5) Clozapine's Mechanism of Action: The restoration of the burst-dependent pause by clozapine following dopamine neuron lesioning is interesting, but clozapine acts on multiple receptors beyond D1 and D5. Although it may be challenging to find a specific D5 antagonist or inverse agonist, it would be more accurate to state that clozapine restores the burst-dependent pause without conclusively attributing this effect to D5 receptors.

      As explained in our response to Reviewer #1, the effect of clozapine is blocked by the D1/D5-selective antagonist SCH23390. Additionally, new data presented in Figure 7C show that clozapine's ability to restore the pause response is maintained even in the presence of a broad-spectrum serotonin receptor antagonist. Since SCINs do not significantly express D1 receptors, we believe that these findings strongly support a role for D5 receptors in SCINs.

      Comments on revisions:

      The authors have addressed many of my concerns. However, I remain unconvinced that adding an 'ex vivo' experiment fully resolves the fundamental differences between the burst-dependent pause observed in slices - defined by the duration of a single AHP - and the pause response in CHINs observed in vivo, which may involve contributions from more than one prolonged AHP. In vivo, neurons can still fire action potentials during the pause, albeit at a lower frequency. Moreover, in behaving animals, pause duration does not vary with or without initial excitation. The mechanism proposed demonstrates that the pause duration, defined by the length of a single AHP, is positively correlated with preceding burst activity.

      As discussed in paragraphs 3 and 4 of the Discussion (starting at line 285), and illustrated in Figure 1J–K, our data show that the duration of the pause can be modulated by rebound excitation from thalamic input. The absence of this rebound allows us to observe a longer pause when more spikes are elicited during the initial excitatory phase, providing a clearer readout of the contribution of intrinsic membrane mechanisms. We do not claim that intrinsic mechanisms alone account for the entire phasic response of SCINs in behaving animals (lines 295-303 in Discussion).

      To improve clarity, I recommend using the term 'SCIN pause' to describe the ex vivo findings, distinguishing them more explicitly from the 'pause response' observed in behaving animals. This distinction would help contextualize the ex vivo findings as potentially contributing to, but not fully representing, the pause response in vivo.

      We made changes to the abstract, bullet points, and main text to clarify that we are not studying the in vivo response.

      Again, it would be helpful to present raw data for pause durations rather than relying solely on ratios. This approach would provide the audience with a clearer understanding of the absolute duration of the burst-dependent pause and allow for better comparison to the ~200 ms pause observed in behaving animals.

      Thank you for your comment. Following your suggestion, we provide the average pause durations for the data shown in Figure 1H (lines 127–130). We opted not to include raw pause durations in the main text for all figures, as this would make the manuscript more difficult to read and, in our view, is unnecessary. The figures already allow readers to estimate absolute durations: in each case, pause durations are shown relative to baseline ISIs in one panel, while the corresponding absolute ISIs are shown side-by-side. This provides a clearer understanding of pause magnitude relative to the cell’s spontaneous firing, which is more informative than absolute values alone, since one would expect a pause to be longer than the average ISI. Please note that baseline ISIs are significantly shorter in dyskinetic mice (Figure 5l). Showing the pause duration relative to baseline ISI allows readers to readily compare results across figures regardless of changes in SCIN baseline firing rate.
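
      The normalization described above can be illustrated with a minimal sketch. It assumes the relative pause is simply the pause duration divided by the mean baseline interspike interval (ISI) of the same cell; the function names and all spike times are hypothetical and are not taken from the manuscript.

```python
from statistics import mean


def interspike_intervals(spike_times):
    """Return the differences between successive spike times (seconds)."""
    ordered = sorted(spike_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def relative_pause(baseline_spikes, pause_duration):
    """Pause duration divided by the mean baseline ISI of the same cell."""
    isis = interspike_intervals(baseline_spikes)
    if not isis:
        raise ValueError("need at least two baseline spikes")
    return pause_duration / mean(isis)


# Hypothetical cells with the same absolute pause (1.0 s) but different
# baseline firing: mean ISI of 0.5 s versus 0.25 s.
slow_cell = relative_pause([0.0, 0.5, 1.0, 1.5], pause_duration=1.0)
fast_cell = relative_pause([0.0, 0.25, 0.5, 0.75], pause_duration=1.0)
```

      In this toy case the same 1.0 s pause spans two baseline ISIs in the slower cell and four in the faster one, which is the kind of difference a ratio makes visible when baseline firing rates differ between conditions.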

      Additionally, it is important to note that, in vivo, pause durations are typically inferred from perievent time histograms (PETHs), which represent population averages across many trials. In contrast, in our ex vivo studies, we measured pause duration on a trial-by-trial basis. This approach enables us to analyze how the pause duration varies as a function of the initial burst size in the same neuron—something not typically reported in in vivo studies. As described in the first two paragraphs of the Results, the same SCIN may respond with a different number of spikes in successive trials, and this variability is influenced by factors such as the timing of the last spontaneous spike relative to stimulation onset (Figure 1D–F). We are not aware of studies reporting trial-by-trial analyses of pause duration in behaving animals, particularly in relation to the strength of initial excitation. Therefore, while our slice preparation may yield pause durations that are longer than those observed in vivo, direct comparison to PETH-derived pause durations from behaving animals is not straightforward.

    1. Author response:

      The following is the authors’ response to the previous reviews

      We appreciate the reviewers’ and editors’ constructive suggestions, which have helped us improve our manuscript. We have made considerable revisions to the details of our data analyses.

      The reason that the reviews did not change is that there were really three central points that led to the "incomplete". These were: (1) there was potentially a selection bias due to double dipping; (2) there was potentially a time confound due to the lack of counterbalancing; and (3) there is confusion about how the modeling was done, but it seems like the modelling was of the complete block (rather than tied to specific events in that block).

      (1) Double dipping

      We appreciate the opportunity to explain our robust safeguards against double-dipping and have provided detailed clarifications regarding the data analyses (pp. 11-14). Our study ensures statistical independence between task-related region selection and hypothesis testing through three orthogonal mechanisms:

      (1) Regressor Orthogonality: Statistical Independence Between Selection and Testing

      The selection regressor (group mean activation) was mathematically independent from test regressors (group differences, behavioral scores). This was confirmed through our GLM implementation: First-level: Task vs. rest contrast (β values) for each participant; Second-level: One-sample t-tests (selection) vs. independent group/behavioral tests.

      (2) Multimodal Validation: Complementary Neural and Behavioral Measures

      We employed multiple distinct metrics to provide convergent yet independent validation of effects.

      Neural Measures: Three orthogonal indices assessed different neural dimensions.

      A. Single-brain activation examines neural activity patterns within individual decision-makers.

      B. Within-group neural synchronization (GNS) quantifies the temporal alignment of neural activity across interacting group members during shared decision processes.

      C. Functional connectivity (FC) analyses measure correlated activity between different brain regions within individual participants.

      Behavioral Safeguards: Behavioral metrics were analyzed in independent regressions, avoiding circularity.

      A. Individual performance was based on personal accuracy,

      B. collective performance represented the group-level average accuracy across raters, and

      C. their similarity was quantified as the Euclidean distance between individual and collective scores.

      (3) Statistical Safeguards

      We further ensured independence by applying strict FDR correction at both the selection (p < 0.05) and testing (p < 0.05) stages. In addition, a permutation test was conducted using 1,000 pseudo-group iterations to build null distributions for GNS.
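
      The FDR step can be sketched as follows. This is a minimal illustration that assumes the Benjamini-Hochberg procedure, which the response does not name explicitly; the function name and the channel p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking which tests survive BH FDR control at alpha.

    Assumption: the Benjamini-Hochberg step-up procedure is used; the
    source only states "FDR correction".
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# Hypothetical per-channel p-values from the selection stage.
channel_p = [0.001, 0.008, 0.039, 0.041, 0.20]
flags = benjamini_hochberg(channel_p)
```

      With these illustrative values, the third and fourth channels fall below an uncorrected threshold of 0.05 but do not survive the correction, so only the first two channels would be carried forward.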

      Drawing on both classic and latest NIRS (e.g., Jiang et al., 2015; Liu et al., 2023; Stolk et al., 2016; Xie et al., 2023) and NIRS hyperscanning studies (e.g., Liu et al., 2019; Pärnamets et al., 2020; Reinero et al., 2021; Számadó et al., 2021; Solansky, 2011), we performed the data analyses. Below, we provide the details of our data analysis:

      Single-brain activation. To identify task-related brain regions (channels), we used a one-sample t-test based on brain activation data from all participants during the task compared to the baseline (resting state).

      (1)  Data Collection: Each participant had brain activation data (HbO signals measured by fNIRS) during the task (the entire process of reading, sharing, discussing, and decision-making) and the resting state (baseline).

      (2)  Pre-processing: We sought to explore the neural mechanisms that manipulated group identification and its effect on collective performance. Data were preprocessed using the Homer2 package in MATLAB 2020b (Mathworks Inc., Natick, MA, USA). First, motion artifacts were detected and corrected using a discrete wavelet transformation filter procedure. After that, the raw intensity data were converted to optical density (OD) changes. Then, kurtosis-based wavelet filtering (Wav Kurt) was applied to remove motion artifacts with a kurtosis threshold of 3.3 (Chiarelli, Maclin, Fabiani, & Gratton, 2015). Based on a prior multi-brain study of social interactions (Cheng et al., 2022), the output was bandpass filtered using a Butterworth filter with order 5 and cut-offs at 0.01 and 0.5 Hz to remove longitudinal signal drift and instrument noise. Finally, OD data were converted to HbO concentrations.

      (3) Individual-Level Analysis: First, a GLM was used to compute the "task vs. rest" brain activation contrast for each participant [0,1], obtaining each individual's "task effect" value (β value, representing task activation strength).

      (4) Group-Level Analysis: These "task effect" values from all participants were then aggregated, and a one-sample t-test was performed for each brain region (or channel) to determine whether the average activation in that region was significantly greater than 0 (i.e., significantly more active during the task compared to the resting state).

      (5) Task-Related Regions: If the t-test result for a brain region was significant (p < 0.05, FDR-corrected), we considered that region "task-related" and suitable for further analysis.

      (6) Subsequent Tests:

      - Group Comparisons: We examined differences in activation between groups (e.g., high vs. low group identification) using independent t-tests on the same task vs. baseline contrast.

      - Behavioral Correlations: We analyzed relationships between task-related activation (β values) and behavioral scores (e.g., individual performance) using Pearson analyses.

      - Mediation model: We examined the relationship between an individual's perceived group identification and individual performance, which was mediated by task-related activation (β values).
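
      The group-level selection step listed above can be sketched for a single channel. This is a minimal illustration, not the authors' pipeline: the β values are hypothetical, and only the t statistic is computed. The p-value would come from a Student's t distribution with n - 1 degrees of freedom and would then be FDR-corrected across channels.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, popmean=0.0):
    """t statistic for testing whether the mean of `values` differs from popmean."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    standard_error = stdev(values) / sqrt(n)  # stdev uses the n - 1 denominator
    return (mean(values) - popmean) / standard_error


# Hypothetical task-vs-rest beta values for one channel, one per participant.
betas = [1.0, 2.0, 3.0]
t_stat = one_sample_t(betas)
degrees_of_freedom = len(betas) - 1
```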

      Within-Group Neural Synchronization (GNS).

      (1) Data Collection and Pre-processing as above

      (2) Calculation: WTC was applied to generate the brain-to-brain coupling of each pair in each triad (Coherence 1&2, Coherence 1&3, and Coherence 2&3). Then, three coherence values from three pairs were averaged as the GNS for each triad, that is, GNS = (Coherence 1&2 + Coherence 1&3 + Coherence 2&3) / 3.

      (3) Task-Related Regions: Time-averaged GNS (also averaged across channels in each group) was compared between the baseline session (i.e., the resting phase) and the task session (from reading information to making decisions) using a series of one-sample t-tests. When determining the frequency band of interest, the time-averaged GNS was also averaged across channels. After that, we analyzed the time-averaged GNS of each channel. Then, channels showing significant GNS were regarded as regions of interest and included in subsequent analyses.

      (4) Permutation test: The nonparametric permutation test was conducted on the observed interaction effects on GNS of the real group against the 1,000 permutation samples.

      (5) Subsequent Tests:

      - Group Comparisons: We examined differences in activation between groups (e.g., high vs. low group identification) using independent t-tests on the same task vs. baseline contrast.

      - Behavioral Correlations: The Pearson’s correlation between GNS and collective performance (i.e., calculated by averaging the individual scores assigned by the three raters for each group) was performed.

      -  Mediation model: We examined how GNS mediated the relationship between group identification and collective performance.
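
      The GNS calculation and the permutation comparison listed above can be sketched as follows. The averaging follows the formula given in step (2). The empirical p-value, including its one-sided form and the +1 correction, is an assumption, since the response does not specify how the observed value was compared with the pseudo-group samples. All names and coherence values are hypothetical.

```python
def triad_gns(coh_12, coh_13, coh_23):
    """Average the three pairwise coherence values of one triad."""
    return (coh_12 + coh_13 + coh_23) / 3


def permutation_p_value(observed, null_values):
    """One-sided empirical p-value of `observed` against a null sample.

    Assumption: the p-value is the share of pseudo-group values at least
    as large as the observed one, with a +1 correction so that it is
    never exactly zero.
    """
    exceed = sum(1 for value in null_values if value >= observed)
    return (exceed + 1) / (len(null_values) + 1)


# Hypothetical pairwise coherence values for one real triad.
real_gns = triad_gns(0.30, 0.36, 0.42)

# Hypothetical GNS values from pseudo-groups; the study reports 1,000
# pseudo-group samples, and four are used here only to keep the example short.
pseudo_gns = [0.20, 0.25, 0.30, 0.40]
p_value = permutation_p_value(real_gns, pseudo_gns)
```

      Building the pseudo-groups themselves (recombining participants who did not actually interact) is left out, because the response does not describe that procedure in detail.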

      The brain activation connectivity.

      (1) Data Collection and Pre-processing as above

      (2) Calculation: Exploratory Pearson’s correlations were computed between individual-performance-related HbO and collective-performance-related HbO.

      (3) Moderation analysis: Single-brain activation × connectivity → GNS.

      (2) Counterbalancing.

      We sincerely appreciate this valuable methodological insight. Building on prior group decision-making research (De Wilde et al., 2017; Stasser et al., 1992), we refined all stages to enhance experimental control and procedural clarity throughout the process (i.e., a. Reading information, b. Sharing private information, c. Discussing information, d. Decision) (Xie et al., 2023). Importantly, we maintained a fixed task sequence to preserve ecological validity, as this progression mirrors natural group decision-making dynamics.

      While this design choice precludes sequential counterbalancing, several factors mitigate potential temporal confounds: (1) random assignment and uniform task timing across conditions minimize systematic between-group differences; (2) our whole-block GLM approach captures sustained decision-related neural activity rather than phase-specific effects; and (3) we fully acknowledge this limitation and will incorporate a detailed discussion of temporal considerations in the revised manuscript, while noting that our design provides unique advantages for studying naturalistic decision-making processes.

      (3) The modelling was of the complete block

      In our revised manuscript, we have explicitly stated that the analysis was performed at the block level rather than the event level, for the following reasons:

      (1) The hidden profile task is inherently a “group decision-making process” that unfolds dynamically across multiple stages (reading, sharing, discussing, and deciding). Prior research in this paradigm (De Wilde et al., 2017; Stasser & Titus, 1985; Xie et al., 2023) has consistently treated these phases as integrated blocks because the key cognitive and social processes (e.g., information integration, deliberation, and consensus formation) occur over extended interactions rather than discrete events.

      (2) Methodologically, our fNIRS hyperscanning approach requires longer blocks to reliably capture the slow hemodynamic response and the gradual emergence of inter-brain neural synchronization during naturalistic social exchanges (Cui et al., 2012; Liu et al., 2019). Event-related designs, while useful for transient stimuli, are less suited for studying prolonged, interactive decision-making where neural coupling develops over time. Thus, our block-based analysis aligns with both the cognitive demands of the task and the neuroimaging constraints, ensuring robust detection of group-level neural dynamics.

    1. Author response:

      Reviewer 1:

      The selection of heavy metal stress as the condition to investigate is not speculative. The elucidation of the genome from the Palomero toluqueño maize landrace revealed heavy metal effects during domestication (Vielle-Calzada et al., 2009). Differences concordant with its ancient origin identified chromosomal regions of low nucleotide variability that contained the three domestication loci included in this study; all three are involved in heavy-metal detoxification. Results presented in Vielle-Calzada et al 2009 indicated that environmental changes related to heavy metal stress were important selective forces acting on maize domestication. Our study expands those results by starting to elucidate the function of these heavy metal response genes and their role in the evolutionary transition from teosinte parviglumis to maize.

      Although the paper presents some interesting findings, it is difficult to distinguish which observations are novel versus already known in the literature regarding maize HM stress responses. The rationale behind focusing on specific loci is often lacking. For example, a statistically significant region identified via LOD score on chromosome 5 contains over 50 genes, yet the authors focus on three known HM-related genes without discussing others in the region. It is unclear why ZmHMA1 was selected for mutagenesis over ZmHMA7 or ZmSKUs5.

      We appreciate the value of this comment. We will modify the manuscript to clearly show which phenotypic observations are novel and which were previously reported for maize grown under HM stress. The rationale for focusing on three specific loci is related to results from Vielle-Calzada et al. 2009 (see comment above). Although we demonstrated that these three loci show an unusual reduction in genetic variability when compared to the rest of chromosome 5 (including a separate class of genes previously identified as being affected by domestication; Hufford et al., 2012), we will expand the genetic and expression analysis to all genes included in a region precisely defined via LOD scores of five QTL 1.5-LOD support intervals that overlap with ZmHMA1. Within this region of 1.5 to 2 Mb, we will compare nucleotide variability and gene expression in response to HMs. Contrary to major domestication loci showing a single highly pleiotropic gene responsible for important domestication traits, in this chr.5 genomic region phenotypic effects are due to multiple linked QTLs (Lemmon and Doebley, 2014). The mutagenic analysis of ZmHMA7 and ZmSKUs5 will be included in a different publication; we can anticipate that the results reinforce the conclusions of this study.

      The idea that HM stress impacted gene function and influenced human selection during domestication is of interest. However, the data presented do not convincingly link environmental factors with human-driven selection or the paleoenvironmental context of the transition. While lower nucleotide diversity values in maize could suggest selective pressure, it is not sufficient to infer human selection and could be due to other evolutionary processes. It is also unclear whether the statistical analysis was robust enough to rule out bias from a narrow locus selection. Furthermore, the addition of paleoclimate records (Paleoenvironmental Data Sources as a starting point) or conducting ecological niche modeling or crop growth models incorporating climate and soil scenarios would strengthen the arguments.

      We agree that lower nucleotide diversity values in maize are not sufficient to infer human selection and could be due to other evolutionary processes. As a matter of fact, since these same HM response loci also show unusually low nucleotide variability in teosinte parviglumis (Fig 2), we cannot discard the possibility that natural selection forces related to environmental changes could have affected native teosinte parviglumis populations in the early Holocene, before maize emergence. This possibility supports a speculative model suggesting that phenotypic changes caused by HM stress could have preceded human selection and its consequences, contributing to initial subspeciation; the model is proposed in the “Ideas and Speculation” section of the manuscript. Fortunately, as suggested by the reviewer, a large body of paleoclimatic records and paleoenvironmental data is available for the Trans-Mexican Volcanic Belt in the Holocene, including geographic regions where the emergence of maize presumably occurred. We will include an extensive analysis of available paleoenvironmental data and discuss it in light of our current results regarding the effects of HM stress. We are also expanding the physical range of our statistical analysis to cover at least 60 Kb per locus - including neighboring genes for all three loci - to determine if our results could be due to narrow locus selection.

      Despite the interest in examining HM stress in maize and the presence of a pleiotropic phenotype, the assessment of the impact of gene expression is limited. The authors rely on qPCR for two ZmHMA genes and the locus tb1, known to be associated with maize architecture. A transcriptomic analysis would be necessary to 1- strengthen the proposed connection and 2- identify other genes with linked QTLs, such as those in the short arm of chromosome 5.

      Although real-time qPCR is an accurate and reliable approach to assess the expression of specific genes such as ZmHMA1 and Tb1, we will explore the possibility of complementing our analysis with available RNA-seq results that are pertinent for this study (see for example Lin et al., 2022 and Zhang et al., 2024) and further explore causative effects between HM stress, Tb1 and ZmHMA1 expression. As also pointed out by Reviewer #1, TEs are known to influence gene expression under abiotic stress, and RNA-Seq analysis would allow us to determine if TE activity could lead to similar outcomes.

      Reviewer #2:

      The authors explored Cu/Cd stress but not a more comprehensive panel of heavy metals, making the implications of this study quite narrow. Some techniques used, such as end-point RT-PCR and qPCR, are substandard for the field. The phenotypic changes explored are not clearly connected with the potential genetic mechanisms associated with them, with the exception of nodal roots. If teosintes in response to heavy metal have phenotypic similarity with modern landraces of maize, then heavy metal stress might have been a confounding factor in the selection of maize and not a potential driving factor. Similar to the positive selection of ZmHMA1 and its phenotypic traits. In that sense, there is no clear hypothesis of what the authors are looking for in this study, and it is hard to make conclusions based on the provided results to understand its importance. The authors do not provide any clear data on the potential influence of heavy metals in the field during the domestication of maize. The potential role of Tb-1 is not very clear either.

      Thank you for these comments. We will clearly emphasize our hypothesis that HM stress was an important factor driving the emergence of maize from teosinte parviglumis through action of HM response genes. A comprehensive panel of heavy metals would not be more accurate in terms of simulating the composition of volcanic soils evolving across 9,000 years in the region where maize presumably emerged. Copper (Cu) and cadmium (Cd) each correspond to a different affinity group for proteins of the ZmHMA family. ZmHMA1 has preferential affinity for Cu and Ag (silver), whereas ZmHMA7 has preferential affinity for Cd, Zn (zinc), Co (cobalt), and Pb (lead). Since these P1b-ATPase transporters mediate the movement of divalent cations, their function remains consistent regardless of the specific metal tested, provided it belongs to the respective affinity group. By applying sublethal concentrations of Cd (16 mg/kg) and Cu (400 mg/kg), we caused a measurable physiological response while allowing plants to complete their life cycle, including the reproductive phase, facilitating a comprehensive analysis of metal stress adaptation.

      Although real-time qPCR is an accurate and reliable approach to assess gene expression, we agree that RNA-Seq results would improve the scope of the analysis and better assess the role of Tb1 in relation to HM response (see comments for Reviewer #1). There are two phenotypic changes clearly connected with the genetic mechanisms involved in the parviglumis to maize transition: plant height and the number of seminal roots (not nodal roots). We will emphasize these phenotypic changes in a modified version of the manuscript. There is a possibility for HM stress to represent a confounding factor in the selection of maize and not a driving factor; however, if such is the case, we think it is rather unlikely that the real driving factor could have acted through mechanisms not related to abiotic stress or HM response. To address the possibility that HM stress was a confounding factor, we will extensively analyze genetic diversity and gene expression in all loci containing genes mapping in close proximity to peak LOD scores of all 1.5-LOD support intervals located in chromosome 5 and showing pleiotropic effects on domestication traits (Lemmon and Doebley, 2014). These will also include those mapping in close proximity to ZmHMA1. The potential influence of heavy metals in the field is being investigated through the analysis of paleoenvironmental data (see response to Reviewer #1); we will include our results in a modified version of the manuscript.

      We thank both reviewers for their detailed review of the manuscript and their pertinent recommendations to improve its presentation and readability.

      References:

      Hufford, Matthew B., Xun Xu, Joost Van Heerwaarden, Tanja Pyhäjärvi, Jer-Ming Chia, Reed A. Cartwright, Robert J. Elshire, et al. 2012. Comparative population genomics of maize domestication and improvement. Nature Genetics 44(7): 808-11.

      Lemmon Zachary H., Doebley John F. 2014. Genetic dissection of a genomic region with pleiotropic effects on domestication traits in maize reveals multiple linked QTL. Genetics 198(1): 345-353.

      Lin Kaina, Zeng Meng, Williams Darron V., Hu Weimin, Shabala Sergey, Zhou Meixue, Cao Fangbin, et al. 2022. Integration of transcriptome and metabolome analyses reveals the mechanistic basis for cadmium accumulation in maize. iScience 25(12): 105484.

      Vielle-Calzada JP, De La Vega OM, Hernández-Guzmán G, Ibarra-Laclette E, Alvarez-Mejía C, Vega-Arreguín JC, Jiménez-Moraila B, Fernández-Cortés A, Corona-Armenta G, Herrera-Estrella L, Herrera-Estrella A. 2009. The Palomero genome suggests metal effects on domestication. Science 326: 1078.

      Zhang Mengyan, Zhao Lin, Yun Zhenyu, Wu Xi, Wu Qi, et al. 2024. Comparative transcriptome analysis of maize (Zea mays L.) seedlings in response to copper stress. Open Life Sciences 19(1): 20220953.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors build on their previous study that showed the midgut microbiome does not oscillate in Drosophila. Here, they focus on metabolites and find that these rhythms are in fact microbiome-dependent. Tests of time-restricted feeding, a clock gene mutant, and diet reveal additional regulatory roles for factors that dictate the timing and rhythmicity of metabolites. The study is well-written and straightforward, adding to a growing body of literature that shows the time of food consumption affects microbial metabolism which in turn could affect the host.

      We thank the reviewer for the positive comments.

      Some additional questions and considerations remain:

      (1) The main finding that the microbiome promotes metabolite rhythms is very interesting. Which microbiota are likely to be responsible for these effects? The authors' previous work in this area may shed light on this question. Are specific microbiota linked to some of the metabolic pathways investigated in Figure 5?

      This is a good question. Although the Drosophila microbiome shows limited diversity, comprised largely of two major families (Acetobacteraceae and Lactobacillaceae), effects on the host could arise from just a subset of species within these families. However, identifying these would require inoculating microbiome-free flies with single and mixed combinations of species and conducting metabolomics to examine cycling of each of the three categories of metabolites we studied: primary metabolites, lipids and biogenic amines (each of these may respond differently to different species). We believe this is beyond the scope of this manuscript, which is focused on how cycles in these different types of metabolites change in the context of the microbiome, the circadian clock and different diets.

      (2) TF increases the number of rhythmic metabolites in both microbiome-containing and abiotic flies in Figure 1. This is somewhat surprising given that flies typically eat during the daytime rather than at night, very similar to TF conditions. I would have assumed that in a clock-functioning animal, the effect of restricting food availability should not make a huge difference in the time of food consumption, and thus downstream impacts on metabolism and microbiome. Can the authors measure food intake directly to compare the ad-lib vs TF flies to see if there are changes in food intake? Would restricting feeding to other times of day shift the timing of metabolites accordingly?

      Previous studies have indicated that there is no significant difference in food consumption between ad-lib and TF flies (Gill et al., 2015; Villanueva et al., 2019). We also found that the presence of a microbiome does not alter total food consumption when compared with germ-free flies (Zhang et al., 2023, and current manuscript). Though flies primarily feed during the day, some food consumption occurs at night (i.e., the feeding rhythm is not tight), and so restricting food to the daytime can increase metabolite cycling. Restricting feeding to other times of day is expected to shift metabolite cycling. We previously showed that this shifts transcript cycling (Xu et al., Cell Metabolism 2011).

      (3) In Figure 2, Per loss of function reveals a change in the phase of rhythmic metabolites. In addition, the effect of the microbiome on these is very different: the per mutants show increased numbers of rhythmic metabolites when the microbiome is absent, unlike the controls. Is it possible that these changes are due to altered daily feeding rhythms in per mutants? Testing the time and amount of food consumed by the per mutant flies would address this question. Would TF in the per mutants rescue their metabolite rhythms and make them resemble clock-functioning controls?

      We previously showed that per<sup>01</sup> flies lose feeding rhythms in DD and LD conditions, but consume a lot more food (Barber et al, 2021). Given that locomotor rhythms are maintained in per<sup>01</sup> in LD (Konopka and Benzer 1971), these rhythms or other rhythms driven by LD cues likely account for the maintenance of metabolite rhythms. And the increased food consumption may contribute to the changes observed. To address the reviewer’s question about the microbiome, we assayed feeding rhythms in per<sup>01</sup> in the absence/presence of a microbiome on the diets that haven’t been tested before (high sugar and high protein diet). Surprisingly, feeding was rhythmic on a high protein diet, regardless of whether a microbiome was present (new Figure S10). On a high sugar diet, feeding appears to be somewhat rhythmic in the presence of a microbiome (although not significant) and not when the microbiome is absent. The same is true in iso31 controls, and in all cases, the phase is the same. Despite the similar effect of the microbiome on feeding rhythms in wild type and per<sup>01</sup>, the effect on cycling is very different. Thus, feeding rhythms do not appear to explain the effects of the microbiome on metabolite cycling in per<sup>01</sup>.

      (4) The calorie content of each diet (normal vs. high-protein vs. high-sugar) is different. The possibility of a calorie effect rather than a difference in nutrition (protein/carbohydrate) should be discussed. Another issue worth considering is the effect of high protein/sugar on the microbiome itself. While the microbiome doesn't seem to affect rhythms in the high-protein diet, the high-sugar diet seems highly microbiome-dependent in Supplementary Fig 8C vs D. Does the diet impact the microbiome and thus metabolite rhythmicity downstream?

      Thank you for these good suggestions. We have added to the discussion the possibility that caloric content, rather than nutrition (protein/carbohydrate), affects metabolite cycling in flies fed normal vs. high-protein vs. high-sugar diets. We have also discussed the possibility that effects of different diets on metabolite cycling are mediated by changes in the microbiome. We cite papers that show effects of diet on microbiomes.

      (5) It would be good if a supplementary table was provided outlining the specific metabolites that are shown in the radial plots. It is not clear if the rhythms shown in the figures refer to the same metabolites peaking at the same time, or rather the overall abundance of completely different metabolites. This information would be useful for future research in this area.

      We have added Supplementary Tables 1-21, which include the raw data for all metabolites.

      Reviewer #2 (Public Review):

      Summary:

      The paper addresses several factors that influence daily changes in concentration of metabolites in the Drosophila melanogaster gut. The authors describe metabolomes extracted from fly guts at four time-points during a 24-hour period, comparing profiles of primary metabolites, lipids, and biogenic amines. The study elucidates that the percentage of metabolites that exhibit a circadian cycle, peak phases of their increased appearance, and the cycling amplitude depends on the combination of factors (microbiome status, composition or timing of the diet, circadian clock genotype). Multiple general conclusions based on the data obtained with modern metabolomics techniques are provided in each part of the article. Descriptive analysis of the data supports the finding that microbiome increases the number of metabolites for which concentration oscillates during the day period. Results of the experiments show that timed feeding significantly enhanced metabolite cycling and changed its phase regardless of the presence of a microbiome. The authors suggest that the host circadian rhythm modifies both metabolite cycling period and the number of such metabolites.

      Strengths:

      The obvious strength of the study is the data on circadian cycling of the detected 843, 4510, and 4330 total primary metabolites, lipids, and biogenic amines respectively in iso31 flies and 623, 2245, and 2791 respective metabolites in per<sup>01</sup> mutants. The comparison of the abundance of these metabolites, their cycling phase, and the ratio of cycling/non-cycling metabolites is well described and illustrated. The conditions tested represent significant experimental interest for contemporary chronobiology: with/without microbiota, wild-type/mutant period gene, ad libitum/time-restricted feeding, and high-sugar/high-protein diet. The authors conclude that the complex interaction between these factors exists, and some metabolic implications of combinations of these factors can be perceived as reminiscent of metabolic implications of another combination ("...the microbiome and time-restricted feeding paradigms can compensate for each other, suggesting that different strategies can be leveraged to serve organismal health"). Enrichment analysis of cycling metabolites leads to an interesting suggestion that oscillation of metabolites related to amino acids is promoted by the absence of microbiota, alteration of circadian clock, and time-restricted feeding. In contrast, association with microbiota induces oscillation of alpha-linolenic acid-related metabolites. These results provide the initial step for hypothesising about functional explanations of the uncovered observations.

      We thank the reviewer for summarizing the contributions made by this manuscript.

      Weaknesses:

      Among the weaknesses of the study, one might point out overly general interpretations of the results, which propose hypothetical conclusions without mechanistic proof. The quantitative indices analysed are obviously of particular interest, but they are neither self-explanatory nor exhaustive. More specific biological examples would add valuable insights into the results of this study, making conclusions clearer. More specific comments on the weaknesses are listed below:

      (1) The criterion of the percentage of cycling metabolites used for comparisons has its own limitations. It is not clear, whether the cycling metabolites are the same in the guts with/without microbiota, or whether there are totally different groups of metabolites that cycle in each condition. GO enrichment analysis gives only a partial assessment, but is still not quantitative enough.

      Microbiome-containing flies and germ-free flies do share some cycling metabolites. Figure 6 provides GO analysis for the pathways enriched in each condition, and Figure S6 shows quantitative data on the number that overlap between different conditions. We have also expanded discussion of specific cycling groups (e.g. amino acid metabolism) to indicate that different metabolites of the same pathway may cycle under different conditions. In addition, we have added detailed information for all cycling metabolites in Supplemental Tables 1-21.

      (2) The period of cycling data is based on only 4 time points during 24 hours in 4 replicates (>200 guts per replicate) on the fifth day of the experiment (10-12-day-old adults). It does not convincingly prove that these metabolites cycle the following days or more finely within the day. Moreover, it is not clear how peaks in polar histogram plots were detected in between the timepoints of ZT0, ZT6, ZT12, and ZT18.

      We acknowledge these limitations, but note that these experiments are very challenging because of the amount of tissue/guts needed for each data point and the time it takes to dissect each gut. Thus, getting more closely spaced time points is difficult. And we believe the detection of daily peaks with four biological replicates provides good evidence for cycling. The peaks in polar histogram plots are based on the parameter of JTK_adjphase when conducting JTK cycle analysis; as the data are averaged across replicates, the average can sometimes fall in between two assayed time points. Details can be found in the attached Supplementary Tables.
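
      As an illustration only of how an averaged peak phase can land between assayed time points, here is a minimal Python sketch. It is not the JTK_CYCLE implementation and does not reproduce how JTK_adjphase is computed; the function name and the per-replicate peak times are hypothetical. It simply takes a circular mean of peak times on a 24-hour clock.

```python
import math


def circular_mean_phase(phases_h, period_h=24.0):
    """Circular mean of peak times (in hours) on a clock of length period_h.

    Hypothetical helper for illustration; not part of JTK_CYCLE.
    """
    # Map each time onto the unit circle so that 23 h and 1 h are neighbours.
    angles = [2 * math.pi * p / period_h for p in phases_h]
    sin_sum = sum(math.sin(a) for a in angles)
    cos_sum = sum(math.cos(a) for a in angles)
    # The direction of the summed vector is the mean phase.
    mean_angle = math.atan2(sin_sum, cos_sum)
    return (mean_angle * period_h / (2 * math.pi)) % period_h


# Hypothetical replicates: two peak at ZT6 and two at ZT12.
mean_phase = circular_mean_phase([6, 6, 12, 12])
print(round(mean_phase, 3))
```

      With these made-up inputs the mean phase falls at ZT9, midway between two sampled time points, even though no sample was collected at that time.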

      (3) Average expression and amplitude are analysed for groups of many metabolites, whereas the data on distinct metabolites is hidden behind these general comparisons. This kind of loss of information can be misleading, making interpretation of the mentioned parameters quite complicated for authors and their readers. Probably more particular datasets can be extracted to be discussed more thoroughly, rather than those general descriptions.

      We analyzed groups of metabolites, dividing them into primary metabolites, lipids and biogenic amines, to extract general take-home messages and also to simplify the presentation. Figure 6 demonstrates specific pathways whose cycling is affected in each condition assayed, and Figure S11 shows examples of cycling metabolites under different conditions. To highlight a dataset that is altered under different conditions, we expanded our discussion of amino acid metabolism, and show how the specific metabolites that cycle in this pathway may vary from one condition to another (Figure S11). For more quantitative data on individual metabolites, we now provide supplementary tables that display all the cycling metabolites. These include those uniquely cycling in one group, those shared between the two groups, and those uniquely cycling in the other group.

      (4) The metabolites' preservation is crucial for this type of analysis, and both proper sampling plus normalisation require more attention. More details about measures taken to avoid different degradation rates, different sizes of intestines, and different amounts of microbes inside them will be beneficial for data interpretation.

      We were careful to control for gut size and to preserve the samples so as to minimize degradation. We placed all the fly samples on ice during collection, and the entire dissection process was also conducted on ice. Once the gut sample collection was completed, we immediately transferred the samples to dry ice, and after all samples had been collected, we stored them at -80°C. In general, gut sizes varied in the following order: females fed high-protein diets > females fed normal chow diets > females fed high-sugar diets. As the metabolomic facility suggested 10 mg samples for each biological repeat, we dissected at least 180 female guts from flies fed high-protein diets, 200 female guts from flies fed normal chow diets, and at least 250 female guts from flies fed high-sugar diets. Also, as gut sizes were smaller in sterile flies, relative to microbiome-containing flies, on a high-protein diet, we collected 200 guts from sterile flies under these conditions. Finally, the service that conducted the metabolomics (UC Davis) provided three detailed files to describe the extraction process for primary metabolites, lipids, and biogenic amines, respectively. We have submitted these files as supplemental materials in the revised manuscript.

      (5) The data in the article describes formal phenomena, not directly connected with organism physiology. The parameters discussed obviously depend on the behavior of flies. Food consumption, sleep, and locomotor activity could be additionally taken into account.

      These are very interesting suggestions. Previous results indicated that microbiome-containing flies do not change their total food consumption or exhibit changes in feeding rhythms when compared with germ-free flies (Zhang et al., 2023), which indicates that microbiome-mediated metabolite cycling is independent of feeding rhythms. As noted above, we examined the contribution of feeding to metabolite cycling in per<sup>01</sup> flies, and did not see an obvious link. We also assayed feeding rhythms and overall food consumption in wild type under AS and AM conditions and on different diets, and likewise could not account for changes in metabolite cycling based on altered food intake (new Figure S10). We acknowledge that behavior, including locomotor activity and sleep, could indeed influence metabolite cycling. We have added discussion of this.

      (6) Division of metabolites into three classes limits functional discussion of found differences. Since the enrichment analysis provided insights into groups of metabolites of particular interest (for example, amino acid metabolism), a comparison of their cycling characteristics can be shown separately and discussed.

      The intent of this work was to provide an overall account of changes in metabolite cycling that occur under different conditions/diets/genotypes. We have expanded the discussion of amino acid metabolism and show how different metabolites of this pathway cycle under different conditions (Figure S11). We believe that discussion/analysis of other specific groups would be good follow-up studies, which can build upon this work. Detailed datasets about all cycling metabolites are provided in Table S1-12.

      Reviewer #3 (Public Review):

      Summary:

      The authors sought to quantify the influence of the gut microbiome on metabolite cycling in a Drosophila model with extensive metabolomic profiling over a 24-hour period. The major strength of the work is the production of a large dataset of metabolites that can be the basis for hypothesis generation for more specific experiments. There are several weaknesses that make the conclusions difficult to evaluate. Additional experiments to quantify food intake over time will be required to determine the direct role of the microbiome in metabolite cycling.

      Strengths:

      An extensive metabolomic dataset was collected with treatments designed to determine the influence of the gut microbiome on metabolite circadian cycling.

      Weaknesses:

      (1) The major strength of this study is the extensive metabolomic data, but as far as I can tell, the raw data is not made publicly available to the community. The presentation of highly processed data in the figures further underscores the need to provide the raw data (see comment 3).

      The raw data have been submitted to the public metabolite database MetaboLights (https://www.ebi.ac.uk/metabolights/; ID: MTBLS8819).

      In addition, the normalized metabolite data have been added in the supplemental materials.

      (2) Feeding times heavily influence the metabolome. The authors use timed feeding to constrain when flies can eat, but there is no measurement of how much they ate and when. That needs to be addressed.

      Since food is the major source of metabolites, the timing of feeding needs to be measured for each of the treatment groups. In the previous paper (Zhang et al 2023 PNAS), the feeding activity of groups of 4 male flies was measured for the wildtype flies. That is not sufficient to determine to what extent feeding controls the metabolic profile of the flies. Additionally, timed feeding opportunities do not equate to the precise time of feeding. They may also result in dietary restriction, leading to the loss of stress resistance in the TF flies. The authors need to measure food consumption over time in the exact conditions under which transcriptomic and metabolomic cycling are measured. I suggest using the EX-Q assay as it is much less effort than the CAFE assay and can be more easily adapted to the rearing conditions of the experiments.

      As noted above, we have now added considerable additional data on feeding and feeding rhythms in microbiome-containing and sterile wild type and per<sup>01</sup> flies on different diets (Figure S10). Our previous study, using the EX-Q assay method, found no differences in either total food consumption or feeding rhythms between microbiome-containing flies and germ-free flies (Zhang et al., 2023). Also, previous work has demonstrated that there is no significant difference in food consumption between ad-lib and TF flies (Villanueva et al., 2019).

      (3) The data on the cycling of metabolites is presented in a heavily analyzed form, making it difficult to evaluate the validity of the findings, particularly when a lack of cycling is detected. The normalization to calculate the change in cycling due to particular treatments is particularly unclear and makes me question whether it is affecting the conclusions. More presentation of the raw data to show when cycling is occurring versus not would help address this concern, as would a more thorough explanation of how the normalization is calculated - the brief description in the methods is not sufficient.

      For instance, the authors state that "timed feeding had less effect on flies containing a microbiome relative to germ-free flies." One alternative interpretation of that result is that both treatments are cycling but that the normalization of one treatment to the other removes the apparent effect. This concern should be straightforward to address by showing the raw data for individual metabolites rather than the group.

      We have added Supplementary Tables 1-21, which include detailed information on metabolite identity and data processing, as well as the raw data for all the cycling metabolites.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The abstract could be rewritten to clarify. I found the last part of the introduction better but struggled to understand the abstract.

      We apologize for this. The abstract was indeed quite dense; we have revised it for clarity.

      (2) Supplementary Figure 8 could be moved to the main text. Since all the comparisons are on one page it is much easier to see the similarities and differences in the conditions tested.

      We have moved Supplementary Figure 8 to main Figure 5.

      (3) The sex and age of the flies used in all experiments should be clarified. The authors mention female guts are collected in the methods (line 111) but it is not clear if this is throughout.

      All guts used in this study were female. We have clarified this in the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Some minor notes that might be improved:

      (1) The order of obtaining eggs without microbiota might be different (first - bleaching, second - sterilisation with ethanol). Otherwise, it is not clear why dechorionating is needed after sterilisation.

      Protocols for generating axenic flies vary. We used the method Feltzin et al reported in 2019: “For newborn fly embryos (<12 hours). First, cleanse and sterilize any leftover agar from collection plates using 100% ethanol, second, dechorionate the fly embryos with 10% bleach, and then immediately rinse three times in germ-free PBS”.

      (2) References for the resources used might be provided (MetaboAnalyst5.0, JTK_CYCLEv3.1).

      We have added the reference for MetaboAnalyst5.0, JTK_CYCLEv3.1 (Pang et al., 2022)

      (3) References or justification for the chosen composition of the diets might be useful (standard diet, high-protein diet, high-sugar diet).

      We have added the references (Bedont et al, 2021, Morris et al, 2021).

      (4) Justification of the choice of iso31 line and per<sup>01</sup> mutant might be important.

      iso31 is the standard wild type line we use in the laboratory. To understand the role of the endogenous clock in microbiome-mediated metabolite cycling, we chose the classical clock mutant per<sup>01</sup>, as it displays fewer non-circadian phenotypes than other clock mutants. For instance, loss of transcriptional activators of the clock produces additional effects (e.g. hyperactivity), likely because of the effect it has on overall expression of many genes. We have added this explanation to the manuscript.

      (5) Abbreviation decoding might be introduced when it is used for the first time in the text (line 240 - TM, TS).

      We apologize for this omission and have rectified it. Thanks.

      TM (timed feeding microbiome-containing flies)

      TS (timed feeding germ-free flies)

      (6) The term "germ-free" is recommended to be avoided in the context of the paper (germ-free = infertile for animals). It might be replaced with the terms "without microbiota" or "germ-free" for example.

      Given that the reviewer recommends use of the word “germ-free” in the second sentence, we assume that the first sentence was intended to say we should avoid “sterile” (and not “germ-free”). We have edited to “germ-free” in the manuscript.

      (7) When only one diet is assumed, it might be better to say so (line 324 - "the protein diet" instead of "protein diets").

      Sorry, we have edited accordingly.

      (8) Too many speculative conclusions are confusing (line 476 - what does it mean for "just as” - how exactly similar; line 477 - what kind of "compensation"; line 503 - how exactly it is related to "metabolic homeostasis" and to which kind of homeostasis).

      “just as” was not intended to refer to any degree of similarity but only to the fact that amino acid cycling occurs in the absence of a clock, as it does in the absence of a microbiome. We speculate that this “compensates” for something that is normally conferred by the clock and the microbiome, for instance maybe the clock drives cycling of a microbiome component that is important for protein metabolism. In the absence of either the clock or the microbiome, this is compensated for by amino acid cycling. We have clarified in the text.

      We used the term "metabolic homeostasis" to reflect steady maintenance of metabolic health via interaction and modulation of different factors. As in the case of the example given above for amino acid metabolism, a perturbation of one process might produce a change in another to optimize metabolism. We have changed the wording in the text to better convey our message (lines 576-579).

      (9) Particular examples of metabolites might be beneficial for supporting conclusions (a figure which shows, for instance, the specific data on linolenic acid: in which conditions it cycles, in which not, what is the period of cycling, what are the exact expression and JTK_amplitude values).

      All cycling metabolites, including linolenic acid, are now included in the supplemental tables.

      Reviewer #3 (Recommendations For The Authors):

      (1) The level of biological replication is unclear for the metabolomic experiments. I see that 200 guts per sample were collected and 4 repeat samples were made at each timepoint. Are these 4 biological replicates for each treatment (AS, AM, TS, TM) at each timepoint? 5 replicates are standard in metabolomics. Please be more explicit in the methods.

      There are 4 biological replicates for each time point of each of the 4 treatments. The metabolomics service recommended 4-6 replicates, so we prepared 4 replicates for each sample. As noted above, these preparations are quite difficult. We found that in general the biological replicates do not differ significantly from each other.

      (2) Wolbachia can have a significant influence on fly physiology. How was this variable addressed? Were flies checked for Wolbachia?

      All the flies are Wolbachia-free, as in our previous study (Zhang et al., 2023). Initially, we treated the flies with 1 mM kanamycin (11815024, ThermoFisher) to remove bacteria. Afterwards, we repopulated the flies with a Wolbachia-free microbiome containing Lactobacillus and Acetobacter bacteria from a medium previously occupied by other flies.

      (3) In Results section 1, the authors report changes in the percentages of metabolites that are cycling, but no statistical test is presented to show that these changes are indeed significant. The authors need to report statistics on the percentages of cycling metabolites.

      We used statistical tests, specifically JTK cycle, to determine cycling of each metabolite. The P value for cycling of each metabolite in this test is computed on the basis of all the biological replicates and all time points. Metabolites that showed a significant P value contribute to the percent cycling. As a result, there is only one value for the percentage cycling in each condition. Thus, statistical analysis cannot be done.

      (4) The authors report that the species proportions in the gut microbiome don't cycle, but do absolute CFU counts? By many accounts (see e.g. Blum et al 2013 mBio), the gut microbiome in lab flies is what they recently ate from the food. The abundance of bacteria in the gut would then be directly tied to the timing of feeding. Timed feeding should produce oscillations in individual flies, so individual flies should be analyzed.

      We assume the reviewer is suggesting that rhythmic feeding could result in rhythmic abundance of the microbiome, which could contribute to cycling. This is indeed a possibility and one we now discuss in the manuscript. Thanks! Analysis of the gut microbiome in individual flies would require quantitation of CFUs from single guts. We do not believe a single gut would yield enough material.

      (5) Line 252: the ZT9 peak could just be due to feeding and digestion.

      This is possible. We now acknowledge this.

      (6) What is the expectation for metabolite cycling in per mutant flies? Shouldn't per mutant flies have no cycling on average? Does the cycling suggest there is an external factor causing cycling?

      Under light-dark conditions, metabolite cycling in per mutant flies may be driven by light:dark cues, either directly or through other light-driven rhythms; e.g., locomotor activity is rhythmic in per<sup>01</sup> flies maintained in LD.

      (7) Performing food intake analysis on each of the treatments would provide critical data to address the direct role of the microbiome in metabolite cycling.

      As noted above, we now provide considerable additional data on food intake at different times of day in microbiome-containing and germ-free wild type and per<sup>01</sup> flies on different diets (Figure S10). Overall, our data indicate that food intake or feeding rhythms do not account for the effects we report here.

      (8) Please be more explicit about replication in the methods and figure legends.

      We have added n=4 for each condition in the methods and figure legends.

      (9) There are numerous minor grammatical errors such as incorrect verb tenses and usage of articles. Additional proofreading could correct these.

      Sorry! We have done a thorough proofreading and made corrections.

    1. Author response:

      Reviewer #1 (Public Review):

      Insects, such as bees, are surprisingly good at recognizing visual patterns. How they achieve this challenging task with limited computational resources is not fully understood. Based on the actual bee's behaviour and visual circuit structure, MaBouDi et al. constructed a biologically plausible model where the circuit extracts essential visual features from scanned natural scenes. The model successfully discriminated a varied set of visual patterns as the actual bee does. By implementing a type of Hebb's rule for non-associative learning, an early layer of the model extracted orientational information from natural scenes essential to pattern recognition. Throughout the paper, the authors provided intuitive logic for how the relatively simple circuit could achieve pattern recognition. This work could draw broad attention not only in visual neuroscience but also in computer vision.

      We appreciate your positive feedback.

      However, there are a number of weaknesses in the manuscript. 1) The authors claim that the model is inspired by micromorphology, yet it does not rigorously follow the detailed anatomy of the insect brain revealed as of now. 2) Some claims sound a bit too strong compared to what the authors demonstrated with the model. For example, when the authors say the model is minimal, the authors simply investigated how many lobula neurons are required for pattern discrimination in the model. However, the manuscript appears to use this to claim that the presented model is the minimal one required for visual tasks. 3) It lacks explanations of what mechanisms in the model could discriminate some patterns but not others, making the descriptions very qualitative. 4) The authors did not provide compelling evidence that the algorithm is particularly tuned to natural scenes.

      We appreciate the reviewer's constructive feedback and have revised the manuscript to clarify and strengthen our claims. Below, we address each of the concerns raised:

      (1) The model does not rigorously follow the detailed anatomy of the insect brain

      We acknowledge that our model is an abstraction rather than a direct reproduction of the full micromorphology of the insect brain. The goal of our study was not to replicate every anatomical feature but rather to extract the core computational principles underlying active vision, based on the functional activity of the insect brain. Although recent connectome studies provide detailed structural maps, they do not fully capture the functional dynamics of sensory processing and behavioural outcomes. Our model integrates key neurobiological insights, including the hierarchical structure of the optic lobes, lateral inhibition in the lobula, and non-associative learning mechanisms shaping spatiotemporal receptive fields.

      However, to address this concern, we have revised the introduction and discussion to explicitly acknowledge the model’s level of abstraction and its relationship to the known anatomy of the insect visual system. Furthermore, we highlight future directions in which connectomic data could refine our model.

      (2) Strength of claims regarding minimality of the model

      We appreciate the reviewer’s concern regarding the definition of a "minimal" model. Our intention was not to claim that this model represents the absolute minimal neural architecture for visual learning tasks but rather that it identifies a minimal set of necessary computational elements that enable pattern discrimination in insects. To clarify this, we have refined the text to ensure that our conclusions about minimality are explicitly tied to the specific constraints and assumptions of our model. For instance, in the revised manuscript, we emphasise that our findings demonstrate how the number of lobula neurons, inhibitory lateral connections, and the non-associative learning model affect neural representation and discrimination performance, rather than establishing an absolute lower bound on the complexity required for visual processing in insects.

      (3) Mechanistic explanations for pattern discrimination

      Thank you for highlighting this point. We have conducted a more detailed analysis of the model’s response to different patterns and expanded our discussion of the underlying mechanisms. To address this, we have refined our explanation of how different scanning strategies and temporal integration mechanisms contribute to neural selectivity in the lobula and overall discrimination performance. Specifically:

      - Figure 3 illustrates how the model benefits from generating sparse coding in the visual network, leading to improved performance in pattern recognition tasks.

      - Figure 5 now includes a more detailed explanation of how different scanning strategies influence the selectivity and separability of lobula neuron responses. Additionally, we provide further analysis of why the model successfully discriminates certain patterns (e.g., simple oriented bars) but struggles with more complex spatially structured quadrant-based patterns.

      - We elaborate on how sequential sampling, temporal coding, and lateral inhibition collectively shape neural representations, enabling the model to distinguish between visual stimuli effectively.

      These refinements provide a clearer mechanistic explanation of the model’s strengths and limitations, ensuring a more comprehensive understanding of its function.

      (4) Evidence that the model is tuned to natural scenes

      We have revised the manuscript to provide stronger support for the claim that the model is particularly adapted to natural scenes. Specifically:

      - Figure 3 demonstrates that training on natural images leads to sparse, decorrelated responses in the lobula, a hallmark of efficient coding observed in biological systems.

      - Supplementary Figure 2-1B shows that training with shuffled images fails to produce structured receptive fields, reinforcing that the statistical structure of natural images is crucial for efficient learning.

      - We now explicitly discuss how the receptive fields emerging from non-associative learning align with known orientation-selective responses in insect visual neurons, supporting the idea that the model is optimised for processing natural visual inputs (Figures 3 and 6, and the Discussion section).

      Taken together, these revisions clarify how the model captures fundamental principles of insect vision without making overly strong claims about biological fidelity. We thank the reviewer for these insightful comments; addressing them has significantly strengthened the clarity and depth of our manuscript.

      Reviewer #2 (Public Review):

      This study is inspired by the scanning movements observed in bees when performing visual recognition tasks. It uses a multilayered network, representing stages of processing in the visual lobes (lamina, medulla, lobula), and uses the lobula output as input to a model of associative learning in the mushroom body (MB). The network is first trained with short "scanning" sequences of natural images, in a non-associative adaptation process, and then several experimental paradigms where images are rewarded or punished are simulated, with the output of the MB able to provide the appropriate discriminative decisions (in some but not all cases). The lobula receptive fields formed by the initial adaptation process show spatiotemporal tuning to edges moving at particular orientations and speeds that are comparable to recorded responses of such neurons in the insect brain.

      There are two main limitations to the study in my view. First, although described (caption fig 1) as a model "inspired by the micromorphology" of the insect brain, implying a significant degree of accuracy and detail, there are many arbitrary features (unsupported by current connectomics). For example, the strongly constrained delay line structure from medulla to lobula neurons, and the use of a single MBON that has input synapses that undergo facilitation and decay according to different neuromodulators. Second, while it is reasonable to explore some arbitrary architectural features, given that not everything is yet known about these pathways, the presented work does not sufficiently assess the necessity and sufficiency of the different components, given the repeated claims that this is the "minimal circuit" required for the visual tasks explored.

      We appreciate your feedback and have refined the manuscript to clarify model design choices and address concerns regarding minimality.

      (1) Model Architecture and Functional Simplifications<br /> While our model is inspired by the insect visual system, it is not intended as an exact anatomical reconstruction but rather a functional abstraction to uncover key computational principles of active vision and visual learning. The delay-line structure and simplified MBON implementation were deliberate choices to enable spatiotemporal encoding and associative learning without overcomplicating the model. As connectome data alone do not fully reveal functional relationships, our approach serves as a hypothesis-generating tool for future neurobiological studies.

      (2) Necessity and Sufficiency of Model Components<br /> We have removed overstatements about minimality and now clarify that our model represents a functional circuit rather than the absolute minimal configuration. Additionally, we conducted new control experiments assessing the influence of different model components and further justifying key mechanisms such as spatiotemporal encoding and lateral inhibition.

      For a more detailed discussion of these revisions and improvements, please refer to our response to the Journal, above.

      Regarding the mushroom body (MB) learning model, it is strange that no reference is made to recent models closely tied to connectomic and other data in fruit flies, which suggests separate MBONs encode positive vs. negative value; that learning is not dependent on MBON activity (so is not STDP); that feedback from MBONs to dopaminergic signalling plays an important role, etc. Possibly the MB of the bee operates in a completely different way to the fly, but the presented model relies on relatively old data about MB function, mostly from insects other than bees (e.g. locust) so its relationship to the increasingly comprehensive understanding emerging for the fly MB needs to be clarified. It is implied that the complex interaction of the differential effects of dopamine and octopamine, as modelled here, are required to learn the more complex visual paradigms, but it is not actually tested if simpler rules might suffice. Also, given previous work on models of view recognition in the MB, inspired by bees and ants, it seems plausible that simply using static 25×25 medulla activity as input to produce sparse activity in the KCs would be sufficient for MBON output to discriminate the patterns used in training, including the face stimulus. Thus it is not clear whether the spatiotemporal input and the lobula encoding are necessary to solve these tasks.

      Thank you for your suggestion. The primary focus of this study was not to uncover the exact mechanisms of associative learning in the mushroom body (MB) but rather to evaluate the role of lobula output activity in active vision. The associative learning component was included as a simplified mechanism to assess how the spatiotemporal encoding in the lobula contributes to visual pattern learning.

      We conducted a detailed analysis of lobula neuron activity, focusing on sparsity, decorrelation, and selectivity to demonstrate how the visual system extracts compact yet relevant signals before reaching the learning centre (see Figure 5). Theoretical predictions based on these findings suggest that such encoding enhances pattern recognition performance. Selecting this associative learning mechanism allowed us to explicitly evaluate this capability; it also facilitated comparison with previous active vision experiments and let us assess the influence of different components on bee behaviour.

      We acknowledge that recent Drosophila connectomics studies suggest alternative MB architectures, including separate MBONs encoding positive vs. negative values, learning mechanisms independent of MBON activity, and feedback from MBONs to dopaminergic pathways. However, visual learning mechanisms in the MB remain poorly characterised, especially in bees, where the functional relevance of different MBON configurations is still unclear. The decision to simplify the MB learning process was intentional, allowing us to prioritise model interpretability over anatomical replication.

      These simplifications have been explicitly discussed in the revised manuscript, where we suggest future directions for integrating more biologically detailed MB models to enhance our understanding of active visual learning in insects. For a broader discussion of our rationale for prioritising computational simplifications over direct neurobiological replication, please refer to our response to the Journal, above.

      It is also difficult to interpret the range of results in fig 3. The network sometimes learns well, sometimes just adequately (perhaps comparable to bees), and sometimes fails. The presentation of these results does not seem to identify any coherent pattern underlying success or failure, other than that the ability to generalise seems limited. That is, recognition (in most cases) requires the presentation of exactly the same stimulus in exactly the same way (same scanning pattern, distance and speed). In particular, it is hard to know what to conclude when the network appears able to learn some "complex patterns" (spirals, faces) but fails to learn the apparently simple plus vs. multiplication symbol discrimination if it is trained and tested with a scan passing across the whole pattern instead of just the lower half.

      We acknowledge that the variability in the model’s performance across different tasks and conditions required a clearer explanation. In the revised manuscript, we have analysed the underlying factors influencing success and failure in greater detail and have expanded the discussion on the model’s generalisation limitations.

      To address this, we have conducted new control experiments and deeper analyses, now presented in Figure 5 and Figure 6F, which illustrate how scanning conditions impact recognition performance. Specifically, we examine why the model can successfully learn complex patterns (e.g., spirals, faces) but struggles with apparently simpler tasks, such as distinguishing between a plus and multiplication symbol when scanning the entire pattern rather than just the lower half. Our results suggest that spatially constrained scanning enhances discriminability, while whole-pattern scanning reduces selectivity due to weaker and less sparse feature encoding in lobula neurons.

      We have also clarified in the Discussion section that while the model demonstrates robust pattern learning under specific conditions, its ability to generalise remains limited when tested with complex patterns (Figure 6F). Further investigation is needed to explore how adaptive scanning strategies or hierarchical processing might improve generalisation.

      In summary, although it is certainly interesting to explore how active vision (scanning a visual pattern) might affect the encoding of stimuli and the ability to learn to discriminate rewarding stimuli, some claims in the paper need to be tempered or better supported by the demonstration that alternative, equally plausible, models of the visual and mushroom body circuits are not sufficient to solve the given tasks.

      There is limited knowledge in the literature regarding the neural correlates of visual-related plasticity in the mushroom body (MB). The majority of our current understanding of the MB is derived from studies on olfactory learning, particularly in Drosophila, which does not provide sufficient data to directly implement or comprehensively compare alternative models for visual learning.

      However, the primary focus of our study is on active vision and how spatiotemporal signals are encoded in the insect visual system. Rather than aiming to replicate a detailed biological model of MB function, we intentionally employed a simplified associative learning network to investigate how neural activity emerging from our visual processing model can support pattern recognition. This approach also allows us to compare model performance with bee behaviour, drawing on insights from previous experimental work on active vision in bees.

      We now discuss the limitations of our approach and the rationale for selectively incorporating specific neural network components in lines 652-677. Additionally, we have provided further justification (see responses above) for prioritising a simplified model, rather than attempting to mimic a highly detailed, yet currently unverified, alternative learning circuit. These clarifications help ensure that our claims are appropriately tempered while still demonstrating the functional relevance of our model.

      Reviewer #3 (Public Review):

      In this manuscript, the authors use the data collected and observations made on bees' scanning behaviour during visual learning to design a bio-inspired artificial neural network. The network follows the architecture of bees visual systems, where photoreceptors project into the lamina, then the medulla, medulla neurons connect to a set of spiking neurons in the lobula. Lobula neurons project to kenyon cells and then to MBON, which controls reward and punishment. The authors then test the performance of the network in comparison with real bee data, finding it to perform well in all tasks. The paper attempts to reproduce a living organism network with a practical application in mind, and it is quite impressive! I appreciate both the potential implications for the understanding of biological systems and the applications in the development of autonomous agents, making the paper absolutely worth reading.

      Thank you for your positive feedback and appreciation of our work.

      However, I believe that the current version somewhat lacks in clarity regarding the methodology and in some of the keywords used to describe the model.

      Definitions:<br /> Throughout the manuscript, the authors use some key terminology that I believe would benefit from some clarification.<br /> The generated model is described in the title and once in the introduction as "neuromorphic". The model is definitely bio-inspired, but at least in some layers of the neural network, the model is built very differently from actual brain connectivity. Generally, when we use the term neuromorphic we imply many advantages of neural tissue, like energy efficiency, that I am not sure the current model is achieving. I absolutely see how this work is going in that direction, and I also fundamentally agree with the choice of terminology, but this should be clearly explained to not risk over-implications

      We appreciate the reviewer’s feedback and acknowledge the importance of clarifying key terminology in our manuscript. As outlined in our response to the Journal, we intentionally simplified the model to focus on understanding the core computational processes involved in active vision rather than precisely replicating the full complexity of insect neural circuits (see other reasons for simplification in the manuscript). This simplification allows us to systematically analyse the influence of specific components underlying active vision mechanisms.

      Despite these simplifications, our model incorporates key neuromorphic principles, including the use of a recurrent neural network architecture and a spiking neuron model at multiple processing levels. These elements enable biologically inspired information processing, aligning with the fundamental characteristics of neuromorphic computing, even if the model does not explicitly focus on hardware efficiency or energy constraints.

      The authors describe this as a model of "active vision". This is done in the title of the article, and in the many paragraph headings (methods, results). In the introduction, however, the term active vision is reserved to the description of bees' behavior. Indeed, the developed model is not a model of active vision, as this would require for the model to control the movement of the "camera". Here instead the stimuli display is given to the model in a fixed progression. What I suspect is that the authors' aim is to describe a model that supports the bees' active vision, not a model of active vision. I believe this should be very clear from the paper, and it may be appropriate to remove the term from the title.

      While our model does not actively control camera movement in the environment, it does simulate the effects of active vision by incorporating scanning dynamics. Our results demonstrate that model responses change significantly with variations in scanning speed and restricted scanning areas, highlighting the importance of movement in shaping visual encoding. However, we acknowledge that true active vision would involve adaptive, real-time control of gaze or trajectory, which is the next step beyond the current implementation toward a more realistic model of active vision. To address your concern, we have discussed the potential for incorporating dynamic flight behaviours in future studies, allowing the model to actively adjust its scanning strategy based on learned visual cues.

      In the short title, it said that this network is minimal. This is then characterized in the introduction as the minimal network capable of enabling active vision in bees. The authors, however, in their experiment only vary the number of lobula neurons, without changing other parts of the architecture. Given this, we can only say that 16 lobula neurons is the minimal number required to solve the experimental task with the given model. I don't believe that this is generalizable to bees, nor that this network is minimal, as there may be different architectures (for the other layers especially) that require overall less neurons. Moreover, the tasks attempted in the minimal network experiment did not include any of the complex stimuli presented in figure 3, like faces. It may be that 16 lobula neurons are sufficient for the X vs + and clockwise vs counter-clockwise spirals, but we do not know if increasing stimuli complexity would result in a failure of the model with 16 neurons.

      We agree that analysing only the number of lobula neurons is not sufficient to establish a truly minimal model for active vision. To address this, we conducted further control experiments to evaluate the influence of other key components, including non-associative learning, scanning behaviour, and lateral connectivity, on model performance. Our results suggest that the proposed model represents a computationally minimal network capable of implementing a basic active vision process, but a more complex model would be required for higher-order visual tasks.

      However, to avoid potential misinterpretation, we have revised the short title and updated the manuscript to clarify that our model identifies a possible minimal functional circuit rather than the absolute minimal network for active vision. Additionally, we have added further discussion on the simplifications made in the model and emphasised the need for future studies to explore alternative architectures and assess their relevance for understanding active vision in insects.

      Methodology:

      The current explanation of the model is currently a bit lacking in clarity and details. This risks impacting negatively on the relevance of the whole work which is interesting and worth reading! This issue affects also the interpretation of the results, as it is not clear to what extent each part of the network could affect the results shown. This is especially the case when the network under-performs with respect to the best performing scenario (e.g., when varying the speed and part of the pattern that is observed, such as in Fig 2C). Adding a detailed technical scheme/drawing specific to the network architecture could have been a way of significantly increasing the clarity of the Methods section and the interpretation of the results.<br /> On a similar note, the authors make some comparisons between the model and real bees. However, it remains unclear whether these similarities are actually indicative of an optimality in the bees visual scanning strategy, or just deriving from the authors design. This is for me particularly important in the experiments aimed at finding the best scanning procedure. If the initial model training is based on natural images it is performed by presenting left to right moving frames, the highest efficiency of lower-half scanning may be due to how the weights in the initial layers are structured and a low generalizability of the model, rather than to the strategy optimality

      We appreciate the reviewer’s constructive feedback and have taken steps to enhance the clarity, interpretability, and transparency of our model description and results. Below, we address the concerns regarding model explanation, performance interpretation, and the comparison with real bee behaviour.

      (1) Improved Model Explanation and Network Clarity: We apologise that the previous version of the manuscript did not fully detail the architecture and functioning of the model. To address this, we have expanded the Methods section with a more detailed breakdown of the network components, their roles, and their contribution to active vision processing. Additionally, we have summarised the network architecture and its implementation for visual learning tasks at the beginning of the Results section, providing a clearer overview of the information flow from visual input to associative learning. Furthermore, we have explicitly analysed and discussed the role of key model components, including scanning strategies, lateral connectivity, and non-associative learning mechanisms, clarifying how each contributes to the observed results.

      (2) Interpretation of Model Performance Variability: Understanding the factors influencing performance variability is crucial, and to improve clarity, we have conducted further analysis of model performance across different conditions, particularly examining the effects of scanning speed, spatial constraints, and feature encoding (see Figure 2C). Additionally, we have expanded the discussion on how scanning conditions impact performance, providing explanations for why some conditions lead to higher or lower discrimination success. Furthermore, we have clarified why certain stimuli present greater challenges for the model, linking these difficulties to receptive field properties and scanning dynamics.

      (3) Comparison Between Model Behaviour and Real Bees: To address your concern regarding the link between scanning preferences and true biological optimality, we have included further analysis discussing the influence of training conditions on the model’s learned behaviours. Additionally, we propose future experiments to test alternative scanning strategies, including adaptive scanning mechanisms that adjust based on visual task demands. Furthermore, we have expanded the discussion on the simplifications made in this study, explicitly stating the limitations of the model and emphasising the need for future research to explore more flexible and biologically plausible scanning mechanisms.

      We believe these revisions significantly enhance the clarity and interpretability of the study, ensuring that the model’s findings are well contextualised within both computational and biological frameworks.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Specific comments:

      (1) It is difficult to appreciate that there is a "peripheral sub-membrane microtubule array" as it is not well defined in the manuscript. This reviewer assumes that this is in the respective field clear. Yet, while it is appreciated that there is an increased amount of MTs close to the cytoplasmic membrane, the densities appear very variable along the membrane. Please provide a clear description in the Introduction what is meant with "peripheral sub-membrane microtubule array".

      A definition has been added to the Introduction.

      (2) The authors described a "consistent presence of a significant peripheral array in the C57BL/6J control mice, while the KO counterparts exhibited a partial loss of this peripheral bundle. Specifically, the measured tubulin intensity at the cell periphery was significantly reduced in the KO mice compared to their wild-type counterparts". In vitro "control cells had convoluted nonradial MTs with a prominent sub-membrane array, typical for β cells (Fig. 2A), KIF5B-depleted cells featured extra-dense MTs in the cell center and sparse receding MTs at the periphery (Fig. 2B,C)". Please comment/discuss why in vivo there are no "extra-dense MTs in the cell center".

      We have now added a discussion of this point, which we believe could reflect the 3D shape of a β cell in tissue and/or compensatory mechanisms in the organism.

      (3) Authors should include in the Discussion a paragraph discussing the fact that small changes in MT configuration can have strong effects.

      A paragraph has been added to the Discussion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1: Even though the reviewer appreciates that minor changes of MT configuration have severe effects, still the overall effects appear minor (40 vs. <50% or 35% vs. around 28%). Notably, there are no statistically significant differences in the different groups in Fig. 1Suppl-Fig.1 D. This reviewer is not sure if the combination of many not significantly different data points can result in significant changes and this should be checked by a statistician. Authors should include in the Discussion a paragraph discussing the fact that small changes in MT configuration can have strong effects.

      We have now added the requested paragraph to the discussion. Indeed, the differences are small, and the significance is only detected in a data set with a large sample size in Fig. 1J,K (combined data sets with smaller sizes from Fig. 1-Suppl-Fig.1 D), consistent with the fact that a larger sample size generally provides more power to detect an effect.

      (2) Unfortunately, the authors cannot block kinesin-1 resulting in microtubule accumulation in the cell center and then release the block (best inhibiting microtubule formation), to show that the MTs accumulated in the cell center will be transported to the periphery.

      This is indeed the case at the moment, yes.

      Minor comments:

      - Abstract: β-cells vs. β cells (and throughout the manuscript)

      - Page 4: "MTOC, the Golgi, (Trogden et al. 2019), and"

      - Page 5: "β-cell specific"

      - MT-sliding vs. MT sliding

      - Kinesin 1 vs. kinesin-1

      - Page 6, line 1: "β cells. actively"

      - Page 7: "a microtubule probe", should be "MT"

      - Page 9: "1μm" vs. "1 μm"

      - Page 10: "demonstrate a dramatic effect" recommended is: "demonstrate a marked effect"

      - Page 13, line 1: dramatically vs. markedly

      - Page 13, line 5: "50μm" vs. "50 μm" (in general, there should be a space between number and unit?)

      - "37 degrees C" vs. "37°C"

      - Animal protocol number?

      - "Mice were euthanized by isoflurane inhalation"? What concentration? How long? More details are needed (no cervical dislocation?).

      - Antibodies: more identifiers are needed.

      - Antibody information in Key reagents and under 5. Reagents and antibodies do not fit (1:500 and 1:1000).

      Thank you; we have now corrected all relevant information.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Hussain and collaborators aims at deciphering the microtubule-dependent ribbon formation in zebrafish hair cells. By using confocal imaging, pharmacology tools, and zebrafish mutants, the group of Katie Kindt convincingly demonstrated that ribbon, the organelle that concentrates glutamate-filled vesicles at the hair cell synapse, originates from the fusion of precursors that move along the microtubule network. This study goes hand in hand with a complementary paper (Voorn et al.) showing similar results in mouse hair cells.

      Strengths:

      This study clearly tracked the dynamics of the microtubules, and those of the microtubule-associated ribbons and demonstrated fusion ribbon events. In addition, the authors have identified the critical role of kinesin Kif1aa in the fusion events. The results are compelling and the images and movies are magnificent.

      Weaknesses:

      The lack of functional data regarding the role of Kif1aa. Although it is difficult to probe and interpret the behavior of zebrafish after nocodazole treatment, I wonder whether deletion of kif1aa in hair cells may result in a functional deficit that could be easily tested in zebrafish?

      We have examined functional deficits in kif1aa mutants in another paper that was recently accepted: David et al. 2024. https://pubmed.ncbi.nlm.nih.gov/39373584/

      In David et al., we found that in addition to a subtle role in ribbon fusion during development, Kif1aa plays a major role in enriching glutamate-filled synaptic vesicles at the presynaptic active zone of mature hair cells. In kif1aa mutants, synaptic vesicles are no longer enriched at the hair cell base, and there is a reduction in the number of synaptic vesicles associated with presynaptic ribbons. Further, we demonstrated that kif1aa mutants also have functional defects including reductions in spontaneous vesicle release (from hair cells) and evoked postsynaptic calcium responses. Behaviorally, kif1aa mutants exhibit impaired rheotaxis, indicating defects in the lateral-line system and an inability to accurately detect water flow. Because our current paper focuses on microtubule-associated ribbon movement and dynamics early in hair-cell development, we have only discussed the effects of Kif1aa directly related to ribbon dynamics during this time window. In our revision, we have referenced this recent work. Currently it is challenging to disentangle how the subtle defects in ribbon formation in kif1aa mutants contribute to the defects we observe in ribbon-synapse function.

      Added to results:

      “Recent work in our lab using this mutant has shown that Kif1aa is responsible for enriching glutamate-filled vesicles at the base of hair cells. In addition this work demonstrated that loss of Kif1aa results in functional defects in mature hair cells including a reduction in evoked post-synaptic calcium responses (David et al., 2024). We hypothesized that Kif1aa may also be playing an earlier role in ribbon formation.”

      Impact:

      Synaptogenesis in the auditory sensory cell remains elusive. Here, this study indicates that the formation of the synaptic organelle is a dynamic process involving the fusion of presynaptic elements. This study will undoubtedly boost a new line of research aimed at identifying the specific molecular determinants that target ribbon precursors to the synapse and govern the fusion process.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors set out to resolve a long-standing mystery in the field of sensory biology - how large, presynaptic bodies called "ribbon synapses" migrate to the basolateral end of hair cells. The ribbon synapse is found in sensory hair cells and photoreceptors, and is a critical structural feature of a readily-releasable pool of glutamate that excites postsynaptic afferent neurons. For decades, we have known these structures exist, but the mechanisms that control how ribbon synapses coalesce at the bottom of hair cells are not well understood. The authors addressed this question by leveraging the highly-tractable zebrafish lateral line neuromast, which exhibits a small number of visible hair cells, easily observed in time-lapse imaging. The approach combined genetics, pharmacological manipulations, high-resolution imaging, and careful quantifications. The manuscript commences with a developmental time course of ribbon synapse development, characterizing both immature and mature ribbon bodies (defined by position in the hair cell, apical vs. basal). Next, the authors show convincing (and frankly mesmerizing) imaging data of plus end-directed microtubule trafficking toward the basal end of the hair cells, and data highlighting the directed motion of ribbon bodies. The authors then use a series of pharmacological and genetic manipulations showing the role of microtubule stability and one particular kinesin (Kif1aa) in the transport and fusion of ribbon bodies, which is presumably a prerequisite for hair cell synaptic transmission. The data suggest that microtubules and their stability are necessary for normal numbers of mature ribbons and that Kif1aa is likely required for fusion events associated with ribbon maturation. Overall, the data provide a new and interesting story on ribbon synapse dynamics.

      Strengths:

      (1) The manuscript offers a comprehensive Introduction and Discussion sections that will inform generalists and specialists.

      (2) The use of Airyscan imaging in living samples to view and measure microtubule and ribbon dynamics in vivo represents a strength. With rigorous quantification and thoughtful analyses, the authors generate datasets often only obtained in cultured cells or more diminutive animal models (e.g., C. elegans).

      (3) The number of biological replicates and the statistical analyses are strong. The combination of pharmacology and genetic manipulations also represents strong rigor.

      (4) One of the most important strengths is that the manuscript and data spur on other questions - namely, do (or how do) ribbon bodies attach to Kinesin proteins? Also, and as noted in the Discussion, do hair cell activity and subsequent intracellular calcium rises facilitate ribbon transport/fusion?

      These are important strengths and, as stated, we are currently investigating which other kinesins and adaptors transport ribbons. We also have ongoing work examining how hair-cell activity impacts ribbon fusion and transport!

      Weaknesses:

      (1) Neither the data nor the Discussion address a direct or indirect link between Kinesins and ribbon bodies. Showing Kif1aa protein in proximity to the ribbon bodies would add strength.

      This is a great point. Previous immunohistochemistry work in mice demonstrated that ribbons and Kif1a colocalize in mouse hair cells (Michanski et al, 2019). Unfortunately, the antibody used in that study did not work in zebrafish. To further investigate this interaction, we also attempted to create a transgenic line expressing a fluorescently tagged Kif1aa to directly visualize its association with ribbons in vivo. To date, we have been unable to detect transient expression of Kif1aa-GFP or establish a transgenic line using this approach. While we will continue to work towards understanding whether Kif1aa and ribbons colocalize in live hair cells, currently this goal is beyond the scope of this paper. In our revision we discuss this caveat.

      Added to discussion:

      “In addition, it will be useful to visualize these kinesins by fluorescently tagging them in live hair cells to observe whether they associate with ribbons.”

      (2) Neither the data nor the Discussion address the functional consequences of loss of Kif1aa or ribbon transport. Presumably, both manipulations would reduce afferent excitation.

      Excellent point. Please see our response above to the weakness raised in Reviewer #1’s public review.

      (3) It is unknown whether the drug treatments or genetic manipulations are specific to hair cells, so we can't know for certain whether any phenotypic defects are secondary.

      This is correct and a caveat of our Kif1aa and drug experiments. In our recently published work, we confirmed that Kif1aa is expressed in hair cells and neurons, while kif1ab is present just in neurons. Therefore, it is likely that the ribbon formation defects in kif1aa mutants are restricted to hair cells. We added this expression information to our results:

      “ScRNA-seq in zebrafish has demonstrated widespread co-expression of kif1ab and kif1aa mRNA in the nervous system. Additionally, both scRNA-seq and fluorescent in situ hybridization have revealed that pLL hair cells exclusively express kif1aa mRNA (David et al., 2024; Lush et al., 2019; Sur et al., 2023).”

      Non-hair cell effects are a real concern in our pharmacology experiments. To mitigate this in our pharmacological experiments, we have performed drug treatments at 3 different timescales: long-term (overnight), short-term (4 hr) and fast (30 min) treatments. The fast experiments were done after 30 min nocodazole drug treatment, and after this treatment we observed reduced directional motion and fusions. This fast drug treatment should not incur any long-term changes or developmental defects as hair-cell development occurs over 12-16 hrs. However, we acknowledge that drug treatments could have secondary phenotypic effects or effects that are not hair-cell specific. In our revision, we discuss these issues.

      Added to discussion:

      “Another important consideration is the potential off-target effects of nocodazole. Even at non-cytotoxic doses, nocodazole toxicity may impact ribbons and synapses independently of its effects on microtubules. While this is less of a concern in the short- and medium-term experiments (30-70 min and 4 hr), long-term treatments (16 hrs) could introduce confounding effects. Additionally, nocodazole treatment is not hair cell-specific and could disrupt microtubule organization within afferent terminals as well. Thus, the reduction in ribbon-synapse formation following prolonged nocodazole treatment may result from microtubule disruption in hair cells, afferent terminals, or a combination of the two.”

      Reviewer #3 (Public Review):

      Summary:

      The manuscript uses live imaging to study the role of microtubules in the movement of ribeye aggregates in neuromast hair cells in zebrafish. The main findings are that

      (1) Ribeye aggregates, assumed to be ribbon precursors, move in a directed motion toward the active zone;

      (2) Disruption of microtubules and kif1aa increases the number of ribeye aggregates and decreases the number of mature synapses.

      The evidence for point 2 is compelling, while the evidence for point 1 is less convincing. In particular, the directed motion conclusion is dependent upon fitting of mean squared displacement that can be prone to error and variance due to stochasticity, which is not accounted for in the analysis. Only a small subset of the aggregates meet this criterion and one wonders whether the focus on this subset misses the bigger picture of what is happening with the majority of spots.

      Strengths:

      (1) The effects of Kif1aa removal and nocodazole on ribbon precursor number and size are convincing and novel.

      (2) The live imaging of Ribeye aggregate dynamics provides interesting insight into ribbon formation. The movies showing the fusion of ribeye spots are convincing and the demonstrated effects of nocodazole and kif1aa removal on the frequency of these events are novel.

      (3) The effect of nocodazole and kif1aa removal on precursor fusion is novel and interesting.

      (4) The quality of the data is extremely high and the results are interesting.

      Weaknesses:

      (1) To image ribeye aggregates, the investigators overexpressed Ribeye-a TAGRFP under the control of a MyoVI promoter. While it is understandable why they chose to do the experiments this way, expression is not under the same transcriptional regulation as the native protein, and some caution is warranted in drawing some conclusions. For example, the reduction in the number of puncta with maturity may partially reflect the regulation of the MyoVI promoter with hair cell maturity. Similarly, it is unknown whether overexpression has the potential to saturate binding sites (for example motors), which could influence mobility.

      We agree that overexpression of transgenes using a non-endogenous promoter in transgenic lines is an important consideration. Ideally, we would do these experiments with endogenously expressed fluorescent proteins under a native promoter. However, this was not technically possible for us. The decrease in precursors is likely not due to regulation by the myo6a promoter. Although the myo6a promoter comes on early in hair cell development, the promoter only gets stronger as the hair cells mature. This would lead to a continued increase rather than a decrease in puncta numbers with development.

      Protein tags such as tagRFP always have the caveat of impacting protein function. This is partly why we complemented our live imaging with analyses in fixed tissue without transgenes (kif1aa mutants and nocodazole/taxol treatments).

      In our revision, we did perform an immunolabel on myo6b:riba-tagRFP transgenic fish and found that Riba-tagRFP expression did not impact ribbon synapse numbers or ribbon size. This analysis argues that the transgene is expressed at a level that does not impact ribbon synapses. This data is summarized in Figure 1-S1.

      Added to the results:

      “Although this latter transgene expresses Riba-TagRFP under a non-endogenous promoter, neither the tag nor the promoter ultimately impacts cell numbers, synapse counts, or ribbon size (Figure 1-S1A-E).”

      Added to methods:

      “Tg(myo6b:ctbp2a-TagRFP)<sup>idc11Tg</sup> reliably labels mature ribbons, similar to a pan-CTBP immunolabel at 5 dpf (Figure 1-S1B). This transgenic line does not alter the number of hair cells or complete synapses per hair cell (Figure 1-S1A-D). In addition, myo6b:ctbp2a-TagRFP does not alter the size of ribbons (Figure 1-S1E).”

      (2) The examples of punctae colocalizing with microtubules look clear (Figures 1 F-G), but the presentation is anecdotal. It would be better and more informative, if quantified.

      We did attempt a co-localization analysis between microtubules and ribbons but did not move forward with it due to several issues:

      (1) Hair cells have an extremely crowded environment, especially since the nucleus occupies the majority of the cell. All proteins are pushed together in the small space surrounding the nucleus and ultimately, we found that co-localization analyses were not meaningful because the distances were too small.

      (2) We also attempted to segment microtubules in these images and quantify how many ribbons were associated with microtubules, but 3D microtubule segmentation was not accurate in hair cells due to highly varying filament intensities, filament dynamics and the presence of diffuse cytoplasmic tubulin signal.

      Because of these challenges we concluded the best evidence of ribbon-microtubule association is through visualization of ribbons and their association with microtubules over time (in our timelapses). We see that ribbons localize to microtubules in all our timelapses, including the examples shown (Movies S2-S10). The only instance of ribbon dissociation is when ribbons switch from one filament to another. We did not observe free-floating ribbons in our study.

      (3) It appears that any directed transport may be rare. Simply having an alpha >1 is not sufficient to declare movement to be directed (motor-driven transport typically has an alpha approaching 2). The randomness of a random walk and errors in fits to imperfect data will yield some spread in alpha for movement driven by Brownian motion. Many of the tracks in Figure 3H look as though they might be reasonably fit by a straight line (i.e. alpha = 1).

      (4) The "directed motion" shown here does not really resemble motor-driven transport observed in other systems (axonal transport, for example) even in the subset that has been picked out as examples here. While the role of microtubules and kif1aa in synapse maturation is strong, it seems likely that this role may be something non-canonical (which would be interesting).

      Yes, it is true that directed transport of ribbon precursors is relatively rare. Only a small subset of the ribbon precursors moves directionally (α > 1, 20 %) or has a displacement distance > 1 µm (36 %) during the time windows we are imaging. The majority of the ribbons are stationary. To emphasize this result, we have added bar graphs to Figure 3I,K and now state the underlying numbers more clearly.

      “Upon quantification, 20.2 % of ribbon tracks show α > 1, indicative of directional motion, but the majority of ribbon tracks (79.8 %) show α < 1, indicating confinement on microtubules (Figure 3I, n = 10 neuromasts, 40 hair cells, and 203 tracks).

      To provide a more comprehensive analysis of precursor movement, we also examined displacement distance (Figure 3J). Here, as an additional measure of directed motion, we calculated the percent of tracks with a cumulative displacement > 1 µm. We found 35.6 % of tracks had a displacement > 1 µm (Figure 3K; n = 10 neuromasts, 40 hair cells, and 203 tracks).”
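      The α criterion quoted above, and the reviewer's concern about spread in fitted α values, can be illustrated with a minimal sketch. It is not the analysis pipeline used in the study. It assumes α is taken as the least-squares slope of log(MSD) against log(lag time) over the first 10 lags, that displacement means the straight-line distance between the first and last track positions, and that tracks are 40 frames long at a hypothetical 75 s per frame; the step sizes and all function names are likewise made up for illustration.

```python
import math
import random


def msd(track, lag):
    """Time-averaged mean squared displacement of a 2D track at one frame lag."""
    n = len(track) - lag
    return sum(
        (track[i + lag][0] - track[i][0]) ** 2 + (track[i + lag][1] - track[i][1]) ** 2
        for i in range(n)
    ) / n


def fit_alpha(track, dt, max_lag=10):
    """Least-squares slope of log(MSD) versus log(lag time): alpha in MSD ~ t**alpha."""
    xs, ys = [], []
    for lag in range(1, max_lag + 1):
        m = msd(track, lag)
        if m > 0:  # log is undefined for a zero MSD, so skip those lags
            xs.append(math.log(lag * dt))
            ys.append(math.log(m))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def net_displacement(track):
    """Straight-line distance from first to last position (assumed definition)."""
    return math.dist(track[0], track[-1])


def brownian_track(n_frames, step_sd, rng):
    """Unbiased 2D random walk with Gaussian steps: no motor, no confinement."""
    x = y = 0.0
    track = [(x, y)]
    for _ in range(n_frames - 1):
        x += rng.gauss(0.0, step_sd)
        y += rng.gauss(0.0, step_sd)
        track.append((x, y))
    return track


DT = 75.0  # seconds per frame; hypothetical value within the 50-100 s range

# Constant-velocity track (0.05 um per frame): the idealized motor-driven case.
directed = [(0.05 * i, 0.0) for i in range(40)]
alpha_directed = fit_alpha(directed, DT)

# Many purely diffusive tracks: the true alpha is 1, but each fit is noisy.
rng = random.Random(0)
brownian_alphas = [fit_alpha(brownian_track(40, 0.05, rng), DT) for _ in range(500)]
mean_brownian_alpha = sum(brownian_alphas) / len(brownian_alphas)
fraction_above_one = sum(a > 1 for a in brownian_alphas) / len(brownian_alphas)

print(f"directed: alpha = {alpha_directed:.2f}, "
      f"net displacement = {net_displacement(directed):.2f} um")
print(f"Brownian: mean alpha = {mean_brownian_alpha:.2f}, "
      f"fraction with alpha > 1 = {fraction_above_one:.2f}")
```

      By construction, the constant-velocity track returns α = 2. The simulated Brownian tracks scatter around α = 1, so some of them cross the α > 1 threshold by chance; how many depends on the track length and the range of lags used in the fit.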

      We cannot say for certain what is happening with the stationary ribbons, but our hypothesis is that these ribbons eventually exhibit directed motion sufficient to reach the active zone. This idea is supported by the fact that we see ribbons that are stationary begin movement, and ribbons that are moving come to a stop during the acquisition of our timelapses (Movies S4 and S5). It is possible that ribbons that are stationary may not have enough motors attached, or there may be a ‘seeding’ phase where Ribeye aggregates are condensing on the ribbon.

      We also reexamined our MSD α values, as the α values we observed in hair cells were lower than those seen in canonical motor-driven transport (where α approaches 2). One reason for this difference may be the dynamic microtubule network in developing hair cells, which could affect directional ribbon movement. In our revision we plotted the distribution of α values, which confirmed that in control hair cells, the majority of the α values we see are typically less than 2 (Figure 7-S1A). Interestingly, we also compared the distribution of α values between control and taxol-treated hair cells, where the microtubule network is more stable, and found that the distribution shifted towards higher α values (Figure 7-S1A). We also plotted only ‘directional’ tracks (with α > 1) and observed significantly higher α values in taxol-treated hair cells (Figure 7-S1B). This is an interesting result which indicates that although the proportion of directional tracks (with α > 1) is not significantly different between control and taxol-treated hair cells (which could be limited by the number of motor/adapter proteins), the ribbons that move directionally do so with greater velocities when the microtubules are more stable. This supports our idea that the stability of the microtubule network could be why ribbon movement does not resemble canonical motor transport. This analysis is presented as a new figure (Figure 7-S1A-B) and is referred to in the text in the results and the discussion.

      Results:

      “Interestingly, when we examined the distribution of α values, we observed that taxol treatment shifted the overall distribution towards higher α values (Figure 7-S1A). In addition, when we plotted only tracks with directional motion (α > 1), we found significantly higher α values in hair cells treated with taxol compared to controls (Figure 7-S1B). This indicates that in taxol-treated hair cells, where the microtubule network is stabilized, ribbons with directional motion have higher velocities.”

      Discussion:

      “Our findings indicate that ribbons and precursors show directed motion indicative of motor-mediated transport (Figure 3 and 7). While a subset of ribbons moves directionally with α values > 1, canonical motor-driven transport in other systems, such as axonal transport, can achieve even higher α values approaching 2 (Bellotti et al., 2021; Corradi et al., 2020). We suggest that relatively lower α values arise from the highly dynamic nature of microtubules in hair cells. In axons, microtubules form stable, linear tracks that allow kinesins to transport cargo with high velocity. In contrast, the microtubule network in hair cells is highly dynamic, particularly near the cell base. Within a single time frame (50-100 s), we observe continuous movement and branching of these networks. This dynamic behavior adds complexity to ribbon motion, leading to frequent stalling, filament switching, and reversals in direction. As a result, ribbon transport appears less directional than the movement of traditional motor cargoes along stable axonal filaments, resulting in lower α values compared to canonical motor-mediated transport. Notably, treatment with taxol, which stabilizes microtubules, increased α values to levels closer to those observed in canonical motor-driven transport (Figure 7-S1). This finding supports the idea that the relatively lower α values in hair cells are a consequence of a more dynamic microtubule network. Overall, this dynamic network gives rise to a slower, non-canonical mode of transport.”

      (5) The effect of acute treatment with nocodazole on microtubules in movie 7 and Figure 6 is not obvious to me and it is clear that whatever effect it has on microtubules is incomplete.

      When using nocodazole, we worked to optimize the concentration of the drug to minimize cytotoxicity, while still being effective. While the more stable filaments at the cell apex remain largely intact after nocodazole treatment, there are almost no filaments at the hair cell base, which is different from the wild-type hair cells. In addition, nocodazole-treated hair cells have more cytoplasmic YFP-tubulin signal compared to wild type. We have clarified this in our results. To better illustrate the effect of nocodazole and taxol we have also added additional side-view images of hair cells expressing YFP-tubulin (Figure 4-S1F-G), that highlight cytoplasmic YFP-tubulin and long, stabilized microtubules after 3-4 hr treatment with nocodazole and taxol respectively. In these images we also point out microtubules at the apical region of hair cells that are very stable and do not completely destabilize with nocodazole treatment at concentrations that are tolerable to hair cells.

      “We verified the effectiveness of our in vivo pharmacological treatments using either 500 nM nocodazole or 25 µM taxol by imaging microtubule dynamics in pLL hair cells (myo6b:YFP-tubulin). After a 30-min pharmacological treatment, we used Airyscan confocal microscopy to acquire timelapses of YFP-tubulin (3 µm z-stacks, every 50-100 s for 30-70 min, Movie S8). Compared to controls, 500 nM nocodazole destabilized microtubules (presence of depolymerized YFP-tubulin in the cytosol, see arrows in Figure 4-S1F-G) and 25 µM taxol dramatically stabilized microtubules (indicated by long, rigid microtubules, see arrowheads in Figure 4-S1F,H) in pLL hair cells. We did still observe a subset of apical microtubules after nocodazole treatment, indicating that this population is particularly stable (see asterisks in Figure 4-S1F-H).”

      To further address concerns about verifying the efficacy of nocodazole and taxol treatment on microtubules, we added a quantification of our immunostaining data comparing the mean acetylated-α-tubulin intensities between control, nocodazole and taxol-treated hair cells. Our results show that nocodazole treatment reduces the mean acetylated-α-tubulin intensity in hair cells. This is included as a new figure (Figure 4-S1D-E) and this result is referred to in the text. To better illustrate the effect of nocodazole and taxol we have also added additional side-view images of hair cells after overnight treatment with nocodazole and taxol (Figure 4-S1A-C).

      “After a 16-hr treatment with 250 nM nocodazole we observed a decrease in acetylated-α-tubulin label (qualitative examples: Figure 4A,C, Figure 4-S1A-B). Quantification revealed significantly less mean acetylated-α-tubulin label in hair cells after nocodazole treatment (Figure 4-S1D). Less acetylated-α-tubulin label indicates that our nocodazole treatment successfully destabilized microtubules.”

      “Qualitatively more acetylated-α-tubulin label was observed after treatment, indicating that our taxol treatment successfully stabilized microtubules (qualitative examples: Figure 4-S1A,C). Quantification revealed an overall increase in mean acetylated-α-tubulin label in hair cells after taxol treatment, but this increase did not reach significance (Figure 4-S1E).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The manuscript is fairly dense. For instance, some information is repeated (page 3: ribbon synapses form along a condensed timeline in zebrafish hair cells: 12-18 hrs; and page 5: these hair cells form 3-4 ribbon synapses in just 12-18 hrs). Perhaps the authors could condense some of the ideas? The introduction could be shortened.

      We have eliminated this repeated text in our revision. We have shortened the introduction from 1275 to 1038 words (with references).

      (2) The mechanosensory structure on page 5 is not defined for readers outside the field.

      Great point. We have added additional information to define this structure in the results:

      “We staged hair cells based on the development of the apical, mechanosensory hair bundle. The hair bundle is composed of actin-based stereocilia and a tubulin-based kinocilium. We used the height of the kinocilium (see schematic in Figure 1B), the tallest part of the hair bundle, to estimate the developmental stage of hair cells as described previously…”

      (3) Figure 1E is quite interesting but I'd rather show Figure S1 B/C as they provide statistics. In addition, the authors define 4 stages : early, intermediate, late, and mature for counting but provide only 3 panels for representative examples by mixing late/mature.

      We were torn about which ribbon quantification graph to show. Ultimately, we decided to keep the summary data in Figure 1E. This is primarily because the supplementary Figure will be adjacent to the main Figure in the eLife format, and the statistics will be easy to find and view.

      Figure 1 now provides a representative image for both late and mature hair cells.

      (4) The ribbon that jumps from one microtubule to another one is eye-catching. Can the authors provide any statistics on this (e.g. percentage)?

      Good point. In our revision, we have added quantification for these events. We observe 2.8 switching events per neuromast during our fast timelapses. This information is now in the text and is also shown in a graph in Figure 3-S1D.

      “Third, we often observed that precursors switched association between neighboring microtubules (2.8 switching events per neuromast, n= 10 neuromasts; Figure 3-S1C-D, Movie S7).”

      (5) With regard to acetyl-α-tub immunocytochemistry, I would suggest obtaining a profile of the fluorescence intensity on a horizontal plane (at the apical part and at the base).

      (6) Same issue with microtubule destruction by nocodazole. Can the authors provide fluorescence intensity measurements to convince readers of microtubule disruption for long and short-term application?

      Regarding quantification of microtubule disruption using nocodazole and taxol: we did attempt to create profiles of the acetylated tubulin or YFP-tubulin label along horizontal planes at the apex and base, but the amount of variability among cells and the angle of the cell in the images made this type of display and quantification challenging. In our revision, as stated above in our response to Reviewer #3’s public comment, we have added representative side-view images to show the disruptions to microtubules more clearly after short and long-term drug experiments (Figure 4-S1A-C, F-H). In addition, we quantified the reduction in acetylated tubulin label after overnight treatment with nocodazole and found the signal was significantly reduced (Figure 4-S1D-E). Unfortunately, we were unable to do a similar quantification for YFP-tubulin because its intensity varied with mounting. The following text has been added to the results:

      “Quantification revealed significantly less mean acetylated-α-tubulin label in hair cells after nocodazole treatment (Figure 4-S1D).”

      “Quantification revealed an overall increase in mean acetylated-α-tubulin label in hair cells after taxol treatment, but this increase did not reach significance (Figure 4-S1A,C,E).”

      (7) It is a bit difficult to understand that the long-term (overnight) microtubule destabilization leads to a reduction in the number of synapses (Figure 4F) whereas short-term (30 min) microtubule destabilization leads to the opposite phenotype with an increased number of ribbons (Figure 6G). Are these ribbons still synaptic in short-term experiments? What is the size of the ribbons in the short-term experiments? Alternatively, could the reduction in synapse number upon long-term application of nocodazole be a side-effect of the toxicity within the hair cell?

      Agreed, this is a bit confusing. In our revision, we have changed our analyses so that the comparisons are more similar between the short- and long-term experiments: we examined the number of ribbons and precursors per cell (apical and basal) in both experiments (changed panels: Figure 4G, Figure 4-S2G and Figure 5G). In our live experiments we cannot be sure that ribbons are synaptic as we do not have a postsynaptic co-label. Also, we are unable to reliably quantify ribbon and precursor size in our live images due to variability in mounting. We have changed the text to clarify as follows:

      Results:

      “In each developing cell, we quantified the total number of Riba-TagRFP puncta (apical and basal) before and after each treatment. In our control samples we observed on average no change in the number of Riba-TagRFP puncta per cell (Figure 6G). Interestingly, we observed that nocodazole treatment led to a significant increase in the total number of Riba-TagRFP puncta after 3-4 hrs (Figure 6G). This result is similar to our overnight nocodazole experiments in fixed samples, where we also observed an increase in the number of ribbons and precursors per hair cell. In contrast to our 3-4 hr nocodazole treatment, similar to controls, taxol treatment did not alter the total number of Riba-TagRFP puncta over 3-4 hrs (Figure 6G). Overall, our overnight and 3-4 hr pharmacology experiments demonstrate that microtubule destabilization has a more significant impact on ribbon numbers compared to microtubule stabilization.”

      Discussion:

      “Ribbons and microtubules may interact during development to promote fusion, to form larger ribbons. Disrupting microtubules could interfere with this process, preventing ribbon maturation. Consistent with this, short-term (3-4 hr) and long-term (overnight) nocodazole increased ribbon and precursor numbers (Figure 6AG; Figure 4G), suggesting reduced fusion. Long-term treatment (overnight) resulted in a shift toward smaller ribbons (Figure 4H-I), and ultimately fewer complete synapses (Figure 4F).”

      Nocodazole toxicity: in response to Reviewer #2’s public comment we have added the following text in our discussion:

      Discussion:

      “Another important consideration is the potential off-target effects of nocodazole. Even at non-cytotoxic doses, nocodazole toxicity may impact ribbons and synapses independently of its effects on microtubules. While this is less of a concern in the short- and medium-term experiments (30 min to 4 hr), long-term treatments (16 hrs) could introduce confounding effects. Additionally, nocodazole treatment is not hair cell-specific and could disrupt microtubule organization within afferent terminals as well. Thus, the reduction in ribbon-synapse formation following prolonged nocodazole treatment may result from microtubule disruption in hair cells, afferent terminals, or a combination of the two.”

      (8) Does ribbon motion depend on size or location?

      It is challenging to reliably quantify the actual area of precursors in our live samples, as there is variability in mounting and precursors are quite small. But we did examine the location within the cell of ribbon precursors that showed motion (using tracks > 1 µm, as these tracks can easily be linked to cell location in Imaris). We found evidence of ribbons with tracks > 1 µm throughout the cell, both above and below the nucleus. This is now plotted in Figure 3M. We have also added the following text to the results:

      “In addition, we examined the location of precursors within the cell that exhibited displacements > 1 µm. We found that 38.9 % of these tracks were located above the nucleus, while 61.1 % were located below the nucleus (Figure 3M).”

      Although this is not an area or size measurement, this result suggests that both smaller precursors that are more apical, and larger precursors/ribbons that are more basal all show motion.

      (9) The fusion event needs to be analyzed in further detail: when one ribbon precursor fuses with another one, is there an increase in size or intensity (this should follow the law of mass conservation)? This is important to support the abstract sentence "ribbon precursors can fuse together on microtubules to form larger ribbons".

      As mentioned above, it is challenging to accurately estimate the absolute size or intensity of ribbon precursors in our live preparation. But we did examine whether there is a relative increase in area after ribbons fuse. We have plotted the change in area (within the same samples) for the two fusion events shown in Figure 8-S1A-B. In these examples, the area of the puncta after fusion is larger than either of the two precursors that fuse. Although the areas are not additive, these plots do provide some evidence that fusion acts to form larger ribbons. To accompany these plots, we have added the following text to the results:

      “Although we could not accurately measure the areas of precursors before and after fusion, we observed that the relative area resulting from the fusion of two smaller precursors was greater than that of either precursor alone. This increase in area suggests that precursor fusion may serve as a mechanism for generating larger ribbons (see examples: Figure 8-S1A-B).”

      Because we were unable to provide more accurate evidence of precursor fusion resulting in larger ribbons, we have removed this statement from our abstract and lessened our claims elsewhere in the manuscript.

      (10) The title in Figure 8 is a bit confusing. If fusion events reflect ribbon precursors fusion, it is obvious it depends on ribbon precursors. I'd like to replace this title with something like "microtubules and kif1aa are required for fusion events"

      We have changed the figure title as suggested, good idea.

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 1C. The purple/magenta colors are hard to distinguish.

      We have made the magenta color much lighter in Figure 1C to make it easier to distinguish purple and magenta.

      (2) There are places where some words are unnecessarily hyphenated. Examples: live-imaging and hair-cell in the abstract, time-course in the results.

      In our revision, we have done our best to remove unnecessary hyphens, including the ones pointed out here.

      (3) Figure 4H and elsewhere - what is "area of Ribeye puncta?" Related, I think, in the Discussion the authors refer to "ribbon volume" on line 484. But they never measured ribbon volume so this needs to be clarified.

      We have done our best to clarify what is meant by area of Ribeye puncta in the results and the methods:

      Results:

      “We also observed that the average area of individual Ribeyeb puncta (from 2D max-projected images) was significantly reduced compared to controls (Figure 4H). Further, the relative frequency of individual Ribeyeb puncta with smaller areas was higher in nocodazole treated hair cells compared to controls (Figure 4I).”

      Methods:

      “To quantify the area of each ribbon and precursor, images were processed in a FIJI macro ‘IJMacro_AIRYSCAN_simple3dSeg_ribbons only.ijm’ as previously described (Wong et al., 2019). Here each Airyscan z-stack was max-projected. A threshold was applied to each image, followed by segmentation to delineate individual Ribeyeb/CTBP puncta. The watershed function was used to separate adjacent puncta. A list of 2D objects of individual ROIs (minimum size filter of 0.002 μm²) was created to measure the 2D areas of each Ribeyeb/CTBP puncta.”

      We did refer to ribbon volume once in the discussion, but volume is not reflected in our analyses, so we have removed this mention of volume.

      (4) More validation data showing gene/protein removal for the crispants would be helpful.

      Great suggestion. As this is a relatively new method, we have created a figure that outlines how we genotype each individual crispant animal analyzed in our study (Figure 6-S1). In the methods we have also added the following information:

      “fPCR fragments were run on a genetic analyzer (Applied Biosystems, 3500XL) using LIZ500 (Applied Biosystems, 4322682) as a dye standard. Analysis of this fPCR revealed an average peak height of 4740 a.u. in wild type, and an average peak height of 126 a.u. in kif1aa F0 crispants (Figure 6-S1). Any kif1aa F0 crispant without robust genomic cutting or a peak height > 500 a.u. was not included in our analyses.”

      Reviewer #3 (Recommendations For The Authors):

      Lines 208-209--should refer to the movie in the text.

      Movie S1 is now referenced here.

      It would be helpful if the authors could analyze and quantify the effect of nocodazole and taxol on microtubules (movie 7).

      See responses above to Reviewer #1’s similar request.

      Figure 7 caption says "500 mM" nocodazole.

      Thank you, we have changed the caption to 500 nM.

      One problem with the MSD analysis is that it is dependent upon fits of individual tracks that lead to inaccuracies in assigning diffusive, restricted, and directed motion. The authors might be able to get around these problems by looking at the ensemble averages of all the tracks and seeing how they change with the various treatments. Even if the effect is on a subset of ribeye spots, it would be reassuring to see significant effects that did not rely upon fitting.

      We are hesitant to average the MSD tracks, as not all tracks have the same number of time steps (ribbons move in and out of the z-stack during the timelapse). This makes it challenging for us to compute ensemble averages of all tracks accurately, especially for the duration of the timelapse. This is the main reason why we added another analysis, displacements > 1 µm, as another readout of directional motion, a measure that does not rely upon fitting.

      The abstract states that directed movement is toward the synapse. The only real evidence for this is a statement in the results: "Of the tracks that showed directional motion, while the majority move to the cell base, we found that 21.2 % of ribbon tracks moved apically." A clearer demonstration of this would be to do the analysis of Figure 2G for the ribeye aggregates.

      It was not possible to do the same analysis for ribbon tracks that we did for the EB3-GFP analysis in Figure 2. In Figure 2 we did a 2D tracking analysis and measured the relative angles in 2D. In contrast, the ribbon tracking was done in 3D in Imaris, where it was not possible to obtain angles in the same way. Further, the MSD analysis was done outside of Imaris, making it extremely difficult to link ribbon trajectories to the 3D cellular landscape in Imaris. Instead, we examined the direction of the 3D vectors in Imaris for tracks > 1 µm and determined the direction of the motion (apical, basal or undetermined). For clarity, these data are now included as a bar graph in Figure 3L. In our results, we have clarified the results of this analysis:

      “To provide a more comprehensive analysis of precursor movement, we also examined displacement distance (Figure 3J). Here, as an additional measure of directed motion, we calculated the percent of tracks with a cumulative displacement > 1 µm. We found 35.6 % of tracks had a displacement > 1 µm (Figure 3K; n = 10 neuromasts, 40 hair cells and 203 tracks). Of the tracks with displacement > 1 µm, the majority of ribbon tracks (45.8 %) moved to the cell base, but we also found a subset of ribbon tracks (20.8 %) that moved apically (33.4 % moved in an undetermined direction) (Figure 3L).”
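
      The displacement readout described above can be illustrated with a minimal sketch. It assumes that displacement is the straight-line distance between the first and last 3D position of a track; the exact definition of "cumulative displacement" used in Imaris may differ. The function names and the example tracks are hypothetical and are not taken from the study.

```python
import math

def net_displacement(track):
    """Straight-line distance (µm) between the first and last 3D positions.

    Assumption: this is only one possible definition of track displacement.
    """
    return math.dist(track[0], track[-1])

def percent_over_threshold(tracks, threshold_um=1.0):
    """Percent of tracks whose net displacement exceeds threshold_um."""
    if not tracks:
        return 0.0
    n_over = sum(1 for t in tracks if net_displacement(t) > threshold_um)
    return 100.0 * n_over / len(tracks)

# Hypothetical tracks: lists of (x, y, z) positions in µm.
# Tracks may differ in length; no curve fitting is involved.
tracks = [
    [(0.0, 0.0, 0.0), (0.2, 0.1, 0.0), (0.3, 0.1, 0.1)],                   # stays local
    [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 0.0, 0.0)],  # moves 2.0 µm
    [(1.0, 1.0, 1.0), (1.0, 1.0, 0.4), (1.0, 1.0, -0.5)],                  # moves 1.5 µm in z
    [(0.0, 0.0, 0.0), (0.4, 0.0, 0.0)],                                    # moves 0.4 µm
]

percent_directed = percent_over_threshold(tracks)
```

      Because the measure only compares two positions per track, it can be applied to tracks with different numbers of time steps.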

      Some more detail about the F0 crispants should be provided. In particular, what degree of cutting was observed and what was the criteria for robust cutting?

      See our response to Reviewer 2 and the newly created Figure 6-S1.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1:

      Specificity of MYL3 Selection:

      My previous question focused on why MYL3 was prioritized over other myosin family members. While the response broadly implicates myosins in viral entry, it does not justify why MYL3 was specifically chosen. For clarity, the "Introduction sections" should explicitly state the unique features of MYL3 (e.g., domain structure, binding affinity, or prior evidence linking it to NNV) that distinguish it from other myosins.

      Thank you for your valuable comment regarding the specificity of MYL3 selection. In response, we have revised the "Introduction" section to explicitly clarify the rationale for prioritizing MYL3 over other myosin family members. Specifically, we have now included prior evidence linking MYL3 to NNV infection, citing our studies that identified MYL3 as a potential host factor interacting with NNV CP protein. In our previous study, sixteen CP-interacting proteins were identified by Co-IP assays followed by MS, including HSP90ab1, Centrosomal protein 170B, MYL3 and so on. In addition to our findings, a previous study by other researchers has also reported that Epinephelus coioides MYL3 can bind to NNV (page 3, lines 79–81). These revisions provide a clearer justification for the selection of MYL3 and distinguish it from other myosin proteins. The added content can be found in the revised manuscript on page 3, lines 81–84.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) I miss some treatment of the lack of behavioural correlate. What does it mean that memantine benefits EEG classification accuracy without improving performance? One possibility here is that there is an improvement in response latency, rather than perceptual sensitivity. Is there any hint of that in the RT results? In some sort of combined measure of RT and accuracy?

      First, we would like to thank the reviewer for their positive assessment of our work and for their extremely helpful and constructive comments that helped to significantly improve the quality of our manuscript.  

      The reviewer rightly points out that, to our surprise, we did not obtain a correlate of the effect of memantine in our behavioral data, neither in the reported accuracy data nor in the RT data. We do not report RT results as participants were instructed to respond as accurately as possible, without speed pressure. We added a paragraph in the discussion section to point to possible reasons for this surprising finding:

      “There are several possible reasons for this lack of behavioral correlate.  For example, EEG decoding may be a more sensitive measure of the neural effects of memantine, in particular given that perceptual sensitivity may have been at floor (masked condition, experiment 1) or ceiling (unmasked condition, experiment 1, and experiment 2). It is also possible that the present decoding results are merely epiphenomenal, not mapping onto functional improvements (e.g., Williams et al., 2007). However, given that we found a tight link between these EEG decoding markers and behavioral performance in our previous work (Fahrenfort et al., 2017; Noorman et al., 2023), it is possible that the effect of memantine was just too subtle to show up in changes in overt behavior.”

      (2) An explanation is missing, about why memantine impacts the decoding of illusion but not collinearity. At a systems level, how would this work? How would NMDAR antagonism selectively impact long-range connectivity, but not lateral connectivity? Is this supported by our understanding of laminar connectivity and neurochemistry in the visual cortex?

      We have no straightforward or mechanistic explanation for this finding. In the revised discussion, we highlight this finding more clearly and include some speculative explanations:

      “The present effect of memantine was largely specific to illusion decoding, our marker of feedback processing, while collinearity decoding, our marker of lateral processing, was not (experiment 1) or only weakly (experiment 2) affected by memantine. We have no straightforward explanation for why NMDA receptor blockade would impact inter-areal feedback connections more strongly than intra-areal lateral connections, considering their strong functional interdependency and interaction in grouping and segmentation processes (Liang et al., 2017). One possibility is that this finding reflects properties of our EEG decoding markers for feedback vs. lateral processing: for example, decoding of the Kanizsa illusion may have been more sensitive to the relatively subtle effect of our pharmacological manipulation, either because overall decoding was better than for collinearity or because NMDA receptor dependent recurrent processes more strongly contribute to illusion decoding than to collinearity decoding.”

      (3) The motivating idea for the paper is that the NMDAR antagonist might disrupt the modulation of the AMPA-mediated glu signal. This is in line with the motivating logic for Self et al., 2012, where NMDAR and AMPAR efficacy in macaque V1 was manipulated via microinfusion. But this logic seems to conflict with a broader understanding of NMDA antagonism. NMDA antagonism appears to generally have the net effect of increasing glu (and ACh) in the cortex through a selective effect on inhibitory GABAergic cells (eg. Olney, Newcomer, & Farber, 1999). Memantine, in particular, has a specific impact on extrasynaptic NMDARs (that is in contrast to ketamine; Milnerwood et al, 2010, Neuron), and this type of receptor is prominent in GABA cells (eg. Yao et al., 2022, JoN). The effect of NMDA antagonists on GABAergic cells generally appears to be much stronger than the effect on glutamatergic cells (at least in the hippocampus; eg. Grunze et al., 1996).

      This all means that it's reasonable to expect that memantine might have a benefit to visually evoked activity. This idea is raised in the GD of the paper, based on a separate literature from that I mentioned above. But all of this could be better spelled out earlier in the paper, so that the result observed in the paper can be interpreted by the reader in this broader context.

      To my mind, the challenging task is for the authors to explain why memantine causes an increase in EEG decoding, where microinfusion of an NMDA antagonist into V1 reduced the neural signal Self et al., 2012. This might be as simple as the change in drug... memantine's specific efficacy on extrasynaptic NMDA receptors might not be shared with whatever NMDA antagonist was used in Self et al. 2012. Ketamine and memantine are already known to differ in this way. 

      We addressed the reviewer’s comments in the following way. First, we bring up our (to us, surprising) result already at the end of the Introduction, pointing the reader to the explanation mentioned by the reviewer:

      “We hypothesized that disrupting the reentrant glutamate signal via blocking NMDA receptors by memantine would impair illusion and possibly collinearity decoding, as putative markers of feedback and lateral processing, but would spare the decoding of local contrast differences, our marker of feedforward processing. To foreshadow our results, memantine indeed specifically affected illusion decoding, but enhancing rather than impairing it. In the Discussion, we offer explanations for this surprising finding, including the effect of memantine on extrasynaptic NMDA receptors in GABAergic cells, which may have resulted in boosted visual activity.”

      Second, as outlined in the response to the first point by Reviewer #2, we are now clear throughout the title, abstract, and paper that memantine “improved” rather than “modulated” illusion decoding.

      Third, and most importantly, we restructured and expanded the Discussion section to include the reviewer’s proposed mechanisms and explanations for the effect. We would like to thank the reviewer for pointing us to this literature. We also discuss the results of Self et al. (2012), specifically the distinct effects of the two NMDAR antagonists used in this study, more extensively, and speculate that their effects may have been similar to ketamine and thus possibly opposite of memantine (for the feedback signal):

      “Although both drugs are known to inhibit NMDA receptors by occupying the receptor’s ion channel, thereby blocking current flow (Glasgow et al., 2017; Molina et al., 2020), the drugs have different actions at receptors other than NMDA, with ketamine acting on dopamine D2 and serotonin 5-HT2 receptors, and memantine inhibiting several subtypes of the acetylcholine (ACh) receptor as well as serotonin 5-HT3 receptors. Memantine and ketamine are also known to target different NMDA receptor subpopulations, with their inhibitory action displaying different time courses and intensity (Glasgow et al., 2017; Johnson et al., 2015). Blockade of different NMDA receptor subpopulations can result in markedly different and even opposite results. For example, Self and colleagues (2012) found overall reduced or elevated visual activity after microinfusion of two different selective NMDA receptor antagonists (2-amino-5-phosphonovalerate and ifenprodil) in macaque primary visual cortex. Although both drugs impaired the feedback-related response to figure vs. ground, similar to the effects of ketamine (Meuwese et al., 2013; van Loon et al., 2016), such opposite effects on overall activity demonstrate that the effects of NMDA antagonism strongly depend on the targeted receptor subpopulation, each with distinct functional properties.”

      Finally, we link these differences to the potential mechanism via GABAergic neurons:

      “As mentioned in the Introduction, this may be related to memantine modulating processing at other pre- or post-synaptic receptors present at NMDA-rich synapses, specifically affecting extrasynaptic NMDA receptors in GABAergic cells (Milnerwood et al, 2010; Yao et al., 2022). Memantine’s strong effect on extrasynaptic NMDA receptors in GABAergic cells leads to increases in ACh levels, which have been shown to increase firing rates and reduce firing rate variability in macaques (Herrero et al., 2013, 2008). This may represent a mechanism through which memantine (but not ketamine or the NMDA receptor antagonists used by Self and colleagues) could boost visually evoked activity.”

      (4) The paper's proposal is that the effect of memantine is mediated by an impact on the efficacy of reentrant signaling in visual cortex. But perhaps the best-known impact of NMDAR manipulation is on LTP, in the hippocampus particularly but also broadly.

      Perception and identification of the Kanizsa illusion may be sensitive to learning (eg. Maertens & Pollmann, 2005; Gellatly, 1982; Rubin, Nakayama, Shapley, 1997); what argues against an account of the results from an effect on perceptual learning? Generally, the paper proposes a very specific mechanism through which the drug influences perception. This is motivated by results from Self et al 2012 where an NMDA antagonist was infused into V1. But oral memantine will, of course, have a whole-brain effect, and some of these effects are well characterized and - on the surface - appear as potential sources of change in illusion perception. The paper needs some treatment of the known ancillary effects of diffuse NMDAR antagonism to convince the reader that the account provided is better than the other possibilities.

      We cannot fully exclude an effect based on perceptual learning but consider this possibility highly unlikely for several reasons. First, subjects performed more than a thousand trials in a localizer session (in experiment 2, even more than two thousand) before starting the main task containing the drug manipulation. Therefore, a large part of any putative perceptual learning would have already occurred before the main experiment. Second, the main experiment was counterbalanced across drug sessions, so half of the participants first performed the memantine session and then the placebo session, and the other half the other way around. If memantine had facilitated perceptual learning during the memantine session, the effect of that learning would have been most visible in the placebo session following the memantine session, so one would expect improved decoding in the placebo session rather than in the memantine session. Because we observed improved decoding in the memantine session itself, perceptual learning is likely not the main explanation for these findings. Third, perceptual learning is known to occur for several stimulus dimensions (e.g., orientation, spatial frequency or contrast). If these findings had been driven by perceptual learning, one would have expected to see perceptual learning for all three features, whereas the memantine effects were specific to illusion decoding. Especially in experiment 2, all features were equally often task relevant, and in such a situation one would have expected to observe perceptual learning effects on the other features as well.

      To further investigate any potential role of perceptual learning, we analyzed participants’ performance in detecting the Kanizsa illusion over the course of the experiments. To this end, we divided the experiments’ trials into four time bins, from the beginning until the end of the experiment. For the first experiment’s first target (T1), there was no interaction between the factors bin and drug (memantine/placebo; F<sub>3,84</sub>=0.89, P=0.437; Figure S6A). For the second target (T2), we performed a repeated-measures ANOVA with the factors bin, drug, T1-T2 lag (short/long), and masks (present/absent). There was only a trend towards a bin by drug interaction (F<sub>3,84</sub>=2.57, P=0.064; Figure S6B), reflecting worse performance under memantine in the first three bins and slightly better performance in the fourth bin. The other interactions that included the bin and drug factors were not significant (all P>0.117). For the second experiment, we performed a repeated-measures ANOVA with the factors bin, drug, masks, and task-relevant feature (local contrast/collinearity/illusion). None of the interactions that included the bin and drug factors were significant (all P>0.219; Figure S6C). Taken together, memantine does not appear to affect Kanizsa illusion detection performance through perceptual learning. Finally, there was no interaction between the factors bin and task-relevant feature (F<sub>6,150</sub>=0.76, P=0.547; Figure S6D), implying there is no perceptual learning effect specific to Kanizsa illusion detection. We included these analyses in our revised Supplement as Fig. S6.
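
      The binning step of this analysis can be sketched as follows. This is a minimal illustration only: it assumes trials are stored in chronological order as 0/1 outcomes (incorrect/correct). The function name and the example data are hypothetical, and the repeated-measures ANOVA itself is not reproduced here.

```python
def accuracy_by_bin(correct, n_bins=4):
    """Proportion correct in n_bins consecutive, nearly equal-sized time bins.

    `correct` is a chronologically ordered list of 0/1 trial outcomes and
    must contain at least n_bins trials.
    """
    n = len(correct)
    # Bin edges as trial indices, e.g. [0, 4, 8, 12, 16] for 16 trials.
    edges = [round(i * n / n_bins) for i in range(n_bins + 1)]
    return [sum(correct[a:b]) / (b - a) for a, b in zip(edges, edges[1:])]

# Hypothetical session of 16 trials (1 = illusion detected correctly).
session = [1, 0, 1, 1,  1, 1, 0, 1,  1, 1, 1, 0,  1, 1, 1, 1]
per_bin = accuracy_by_bin(session)
```

      Computing such per-bin accuracies separately for each drug session gives the values that enter a bin by drug comparison; a learning effect would appear as accuracy that changes systematically across bins.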

      (5) The cross-decoding approach to data analysis concerns me a little. The approach adopted here is to train models on a localizer task, in this case, a task where participants matched a Kanizsa figure to a target template (E1) or discriminated one of the three relevant stimuli features (E2). The resulting model was subsequently employed to classify the stimuli seen during separate tasks - an AB task in E1, and a feature discrimination task in E2. This scheme makes the localizer task very important. If models built from this task have any bias, this will taint classifier accuracy in the analysis of experimental data. My concern is that the emergence of the Kanizsa illusion in the localizer task was probably quite salient, respective to changes in stimuli rotation or collinearity. If the model was better at detecting the illusion to begin with, the data pattern - where drug manipulation impacts classification in this condition but not other conditions - may simply reflect model insensitivity to non-illusion features.

      I am also vaguely worried by manipulations implemented in the main task that do not emerge in the localizer - the use of RSVP in E1 and manipulation of the base rate and staircasing in E2. This all starts to introduce the possibility that localizer and experimental data just don't correspond, that this generates low classification accuracy in the experimental results and ineffective classification in some conditions (ie. when stimuli are masked; would collinearity decoding in the unmasked condition potentially differ if classification accuracy were not at a floor? See Figure 3c upper, Figure 5c lower).

      What is the motivation for the use of localizer validation at all? The same hypotheses can be tested using within-experiment cross-validation, rather than validation from a model built on localizer data. The argument may be that this kind of modelling will necessarily employ a smaller dataset, but, while true, this effect can be minimized at the expense of computational cost - many-fold cross-validation will mean that the vast majority of data contributes to model building in each instance. 

      It would be compelling if results were to reproduce when classification was validated in this kind of way. This kind of analysis would fit very well into the supplementary material.

      We thank the reviewer for this excellent question. We used separate localizers for several reasons, exactly to circumvent the kind of biases in decoding that the reviewer alludes to. Below we have detailed our rationale, first focusing on our general rationale and then focusing on the decisions we made in designing the specific experiments.  

      Using a localizer task in the design of decoding analysis offers several key advantages over relying solely on k-fold cross-validation within the main task:

      (1) Feature selection independence and better generalization: A separate localizer task allows for independent feature selection, ensuring that the features used for decoding are chosen without bias from the main task data. Specifically, the use of a localizer task allows us to determine the time-windows of interest independently based on the peaks of the decoding in the localizer. This allows for a better direct comparison between the memantine and placebo conditions because we can isolate the relevant time windows outside a drug manipulation. Further, training a classifier on a localizer task and testing it on a separate experimental task assesses whether neural representations generalize across contexts, rather than simply distinguishing conditions within a single dataset. This supports claims about the robustness of the decoded information.

      (2) Increased sensitivity and interpretability: The localizer task can be designed specifically to elicit strong, reliable responses in the relevant neural patterns. This can improve signal-to-noise ratio and make it easier to interpret the features being used for decoding in the test set. We facilitate this by having many more trials in the localizer tasks (1280 in E1 and 5184 in E2) than in the separate conditions of the main task, in which we would have to do k-folding on very low trial numbers (e.g., the 2 (mask) x 2 (lag) design in E1 leaves fewer than 256 trials, due to preprocessing, for specific comparisons). The same holds for experiment 2, which has a 2x3 design but also included the base-rate manipulation. Finally, we further facilitate sensitivity of the model by having the stimuli presented at full contrast without any manipulations of attention or masking during the localizer, which allows us to extract the feature-specific EEG signals in the most optimal way.

      (3) Decoupling task-specific confounds: If decoding is performed within the main task using k-folding, there is a risk that task-related confounds (e.g., motor responses, attention shifts, drug) influence decoding performance. A localizer task allows us to separate the neural representation of interest from these task-related confounds.
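
      The train-on-localizer, test-on-main-task logic outlined in the three points above can be illustrated with a toy sketch. It is not the decoding pipeline used in the study: the simulated "EEG" trials, the nearest-centroid classifier, and all names and trial counts are assumptions chosen only to keep the example self-contained.

```python
import random

def centroid(rows):
    """Mean pattern across trials (one value per channel)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fit(X, y):
    """Nearest-centroid model: one mean pattern per class label."""
    return {label: centroid([x for x, l in zip(X, y) if l == label])
            for label in set(y)}

def predict(model, x):
    """Return the label whose centroid is closest to trial x."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda label: sqdist(model[label]))

def accuracy(model, X, y):
    return sum(predict(model, x) == l for x, l in zip(X, y)) / len(y)

def simulate(n_trials, rng, signal=1.0, n_channels=8):
    """Toy trials: class 1 adds `signal` to every channel, plus unit Gaussian noise."""
    X, y = [], []
    for i in range(n_trials):
        label = i % 2
        X.append([label * signal + rng.gauss(0, 1) for _ in range(n_channels)])
        y.append(label)
    return X, y

rng = random.Random(0)
X_loc, y_loc = simulate(400, rng)    # many localizer trials, no task manipulations
X_main, y_main = simulate(60, rng)   # fewer trials per main-task condition

model = fit(X_loc, y_loc)                    # train only on localizer data
cross_acc = accuracy(model, X_main, y_main)  # test only on main-task data
```

      In this scheme no main-task trial contributes to training, so every main-task trial remains available for testing and the classifier cannot pick up regularities that exist only in the main task.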

      Experiment 1 

      In experiment 1, the Kanizsa illusion was always task relevant in the main experiment in which we employed the pharmacological manipulation. To make sure that the classifiers were not biased towards Kanizsa figures from the start (which would be the case if we had done k-folding in the main task), we used a training set in which all features were equally relevant for task performance. As can be seen in figure 1E, which plots the decoding accuracies of the localizer task, illusion decoding as well as rotation decoding were equally strong, whereas collinearity decoding was weaker. It may be that the Kanizsa illusion was quite salient in the localizer task, which we can’t know at present, but it was at least less salient and relevant than in the main task (where it was the only task-relevant feature). Based on the localizer decoding results one could argue that the rotation dimension and illusion dimension were most salient, because the decoding was highest for these dimensions. Clearly the model was not insensitive to non-illusory features. The localizer task of experiment 2 reveals that collinearity decoding tends to be generally lower, even when that feature is task relevant.

      Experiment 2 

      In experiment 2, the localizer task and main task were also similar, with three exceptions: during the localizer task no drug was active, and no masking and no base rate manipulation were employed. To make sure that the classifier was not biased towards a certain stimulus category (due to the bias manipulation), e.g. the stimulus that is presented most often, we used a localizer task without this manipulation. As can be seen in figure 4D decoding of all the features was highly robust, also for example for the collinearity condition. Therefore the low decoding that we observe in the main experiment cannot be due to poor classifier training or feature extraction in the localizer. We believe this is actually an advantage instead of a disadvantage of the current decoding protocol.

      Based on the rationale presented above we are uncomfortable performing the suggested analyses using a k-folding approach in the main task, because according to our standards the trial numbers are too low and the risk that these results are somehow influenced by task specific confounds cannot be ruled out.  

      Line 301 - 'Interestingly, in both experiments the effect of memantine... was specific to... stimuli presented without a backward mask.' This rubs a bit, given that the mask broadly disrupted classification. The absence of memantine results in masked results may simply be a product of the floor ... some care is needed in the interpretation of this pattern. 

      In the results section of experiment 1, we added:

      “While the interaction between masking and memantine only approached significance (P=0.068), the absence of an effect of memantine in the masked condition could reflect a floor effect, given that illusion decoding in the masked condition was not significantly better than chance.”

      While floor is less likely to account for the absence of an effect in the masked condition in experiment 2, where illusion decoding in the masked condition was significantly above chance, it is still possible that to obtain an effect of memantine, decoding accuracy needed to be higher. We therefore also added here:

      “For our time window-based analyses of illusion decoding, the specificity of the memantine effect to the unmasked condition was supported by a significant interaction between drug and masking (note, however, given overall much lower decoding accuracy in the masked condition, the lack of a memantine effect could reflect a floor effect).”

      In the discussion, we changed the sentence to read “…the effect of memantine on illusion decoding tended to be specific to attended, task-relevant stimuli presented without a backward mask.”

      Line 441 - What were the contraindications/exclusion parameters for the administration of memantine? 

      Thanks for spotting this. We have added the relevant exclusion criteria in the revised version of the supplement. See also below.

      – Allergy to memantine or one of the inactive ingredients of these products;

      – (History of) psychiatric treatment;

      – First-degree relative with (history of) schizophrenia or major depression;

      – (History of) clinically significant hepatic, cardiac, obstructive respiratory, renal, cerebrovascular, metabolic or pulmonary disease, including, but not limited to fibrotic disorders;

      – Claustrophobia;

      – Regular usage of medicines (antihistamines or occasional use of paracetamol);

      – (History of) neurological disease;

      – (History of) epilepsy;

      – Abnormal hearing or (uncorrected) vision;

      – Average use of more than 15 alcoholic beverages weekly;

      – Smoking;

      – History of drug (opiate, LSD, (meth)amphetamine, cocaine, solvents, cannabis, or barbiturate) or alcohol dependence;

      – Any known other serious health problem or mental/physical stress;

      – Used psychotropic medication, or recreational drugs over a period of 72 hours prior to each test session;

      – Used alcohol within the last 24 hours prior to each test session;

      – (History of) pheochromocytoma;

      – Narrow-angle glaucoma;

      – (History of) ulcer disease;

      – Galactose intolerance, Lapp lactase deficiency or glucose-galactose malabsorption;

      – (History of) convulsion.

      Line 587 - The localizer task used to train the classifier in E2 was collected in different sessions. Was the number of trials from separate sessions ultimately equal? The issue here is that the localizer might pick up on subtle differences in electrode placement. If the test session happens to have electrode placement that is similar to the electrode placement that existed for a majority of one condition of the localizer... this will create bias. This is likely to be minor, but machine classifiers really love this kind of minor confound.

      Indeed, the trial counts in the separate sessions for the localizer in E2 were equal. We have added that information to the methods section.  

      Experiment 1: 1280 trials collected during the intake session.

      In experiment 2: 1728 trials were collected per session (intake, and 2 drug sessions), so there were 5184 trials across three sessions.

      Reviewer #2:

      To start off, I think the reader is being a bit tricked when reading the paper. Perhaps my priors are too strong, but I assumed, just like the authors, that NMDA-receptors would disrupt recurrent processing, in line with previous work. However, due to the continuous use of the ambiguous word 'affected' rather than the more clear increased or perturbed recurrent processing, the reader is left guessing what is actually found. That's until they read the results and discussion finding that decoding is actually improved. This seems like a really big deal, and I strongly urge the authors to reword their title, abstract, and introduction to make clear they hypothesized a disruption in decoding in the illusion condition, but found the opposite, namely an increase in decoding. I want to encourage the authors that this is still a fascinating finding.

      We thank the reviewer for the positive assessment of our manuscript, and for many helpful comments and suggestions.  

      We changed the title, abstract, and introduction in accordance with the reviewer’s comment, highlighting that “memantine […] improves decoding” and “enhances recurrent processing” in all three sections. We also changed the heading of the corresponding results section to “Memantine selectively improves decoding of the Kanizsa illusion”.

      Apologies if I have missed it, but it is not clear to me whether participants were given the drug or placebo during the localiser task. If they are given the drug this makes me question the logic of their analysis approach. How can one study the presence of a process, if their very means of detecting that process (the localiser) was disrupted in the first place? If participants were not given a drug during the localiser task, please make that clear. I'll proceed with the rest of my comments assuming the latter is the case. But if the former, please note that I am not sure how to interpret their findings in this paper.

      Thanks for asking this; it was indeed unclear. In experiment 1 the localizer was performed in the intake session in which no drugs were administered. In the second experiment the localizer was performed in all three sessions with equal trial numbers. In the intake session no drugs were administered. In the other two sessions the localizer was performed directly after pill intake and therefore the memantine was not (or barely) active yet. We started the main task four hours after pill intake because that is the approximate peak time of memantine. Note that all three localizer tasks were averaged before using them as training set. We have clarified this in the revised manuscript.

      The main purpose of the paper is to study recurrent processing. The extent to which this study achieves this aim is completely dependent to what extent we can interpret decoding of illusory contours as uniquely capturing recurrent processing. While I am sure illusory contours rely on recurrent processing, it does not follow that decoding of illusory contours capture recurrent processing alone. Indeed, if the drug selectively manipulates recurrent processing, it's not obvious to me why the authors find the interaction with masking in experiment 2. Recurrent processing seems to still be happening in the masked condition, but is not affected by the NMDA-receptor here, so where does that leave us in interpreting the role of NMDA-receptors in recurrent processing? If the authors can not strengthen the claim that the effects are completely driven by affecting recurrent processing, I suggest that the paper will shift its focus to making claims about the encoding of illusory contours, rather than making primary claims about recurrent processing.

      We indeed used illusion decoding as a marker of recurrent processing. Clearly, such a marker based on a non-invasive and indirect method to record neural activity is not perfect. To directly and selectively manipulate recurrent processing, invasive methods and direct neural recordings would be required. However, as explained in the revised Introduction,

      “In recent work we have validated that the decoding profiles of these features of different complexities at different points in time, in combination with the associated topography, can indeed serve as EEG markers of feedforward, lateral and recurrent processes (Fahrenfort et al., 2017; Noorman et al., 2023).”  

      The timing and topography of the decoding results of the present study were consistent with our previous EEG decoding studies (Fahrenfort et al., 2017; Noorman et al., 2023). This validates the use of these EEG decoding signatures as (imperfect) markers of distinct neural processes, and we continue to use them as such. However, we expanded the discussion section to alert the reader to the indirect and imperfect nature of these EEG decoding signatures as markers of distinct neural processes: “Our approach relied on using EEG decoding of different stimulus features at different points in time, together with their topography, as markers of distinct neural processes. Although such non-invasive, indirect measures of neural activity cannot provide direct evidence for feedforward vs. recurrent processes, the timing, topography, and susceptibility to masking of the decoding signatures obtained in the present study are consistent with neurophysiology (e.g., Bosking et al., 1997; Kandel et al., 2000; Lamme & Roelfsema, 2000; Lee & Nguyen, 2001; Liang et al., 2017; Pak et al., 2020), as well as with our previous work (Fahrenfort et al., 2017; Noorman et al., 2023).” 

      The reviewer is also concerned about the lack of effect of memantine on illusion decoding in the masked condition in experiment 2. In our view, the strong effect of masking on illusion decoding (both in absolute terms, as well as when compared to its effect on local contrast decoding), provides strong support for our assumption that illusion decoding represents a marker of recurrent processing. Nevertheless, as the reviewer points out, weak but statistically significant illusion decoding was still possible in the masked condition, at least when the illusion was task-relevant. As the reviewer notes, this may reflect residual recurrent processing during masking, a conclusion consistent with the relatively high behavioral performance despite masking (d’ > 1). However, rather than invalidating the use of our EEG markers or challenging the role of NMDA-receptors in recurrent processing, this may simply reflect a floor effect. As outlined in our response to reviewer #1 (who was concerned about floor effects), in the results section of experiment 1, we added:

      “While the interaction between masking and memantine only approached significance (P = 0.068), the absence of an effect of memantine in the masked condition could reflect a floor effect, given that illusion decoding in the masked condition was not significantly better than chance.”

      And for experiment 2:

      “For our time window-based analyses of illusion decoding, the specificity of the memantine effect to the unmasked condition was supported by a significant interaction between drug and masking (note, however, given overall much lower decoding accuracy in the masked condition, the lack of a memantine effect could reflect a floor effect).”

      An additional claim is being made with regards to the effects of the drug manipulation. The authors state that this effect is only present when the stimulus is 1) consciously accessed, and 2) attended. The evidence for claim 1 is not supported by experiment 1, as the masking manipulation did not interact in the cluster-analyses, and the analyses focussing on the peak of the timing window do not show a significant effect either. There is evidence for this claim coming from experiment 2 as masking interacts with the drug condition. Evidence for the second claim (about task relevance) is not presented, as there is no interaction with the task condition. A classical error seems to be made here, where interactions are not properly tested. Instead, the presence of a significant effect in one condition but not the other is taken as sufficient evidence for an interaction, which is not appropriate. I therefore urge the authors to dampen the claim about the importance of attending to the decoded features. Alternatively, I suggest the authors run their interactions of interest on the time-courses and conduct the appropriate cluster-based analyses.

      We thank the reviewer for pointing out the importance of key interaction effects. Following the reviewer’s suggestion, we dampened our claims about the role of attention. For experiment 1, we changed the heading of the relevant results section from “Memantine’s effect on illusion decoding requires attention” to “The role of consciousness and attention in memantine’s effect on illusion decoding”, and we added the following in the results section:

      “Also our time window-based analyses showed a significant effect of memantine only when the illusion was both unmasked and presented outside the AB (t<sub>28</sub> = -2.76, P = 0.010, BF<sub>10</sub> = 4.53; Fig. 3F). Note, however, that although these post-hoc tests of the effect of memantine on illusion decoding were significant, for our time window-based analyses we did not obtain a statistically significant interaction between the AB and memantine, and the interaction between masking and memantine only approached significance (P = 0.068). Thus, although these memantine effects were slightly less robust than for T1, probably due to reduced trial counts, these results point to (but do not conclusively demonstrate) a selective effect of memantine on illusion-related feedback processing that depends on the availability of attention. In addition to the lack of the interaction effect, another potential concern…”

      For experiment 2, we added the following in the results section:

      “Note that, for our time window-based analyses of illusion decoding, although the specificity of the memantine effect to the unmasked condition was supported by a significant interaction between drug and masking, we did not obtain a statistically significant interaction between memantine and task-relevance. Thus, although the memantine effect was significant only when the illusion was unmasked and task-relevant, just like for the effect of temporal attention in experiment 1, these results do not conclusively demonstrate a selective effect of memantine that depends on attention (task-relevance).”

      In the discussion, we toned down claims about memantine’s effects being specific to attended conditions, we are highlighting the “preliminary” nature of these findings, and we are now alerting the reader explicitly to be careful with interpreting these effects, e.g.:

      “Although these results have to be interpreted with caution because the key interaction effects were not statistically significant, …”
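      The point about interactions can be made concrete with a small sketch. It assumes a 2 x 2 within-subject design (drug vs. placebo, crossed with a second factor such as masking) and tests the interaction directly as a difference of differences rather than comparing the significance of two separate tests. The AUC values are invented for illustration and are not data from the study, the function names are hypothetical, and the manuscript's actual statistical pipeline is not reproduced here.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(values):
    """t statistic and degrees of freedom for H0: mean(values) == 0."""
    n = len(values)
    return mean(values) / (stdev(values) / sqrt(n)), n - 1

def drug_effect(drug, placebo):
    """Per-participant drug minus placebo difference in decoding AUC."""
    return [d - p for d, p in zip(drug, placebo)]

def interaction_contrast(drug_a, placebo_a, drug_b, placebo_b):
    """Difference of differences: drug effect in condition A minus condition B."""
    eff_a = drug_effect(drug_a, placebo_a)
    eff_b = drug_effect(drug_b, placebo_b)
    return [a - b for a, b in zip(eff_a, eff_b)]

# Hypothetical AUC values for six participants (not data from the study).
unmasked_drug    = [0.62, 0.60, 0.65, 0.58, 0.63, 0.61]
unmasked_placebo = [0.58, 0.59, 0.60, 0.57, 0.60, 0.58]
masked_drug      = [0.52, 0.51, 0.53, 0.50, 0.52, 0.51]
masked_placebo   = [0.51, 0.51, 0.52, 0.50, 0.51, 0.50]

t_unmasked, df = one_sample_t(drug_effect(unmasked_drug, unmasked_placebo))
t_masked, _ = one_sample_t(drug_effect(masked_drug, masked_placebo))
t_inter, _ = one_sample_t(interaction_contrast(
    unmasked_drug, unmasked_placebo, masked_drug, masked_placebo))
print(f"df={df}: unmasked t={t_unmasked:.2f}, masked t={t_masked:.2f}, "
      f"interaction t={t_inter:.2f}")
```

      Only the third statistic speaks to whether the drug effect differs between conditions; a significant effect in one condition alongside a non-significant effect in the other does not by itself establish that difference.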

      How were the length of the peak-timing windows established in Figure 1E? My understanding is that this forms the training-time window for the further decoding analyses, so it is important to justify why they have different lengths, and how they are determined. The same goes for the peak AUC time windows for the interaction analyses. A number of claims in the paper rely on the interactions found in these post-hoc analyses, so the 223- to 323 time window needs justification.

      Thanks for this question. The length of these peak-timing windows is different because the decoding of rotation is temporally very precise and short-lived, whereas the decoding of the other features lasts much longer and is more temporally variable. In fact, we have followed the same procedure as in a previously published study (Noorman et al., eLife 2025) for defining the peak-timing and length of the windows. We followed the same procedure for both experiments reported in this paper, replicating the crucial findings and therefore excluding the possibility that these findings are in any way dependent on the time windows that are selected. We have added that information to the revised version of the manuscript.

      Reviewer #3:

      First, despite its clear pattern of neural effects, there is no corresponding perceptual effect. Although the manipulation fits neatly within the conceptual framework, and there are many reasons for not finding such an effect (floor and ceiling effects, narrow perceptual tasks, etc), this does leave open the possibility that the observation is entirely epiphenomenal, and that the mechanisms being recorded here are not actually causally involved in perception per se.

      We thank the reviewer for the positive assessment of our work. The reviewer rightly points out that, to our surprise, we did not obtain a correlate of the effect of memantine in our behavioral data. We agree with the possible reasons for the absence of such an effect highlighted by the reviewer, and expanded our discussion section accordingly:

      “There are several possible reasons for this lack of behavioral correlate.  For example, EEG decoding may be a more sensitive measure of the neural effects of memantine, in particular given that perceptual sensitivity may have been at floor (masked condition, experiment 1) or ceiling (unmasked condition, experiment 1, and experiment 2). It is also possible that the present decoding results are merely epiphenomenal, not mapping onto functional improvements (e.g., Williams et al., 2007). However, given that in our previous work we found a tight link between these EEG decoding markers and behavioral performance (Fahrenfort et al., 2017; Noorman et al., 2023), it is possible that the effect of memantine in the present study was just too subtle to show up in changes in overt behavior.”

      Second, although it is clear that there is an effect on decoding in this particular condition, what that means is not entirely clear - particularly since performance improves, rather than decreases. It should be noted here that improvements in decoding performance do not necessarily need to map onto functional improvements, and we should all be careful to remain agnostic about what is driving classifier performance. Here too, the effect of memantine on decoding might be epiphenomenal - unrelated to the information carried in the neural population, but somehow changing the balance of how that is electrically aggregated on the surface of the skull. *Something* is changing, but that might be a neurochemical or electrical side-effect unrelated to actual processing (particularly since no corresponding behavioural impact is observed.)

      We would like to refer to our reply to the previous point, and we would like to add that in our previous work (Fahrenfort et al., 2017; Noorman et al., 2023) similar EEG decoding markers were often tightly linked to changes in behavioral performance. This indicates that these particular EEG decoding markers do not simply reflect some side effect not related to neural processing. However, as stated in the revised discussion section, “it is possible that the effect of memantine in the present study was just too subtle to show up in changes in overt behavior.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      (…) In my view, the part about NF-YA1 is less strong - although I realize this is a compelling candidate to be a regulator of cell cycle progression, the experimental approaches used to address this question fall a bit short, in particular, compared to the very detailed approaches shown in the rest of the manuscript. The authors show that the transcription factor NF-YA1 regulates cell division in tobacco leaves; however, there is no experimental validation in the experimental system (nodules). All conclusions are based on a heterologous cell division system in tobacco leaves. The authors state that NF-YA1 has a nodule-specific role as a regulator of cell differentiation. I am concerned the tobacco system may not allow for adequate testing of this hypothesis.

      Reviewer #1 makes a valid point by asking to focus the manuscript more explicitly on the role of NF-YA1 as a differentiation factor in a symbiotic context. We have now addressed this formally and experimentally.

      The involvement of A-type NF-Y subunits in the transition to the early differentiation of nodule cells has been documented in model legumes through several publications that we refer to in the revised version of the discussion (lines 617/623). We fully agree that the CDEL system, because it is heterologous, does not allow us more than to propose a parallel explanation for these observations - i.e., that the Medicago NF-YA1 subunit presumably acts in post-replicative cell-cycle regulation at the G2/M transition. Considering your recommendations and those of reviewer #2, we sought to support this conclusion by testing the impact of localized over-expression of NF-YA1 on cortical cell division and infection competence at an early stage of root colonization. The results of these experiments are now presented in the new Figure 9 and Figure 9-figure supplement 1-5 and described from line 435 to 495.

      With the fluorescent tools the authors have at hand (in particular tools to detect G2/M transition, which the authors suggest is regulated by NF-YA1), it would be interesting to test what happens to cell division if NF-YA1 is over-expressed in Medicago roots?

      To limit pleiotropic effects of an ectopic over-expression, we used the symbiosis-induced, ENOD11 promoter to increase NF-YA1 expression levels more specifically along the trajectory of infected cells. We chose to remain in continuity with the experiments performed in the CDEL system by opting for a destabilized version of the KNOLLE transcriptional reporter to detect the G2/M transition. The results obtained are presented in Figure 9B (quantification of split infected cells), in Figure 9-figure supplement 1B (ENOD11 expression profile), in Figure 9-figure supplement 3B (representative confocal images) and Figure 9-figure supplement 4D (quantification of pKNOLLE reporter signal). There, we show that mitosis remains inhibited in cells accommodating infection threads, but is completed in a higher proportion of outer cortical cells positioned on the infection trajectory, where ENOD11 gene transcription is active before their physical colonization.

      Based on NF-YA1 expression data published previously and their results in tobacco epidermal cells, the authors hypothesize that NF-YA regulates the mitotic entry of nodule primordial cells. Given that much of the manuscript deals with earlier stages of the infection, I wonder if NF-YA1 could also have a role in regulating mitotic entry in cells adjacent to the infection thread?

      The expression profile of NF-YA1 at early stages of cortical infection (Laporte et al., 2014) is indeed similar to the one of ENOD11 (as shown in Figure 9-figure supplement 1C) in wild-type Medicago roots, with corresponding transcriptional reporters being both activated in cells adjacent to the infection thread. Under our experimental conditions, additional expression of NF-YA1 (driven by the ENOD11 promoter) in these neighbouring cells did not impact their propensity to enter mitosis and to complete cell division. These results are presented in Figure 9-figure supplement 4D (quantification of pKNOLLE reporter signal) and Figure 9-figure supplement 5 (quantification of split neighbouring cells).

      Reviewer #1 (Recommendations For The Authors):

      - In the first part, images show the qualitative presence/absence of H3.1 or H3.3 histones.

      Upon closer inspection, many cells seem to have both histones. In Fig1-S1 for example (root meristem), it is evident that there are many cells with low but clearly present H3.1 content in the green channel; however, in the overlay, the green is lost and H3.3 (pink) is mainly visible. What does this mean in terms of the cell cycle? 

      We fully agree with reviewer #1 on these points. Independent of whether they have low or high proliferation potential, most cells retain histone H3.1 particularly in silent regions of the genome, while H3.3 is constitutively produced and enriched at transcriptionally active regions. When channels are overlaid, cells in an active proliferation or endoreduplication state (in G1, S or G2, depending on the size of their nuclei) will appear mainly "green" (H3.1-eGFP positive). Cells with a low proliferation potential (e.g., in the QC), G2-arrested (e.g., IT-traversed) or terminally differentiating (e.g., containing symbiosomes or arbuscules) will appear mainly "magenta" (H3.1-low, medium to high H3.3-mCherry content).

      Furthermore, all nodule images only display the overlay image, and individual fluorescence channels are not shown. Does the same masking effect happen here? It may be helpful to quantify fluorescence intensity not only in green but also in red channels as done for other experiments.

      Quantifying fluorescence intensity in the mCherry channel may indeed help to highlight the likely replacement of H3.1-eGFP by H3.3-mCherry in infected cells, as described by Otero and colleagues (2016) at the onset of cellular differentiation. However, the quantification method as established (i.e., measuring the corrected total nuclear fluorescence at the equatorial plane) cannot be applied, most of the time, to infected cells' nuclei due to the overlapping presence of mCherry-producing S. meliloti in the same channel (e.g., in Figure 2B). Nevertheless, and to avoid this masking effect when the eGFP and mCherry channels are overlaid, we now present them as isolated channels in revised Figures 1-3 and associated figure supplements. As the cell-wall staining is regularly included and displayed in grayscale, we assigned to both of them the Green Fire Blue lookup table, which maps intensity values to a multiple-colour sequential scheme (with blue or yellow indicating low or high fluorescence levels, respectively). We hope that this will allow a better appreciation of the respective levels of H3.1- and H3.3-fusions in our confocal images.
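      For readers unfamiliar with the measure mentioned above, the following sketch shows one common way a corrected total fluorescence value is computed: the integrated density of the nucleus minus the background signal expected over the same area. This definition is an assumption; the response does not spell out the exact formula used, and the function name and the numbers are hypothetical.

```python
def corrected_total_fluorescence(integrated_density, area, background_means):
    """Integrated density minus the background expected over the same area.

    integrated_density: summed pixel intensity inside the nuclear outline
    area: size of that outline, in pixels (or calibrated units)
    background_means: mean intensities of nearby signal-free regions
    """
    background = sum(background_means) / len(background_means)
    return integrated_density - area * background

# Hypothetical measurement taken at the equatorial plane of one nucleus.
value = corrected_total_fluorescence(
    integrated_density=1000.0, area=50.0, background_means=[2.0, 4.0])
print(value)
```

      Because the correction subtracts only diffuse background, it cannot separate two signals that occupy the same channel and the same region, which is the limitation described above for nuclei overlapped by mCherry-producing bacteria.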

      - Fig 1 B - it is hard to differentiate between S. meliloti-mCherry and H3.3-mCherry. Is there a way to label the different structures?

      In the revised version of Figure 1B, we used filled or empty arrowheads to point to histone H3-containing nuclei. To label rhizobia-associated structures, we used dashed lines to delineate nodule cells hosting symbiosomes and included the annotation “IT” for infection threads. We also indicated proliferating, endoreduplicating and differentiating tissues and cells using the following annotations: “CD” for cell division, “En” for endoreduplication and “TD” for terminal differentiation. All annotations are explained in the figure legend.

      - Fig 1 - supplement E and F - no statistics are shown.

      We performed non-parametric tests using the latest version of the GraphPad Prism software (version 10.4.1). Stars (Figure 1-figure supplement 1F) or different letters (Figure 1-figure supplement 1G) now indicate statistically significant differences. Results of the normality and non-parametric tests were included in the corresponding Source Data Files (Figure 1 – figure supplement 1 – source data 1 and 2). We have also updated the compact display of letters in other figures as indicated by the new software version. The raw data and the results of the statistical analyses remain unchanged and can be viewed in the corresponding source files.

      - Fig 2 A - overview and close-up image do not seem to be in the same focal plane. This is confusing because the nuclei position is different (so is the infection thread position).

      We fully agree that our former Figure may have confused reviewers #1 and #2 as well as readers. Figure 2A was designed to highlight, from the same nodule primordium, actively dividing cells of the inner cortex (optical section z 6-14) and cells of the outer cortex traversed, penetrated by or neighbouring an infection thread (optical section z 11-19). We initially wanted to show different magnification views of the same confocal image (i.e., a full-view of the inner cortex and a zoomed-view of the outer layers) to ensure that audiences can identify these details. In the revised version of Figure 2A, we displayed these full- and zoomed-views in upper and lower panels, respectively, and we removed the solid-line inset to avoid confusion.

      - Fig 1A and Fig 2E could be combined and shown at the beginning of the manuscript. Also, consider making the cell size increase more extreme, as it is important to differentiate G2 cells after H3.1 eviction and cells in G1. You have to look very closely at the graph to see the size differences.

      We have taken each of your suggestions into account. A combined version of our schematic representation with more pronounced nuclei size differences is now presented in Figure 1A.

      - Fig. 3 C is difficult to interpret. Can this be split into different panels?

      We realized that our previous choice of representation may have been confusing. Each value corresponds only to the H3.1-eGFP content, measured in an infected cell and reported to that of the neighbouring cell (IC / NC) within individual root samples. Therefore, we removed the green-magenta colour code and changed the legend accordingly. We hope that these slight modifications will facilitate the interpretation of the results - namely, that the relative level of H3.1 increases significantly in infected cells in the selected mutants compared to the wild-type. This mode of representation also highlights that in the mutants, there are more individual cases where the H3.1 content in an infected cell exceeds that of the neighbouring cell by more than two times. These cases would be masked if the couples of infected cells and associated neighbours would be split into different panels as in Figure 3B.

      - Line 357/359. I assume you mean ...'through the G2 phase can commit to nuclear division'.

      We have edited this sentence according to your suggestion, which now appears in line 370. 

      Reviewer #2 (Recommendations For The Authors):

      Cell cycle control during the nitrogen-fixing symbiosis is an important question but only poorly understood. This manuscript uses largely cell biological methods, which are always of the highest quality - to investigate host cell cycle progression during the early stages of nodule formation, where cortical infection threads penetrate the nodule primordium. The experiments were carefully conducted, the observations were detail oriented, and the results were thought-provoking. The study should be supported by mechanistic insights. 

      (1) One thought provoked by the authors' work is that while the study was carried out at an unprecedented resolution, the relationship between control of the cell cycle and infection thread penetration remains correlative. Is this reduced replicative potential among cells in the infection thread trajectory a consequence of hosting an infection thread, or a prerequisite to do so?

      We understand and share the point of view of reviewer #2. At this stage, we believe that our data won’t enable us to fully answer the question, thus this relationship remains rather correlative. The reasons are that 1) the access to the status of cortical cells below C2 is restricted to fixed material and therefore only represents a snapshot of the situation, and 2) we are currently unable to significantly interfere with mechanisms as intertwined as cell cycle control and infection control. What we can reasonably suggest from our images is that the most favorable window of the cell cycle for cells about to be crossed by an infection thread is post-replicative, i.e., the G2 phase. Typical markers of the G2 phase were recurrently observed at the onset of physical colonization – enlarged nucleus, containing less histone H3.1 than neighbouring cells in S phase (e.g., in Figure 2A). Reaching the G2 phase could therefore be a prerequisite for infection (and associated cellular rearrangements), while prolonged arrest in this same phase is likely a consequence of transcellular passage towards a forming nodule primordium.

      More importantly, in either scenario, what is the functional significance of exiting the cell cycle or endocycle? By stating that "local control of mitotic activity could be especially important for rhizobia to timely cross the middle cortex, where sustained cellular proliferation gives rise to the nodule meristem" (Line 239), the authors seem to believe that cortical cells need to stop the cell cycle to prepare for rhizobia infection. This is certainly reasonable, but the current study provides no proof, yet. To test the functional importance of cell cycle exit, one would interfere with G2/M transition in nodule cells,  and examine the effect on infection.

      We fully agree with reviewer #2 that the functional importance of a cell-cycle arrest on the infection thread trajectory remains to be demonstrated. Interfering with cell-cycle progression in a system as complex and fine-tuned as infected legume roots certainly requires the right timing – at the level of the tissue and of individual cells; the right dose; and the right molecular player(s) (i.e., bona fide activators or repressors of the G2/M transition). Using the symbiosis-specific NPL promoter, activated in the direct vicinity of cortical infection threads (Figure 9-figure supplement 1B), we tried to force infectable cells to recruit the cell division program by ectopically over-expressing the Arabidopsis CYCD3.1, “mimicking” the CDEL system. So far, this strategy has not resulted in a significant increase in the number of uninfected nodules in transgenic hairy roots - though the effect on symbiosome release remains to be investigated. Provided that a suitable promoter-cell cycle regulator combination is identified, we hope to be able to answer this question in the future.

      Given that the authors have already identified a candidate, and shown that it represses cell division in the CDEL system, not testing the same gene in a more relevant context seems a lost opportunity. If one ectopically expressed NF-YA1 in hairy roots, thus repressing mitosis in general, would more cells become competent to host infection threads? This seems a straightforward experiment and readily feasible with the constructs that the authors already have. If this view is too naive, the authors should explain why such a functional investigation does not belong in this manuscript.

      Reviewer #2's point is entirely valid, and we decided to address it through additional experiments. To avoid possible side effects on development by affecting cell division in general, we placed NF-YA1 under control of the symbiosis-induced ENOD11 promoter. Based on the results obtained in the CDEL system, the pENOD11::FLAG-NF-YA1 cassette was coupled to a destabilized version of the KNOLLE transcriptional reporter to detect the G2/M transition. Competence for transcellular infection was maintained upon local NF-YA1 overexpression, the latter leading to a slight (non-significant) increase in the number of infected cells per cortical layer. These results are presented in Figure 9-figure supplement 3A-B (representative confocal images) and in Figure 9-figure supplement 4A-G.

      (1b) A related comment: on Line 183, it was stated that "The H3.1-eGFP fusion protein was also visible in cells penetrated but not fully passed by an infection thread". Presumably, the authors were talking about the cell marked by the arrowhead. But its H3.1-GFP signal looks no different from the cell immediately to its left. It is hard to say which cells are ones "preparing for intracellular infection pass through S-phase", and which ones are just "regularly dividing cortical cells forming the nodule primordium". What can be concluded is that once a cell has been fully transversed by an infection thread, its H3.1 level is low. Whether this is the cause or consequence of infection cannot be resolved simply by timing the appearance or disappearance of H3.1-GFP.

      We basically agree with comment 1b. In an unsynchronized system such as infected hairy roots, it is challenging to detect the event where a cell is penetrated, but not yet completely crossed by an infection thread. What we wanted to emphasize in Figure 2A, is that host cells in the path of an infection thread re-enter the cell cycle and pass through S-phase just as their neighbours do (as pointed out by reviewer #2 in his summary). The larger nucleus with slightly lower H3.1-eGFP signal than the neighbouring cell (as indicated by the use of the Green Fire Blue lookup table) suggests that the infected cell marked by the arrowhead in Figure 2A is actually in the G2 phase. The main difference is indeed that cells allowing complete infection thread passage exit the cell cycle and largely evict H3.1 while their neighbours proceed to cell division (as exemplified by PlaCCI reporters in Figure 4CD and the new Figure 5-figure supplement 2). Whether cell-cycle exit in G2 is a cause, or a consequence of cortical infection is a question that cannot be easily answered from fixed samples, which is a limitation of our study.

      (2) The authors have convincingly demonstrated that cortical cells accommodating infection threads exit the cell cycle, inhibit cell division, and down-regulate KNOLLE expression. How do these observations reconcile with the feature called the pre-infection thread? The authors devoted one paragraph to this question in the Discussion, but this does not seem sufficient given that the pre-infection thread is a prominent concept. Is the resemblance to the cell division plane superficial, or does it reflect a co-option of the normal cytokinesis machinery for accommodating rhizobia?

      From our point of view, cortical cells forming pre-infection threads are likely in an intermediate state. PIT structures undoubtedly share many similarities with cells establishing a cell division plane. The recruitment of at least some of the players normally associated with cytokinesis has been demonstrated and is consistent with the maintenance of infectable cells in a pre-mitotic phase in Medicago, as discussed in lines 558 to 568. We nevertheless think that the arrest of the cell cycle in the G2 phase, presumably occurring in crossed cortical cells, constitutes an event of cellular differentiation and specialization in transcellular infection. 

      The following are mainly points of presentation and description: 

      (3) Line 158: I can't see "subnuclear foci" in Figure 1-figure supplement 1C-E. However, they are visible in Fig. 1C.

      We hope that presenting the eGFP and mCherry channels in separate panels and assigning them the Green Fire Blue colour scheme provides better visibility and contrast of these detailed structures. We now refer to Figure 1C in addition to Figure 1–figure supplement 1E in the main text (line 161). 

      (4) Line 160: The authors should outline a larger region containing multiple QC cells, rather than pointing to a single cell, as there are other areas in the image containing cells with the same pattern.

      We updated Figure 1-figure supplement 1E accordingly.

      (5) Fig. 1B should include single channels, since within a single plant cell, the nucleus, the infection thread, and sometimes symbiosomes all have the same color. This makes it hard to see whether the nuclei in these cells are less green, or are simply overwhelmed by the magenta color.

      To improve the readability of Figure 1B and to address suggestions from individual reviewers, we now include separate channels and have annotated the different structures labeled by mCherry.

      (6) Fig. 2A: the close-up does not match the boxed area in the left panel. Based on the labeling, it seems that the two panels are different optical sections. But why choose a different optical depth for the left panel? This can be disorienting to the reader, because one expects the close-up to be the same image, just under higher magnification.

      We fully agree that our previous choice of representation may have been confusing. As we also specified to reviewer #1, we wanted to show a full view of proliferating cells in the inner cortex and a zoomed view of infected cells in the outer layers of the same nodule primordium. In the revised version of Figure 2A, we display these full and zoomed views in separate panels and have removed the boxed area to avoid confusion.

      (7) Figure 2-figure supplement 1B: the cell indicated by the empty arrowhead has a striking pattern of H3.1 and H3.3 distribution on condensed chromosomes. Can you comment on that?

      Reviewer #2 may be referring to the apparent enrichment of H3.3 at telomeres, previously described in Arabidopsis, while pericentromeric regions are enriched in H3.1. This distribution is indeed visible on most of the condensed chromosomes shown in Figure 2-figure supplement 1B. We included this comment in the corresponding caption.

      (8) Fig. 4: It is not very easy to distinguish M phase. Can the authors describe what each phase is supposed to look like with the reporters?

      We agree with reviewer #2 and attempted to improve Figure 4, which is now dedicated to the Arabidopsis PlaCCI reporter. ECFP, mCherry, and YFP channels were presented separately and the corresponding cell-cycle phases (in interphase and mitosis) were annotated. The Green Fire Blue lookup table was assigned to each reporter to provide the best visibility of, for example, chromosomes in early prophase. We included a schematic representation corresponding to the distribution of each reporter, using the colors of the overlaid image to facilitate its interpretation.

      (9) Line 298: what is endopolyploid? This term is used at least three times throughout the manuscript. How is it different from polyploid?

      In the manuscript, we aimed to differentiate the (poly)ploidy of an organism (reflecting the number of copies of the basic genome and inherited through the germline) from endopolyploidy produced by individual somatic cells. As reviewed by Scholes and Paige, polyploidy and endopolyploidy differ in important ways, including allelic diversity and chromosome structural differences. In the Medicago truncatula root cortex for example, a tetraploid cell generated via endoreduplication from the diploid state would contain at most two alleles at any locus. The effects of endopolyploidy on cell size, gene expression, cell metabolism and the duration of the mitotic cell cycle are not shared among individual cells or organs, in contrast to a polyploid individual (Scholes and Paige, 2015).

      See Scholes, D. R., & Paige, K. N. (2015). Plasticity in ploidy: A generalized response to stress. Trends in Plant Science, 20(3), 165‑175. https://doi.org/10.1016/j.tplants.2014.11.007

      (10) Line 332: "chromosomes on mitotic figures" - what does this mean?

      Reviewer #2 is right to point out this redundant wording. Mitotic “figures” are recognized, by definition, based on chromosome condensation. We now use the term "mitotic chromosomes" (line 344).

      (11) Fig. 6A: could the authors consider labeling the doublets, at least some of them? I understand that this nucleus contains many doublets. However, this is the first image where one is supposed to recognize these doublets, and pointing out these features can facilitate understanding. Otherwise, a reader might think the image is comparable to nuclei with no doublets in the rest of the figure.

      Following this suggestion, five of these doublets are now labeled in Figure 7A (formerly Figure 6A).

  3. May 2025
    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      (1) A previously determined 2:2 heterodimeric complex of LGI1-ADAM22 was suggested to play a role in trans interactions. Could the authors discuss if the heterohexameric 3:3 LGI1-ADAM22 is more likely to represent a cis complex or a trans complex, or if both are possible?

      We found no obvious structural feature strongly suggesting that the heterohexameric 3:3 LGI1-ADAM22 is more likely to represent a cis complex or a trans complex. Both are possible at the synapse (and similarly, for LGI3-ADAM23 at the juxtaparanode of myelinated axons). Therefore, we revised the Introduction and Discussion sections as follows:

      Introduction: (about potential structural mechanisms of the 3:3 complex)

      “Similarly to the 2:2 complex, the 3:3 complex might serve as an extracellular scaffold to stabilize Kv1 channels or AMPA receptors in a trans-synaptic fashion. In addition, the 3:3 assembly in a cis fashion on the same membrane might regulate the accumulation of Kv1 channel complexes at axon initial segment. However, no clear evidence to prove these potential mechanistic roles of the 3:3 assembly has been provided, and the three-dimensional structure of the 3:3 complex has not yet been determined.”

      Discussion: (about a role of the LGI3–ADAM23 complex at the juxtaparanode of myelinated axons)

      “In this context, as discussed in (30), either or both of the 2:2 and 3:3 complexes might be formed in a trans fashion at the juxtaparanode of myelinated axons and bridge the axon and the innermost myelin membrane. Alternatively, the 3:3 complex formed in a cis fashion might positively regulate the clustering of the axonal Kv channels at the juxtaparanode, possibly in a similar manner at the axon initial segment.”

      *Ref. 30: Y. Miyazaki et al., Oligodendrocyte-derived LGI3 and its receptor ADAM23 organize juxtaparanodal Kv1 channel clustering for short-term synaptic plasticity. Cell Rep 43, 113634 (2024).

      (2) It is not entirely clear to me if the LGI1-ADAM22 complex is also crosslinked in the HS-AFM experiments. Could this be more clearly indicated? In addition, if this is the case, could an explanation be given about how the complex can still dissociate?

      Thank you for the constructive suggestions. A non-crosslinked 3:3 LGI1-ADAM22 complex was used for HS-AFM observations. To clarify the sample used for HS-AFM, we have modified the text as follows.

      P.8 “Dynamics of the LGI1‒ADAM22 higher-order complex observed by HS-AFM

      HS-AFM images of gel filtration chromatography fractions containing the 3:3 LGI1-ADAM22<sub>ECD</sub> complex (not chemically crosslinked with glutaraldehyde) predominantly…”

      P.10 Materials and methods

      “HS-AFM observations of the LGI1–ADAM22<sub>ECD</sub> complex (not chemically crosslinked with glutaraldehyde) were conducted on AP-mica,…”

      (3) The LGI1 and ADAM22 are of similar size. To me, this complicates the interpretation of dissociation of the complex in the HS-AFM data. How is the overinterpretation of this data prevented? In other words, what confidence do the authors have in the dissociation steps in the HS-AFM data?

      Our criteria for assigning HS-AFM images to the 3:3 LGI1–ADAM22<sub>ECD</sub> complex were based on a comparison with AFM images simulated from the cryo-EM structure of the 3:3 complex. The automated fitting process (42) identifies the orientation of the cryo-EM structure whose simulated image most closely matches the HS-AFM image. In the present study, the concordance coefficient (CC) reached 0.8, indicating that the protein orientation in HS-AFM images of the 3:3 complex was objectively satisfactory.

      Regarding the dissociation step of ADAM22 from the 3:3 complex, we carefully analyzed the HS-AFM videos frame by frame and observed that the protrusion corresponding to ADAM22 in the 3:3 complex disappeared at a specific frame (4.5 s in the third molecule in Movie S1). The dissociation steps of ADAM22 were further confirmed by integrating multiple independent HS-AFM experiments and observations. Thus, although HS-AFM images alone cannot determine the orientation of LGI1 and ADAM22 in the 3:3 complex, the comparison of cryo-EM images with simulated AFM images enables objective assignment and orientation of proteins in the 3:3 complex through automated fitting.

      *Ref. 42: R. Amyot et al., Simulation atomic force microscopy for atomic reconstruction of biomolecular structures from resolution-limited experimental images. PLoS Comput Biol 18, e1009970 (2022).
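
      The scoring step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the similarity score is a Pearson correlation over pixel heights and that candidate simulated images have already been generated; the response does not state how the coefficient is defined, and the function names `pearson_cc` and `best_match` are hypothetical, not part of the fitting software cited as Ref. 42.

```python
import math


def pearson_cc(image_a, image_b):
    """Pearson correlation between two equally sized height maps (lists of rows).

    Assumption: the score is a plain Pearson correlation over pixels.
    """
    a = [h for row in image_a for h in row]
    b = [h for row in image_b for h in row]
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and have the same number of pixels")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a flat image")
    return cov / math.sqrt(var_a * var_b)


def best_match(experimental, simulated_candidates):
    """Return (index, score) of the simulated image that best matches the experiment."""
    scores = [pearson_cc(experimental, sim) for sim in simulated_candidates]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores[best_index]


# Hypothetical 2x2 height maps, for illustration only.
experimental = [[1.0, 2.0], [3.0, 4.0]]
candidates = [
    [[4.0, 3.0], [2.0, 1.0]],  # inverted height pattern
    [[2.0, 4.0], [6.0, 8.0]],  # same pattern, different height scale
]
index, score = best_match(experimental, candidates)
```

      A correlation-type score is insensitive to a uniform rescaling or offset of the heights, which is why it is a common choice when comparing simulated and measured topographies.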

      (4) What is the "LGI1 collapse" mentioned in Figure 4c?

      Thank you for the constructive suggestions. The term “LGI1 collapse” was intended to denote the dissociation of LGI1 from the 3:3 complex. To avoid confusion, we have revised it to “LGI1 release”.

      (5) Am I correct that the structure indicates that the trimerization is entirely organized by LGI1? This would suggest LGI1 trimerizes on its own. Can this be discussed? Has this been observed?

      Yes. The present cryo-EM structure of the 3:3 complex indicates that the trimerization can be entirely organized by LGI1. In addition, during the HS-AFM imaging, the triangle shape seems to be maintained even if one ADAM22<sub>ECD</sub> molecule is released. These findings suggest the possibility that LGI1 could trimerize on its own, although this possibility could not be tested due to the difficulty in expressing full-length LGI1 alone for biophysical analysis in our hands. On the other hand, considering the dynamic property of the 3:3 complex and the spatial alignment of LGI1<sub>LRR</sub> and ADAM22, we cannot exclude the possibility that ADAM22 could act as a platform to facilitate the intermolecular interaction between LGI1<sub>LRR</sub> and LGI1*<sub>EPTP</sub> for the trimerization of LGI1. This discussion was added in the first paragraph of the subsection "Dynamics of the LGI1–ADAM22 higher-order complex by HS-AFM".

      (6) C3 symmetry was not applied in the cryo-EM reconstruction of the heterohexameric 3:3 LGI1-ADAM22 complex. How much is the complex deviating from C3 symmetry? What interactions stabilize the specific trimeric conformation reconstructed here, compared to other trimeric conformations?

      According to this comment, we compared the non-symmetric, present cryo-EM structure to the previously calculated _C_3 symmetry-restrained structure based on small-angle X-ray scattering analysis and the _C_3 symmetric structure generated by AlphaFold3. Their differences in the domain or protomer configuration are illustrated in Fig. S9.

      We did not find interactions that could obviously stabilize the specific trimeric conformation, but the closure motion of LGI1<sub>LRR</sub> (relative to LGI1<sub>EPTP</sub>) in chain F appears to locate it in close proximity to LGI1<sub>LRR</sub> in chain D to make the triangular assembly slightly more compact. This (partly) compact configuration might stabilize the non-symmetric trimeric configuration observed in the cryo-EM structure. This was described in the last sentence in the subsection "Cryo-EM structure of the 3:3 LGI1–ADAM22<sub>ECD</sub> complex".

      Reviewer #2 (public review):

      The functional significance of these two complexes in the context of synapse remains speculative.

      To assess the functional significance of the 3:3 complex, we spent time and effort designing mutations that solely inhibit the 3:3 assembly but failed to find such mutations. In this paper, we therefore focused on the structural characterization of the 3:3 complex.

      Additionally, the structural presentations in Figures 1-3 (especially Figures 2-3) lack the clarity needed for general readers to fully understand the authors' key points. Enhancing the quality of these visual representations would greatly improve accessibility and comprehension.

      We made an effort to improve Figures 1-3 accordingly. Specifically, we revised them based on the strategy suggested in the Editorial comment regarding this reviewer's comment.

      Editorial comments:

      We noticed that in the reconstruction of the 3:3 complex, which is claimed to be at 3.8A resolution, beta-strands are not separated in the map and local resolution estimates vary from 6-10A. Please clarify.

      We revised Fig. S8 to show the local resolution and volume quality, which correspond to nominal resolution of 3.8 Å, estimated from gold-standard FSC.

      Reviewer #1 (Recommendations for the authors):

      (1) PDB validation reports should be presented to allow further validation

      The PDB validation reports were attached to the revised manuscript (uploaded as "related manuscript file").

      (2) In Figure 4, models below the AFM figures are difficult to see because of the light coloring. In addition, in panel c, the orientation of some of the parts of the models below the 19.2 and 34.5 s. panels do not seem to correlate with the AFM figures. Could the models be adjusted so that they represent the data better?

      Thank you for the constructive suggestions. According to the Reviewer’s comments, we have revised the AFM figures (Fig. 4).

      (3) References are sometimes missing for important statements. Please check throughout.

      Some examples:

      P3, "it has been suggested that the 3:3 complex regulates the density of synaptic molecules such as scaffolding proteins and synaptic vesicles".

      P3. "Furthermore, LGI1 forms a complex with the voltage-gated potassium channel (VGKC) through ADAM22/23".

      According to this comment, we rewrote the description about potential physiological roles of the 3:3 complex and added references as follows:

      "Similarly to the 2:2 complex, the 3:3 complex might serve as an extracellular scaffold to stabilize Kv1 channels or AMPA receptors in a trans-synaptic fashion (9, 17, 19). In addition, the 3:3 assembly in a cis fashion on the same membrane might regulate the accumulation of Kv1 channel complexes at axon initial segment (18, 20). However, no clear evidence to prove these potential mechanistic roles of the 3:3 assembly has been provided, and the three-dimensional structure of the 3:3 complex has not yet been determined."

      We also added references to the following sentences:

      p.2, (the last sentence in the first paragraph of the Introduction) “Additionally, some epilepsy-related mutations have been identified in genes encoding non-ion channel proteins such as LGI1 (4-7).”

      p.3, ln 4-5, “The metalloprotease-like domain interacts with the EPTP domain of LGI1 in the extracellular space (11, 14).”

      p.3, ln 9-10, “Furthermore, LGI1 forms a complex with the voltage-gated potassium channel (VGKC) through ADAM22/23 (9, 17, 18)”

      p.3, ln 20-22, “The results revealed the structural basis of the interaction between the EPTP domain of one LGI1 and the LRR domain of the other LGI1, as well as the interaction between the EPTP domain of LGI1 and the metalloproteinase-like domain of ADAM22 (14)”

      (4) S5 for clarity please add an overview of the complex highlighting where the different parts shown in the panels are located.

      Fig. S5 was modified accordingly. Every panel showing a zoom-up view was indicated by a box in an overview of the complex.

      (5) S7 a+b, also here add models for the structures to indicate which parts are shown.

      Could labels be added to highlight important parts?

      We added an overview of the complex with boxes that indicate the parts shown as the panels, according to this comment. We also added labels to highlight residues that are important for the LGI1<sub>EPTP</sub>–ADAM22<sub>ECD</sub> interaction in the panel showing the LGI1<sub>EPTP</sub>–ADAM22<sub>ECD</sub> interface.

      (6) S7c also shows the cartoon of the structure. How is it possible that the local resolution is not much higher than 6 Å? The overall resolution was 3.8 Å? This looks like a figure of the density plotted at a low level, and not as stated a "surface representation". Could an extra panel be shown of the density plotted at a higher level? Also, please add Å to the legend in this figure.

      Local resolution maps of the 3:3 LGI1-ADAM22<sub>ECD</sub> complex were shown as Fig. S8 in the revised manuscript. According to this comment, the distribution of the resolution was plotted onto the density at high (0.06) and low (0.03) levels. "Å" was added to the legend in the figure.

      Reviewer #2 (Recommendations for the authors):

      (1) The study was conducted using the ectodomain (ECD) of ADAM22. It remains unclear whether the 3:3 complex could form if the transmembrane domain (TMD) of ADAM22 were included. In other words, it is difficult to assess whether the observed 3:3 complex represents plausible cis interactions.

      As mentioned in our reply to the first comment from Reviewer #1, we found no obvious structural feature strongly suggesting that the heterohexameric 3:3 LGI1–ADAM22 is more likely to represent a cis complex or a trans complex. Both are possible at the synapse (and similarly, for LGI3–ADAM23 at the juxtaparanode of myelinated axons). Therefore, we revised the Introduction and Discussion sections as follows:

      Introduction: (about potential structural mechanisms of the 3:3 complex)

      “Similarly to the 2:2 complex, the 3:3 complex might serve as an extracellular scaffold to stabilize Kv1 channels or AMPA receptors in a trans-synaptic fashion. In addition, the 3:3 assembly in a cis fashion on the same membrane might regulate the accumulation of Kv1 channel complexes at axon initial segment. However, no clear evidence to prove these potential mechanistic roles of the 3:3 assembly has been provided, and the three-dimensional structure of the 3:3 complex has not yet been determined.”

      Discussion: (about a role of the LGI3–ADAM23 complex at the juxtaparanode of myelinated axons)

      “In this context, as discussed in (30), either or both of the 2:2 and 3:3 complexes might be formed in a trans fashion at the juxtaparanode of myelinated axons and bridge the axon and the innermost myelin membrane. Alternatively, the 3:3 complex formed in a cis fashion might positively regulate the clustering of the axonal Kv channels at the juxtaparanode, possibly in a similar manner at the axon initial segment.”

      *Ref. 30: Y. Miyazaki et al., Oligodendrocyte-derived LGI3 and its receptor ADAM23 organize juxtaparanodal Kv1 channel clustering for short-term synaptic plasticity. Cell Rep 43, 113634 (2024).

      (2) Page 2, line 1: "...caused by genetic mutations." - Specify the mutations involved. Which genes are mutated? Providing this information would enhance clarity and context.

      According to this comment, we rephrased the sentence as follows:

      "LGI1 is linked to epilepsy, a neurological disorder that can be caused by genetic mutations of genes regulating neuronal excitability (e.g., voltage- or ligand-gated ion channels)."

      (3) The experimental strategy and data for both cryo-EM and HS-AFM are of high quality. However, improvements are needed in the cryo-EM/structural figures to enhance clarity. Structural components should be labeled, and the protein interfaces should be identified within the overall complex figures in Figures 2 and 3, as the current presentation is challenging for general readers to follow. For example, in Figure 2, panel a would benefit from clear labeling to indicate the locations of ADAM22 and LGI1. Panels b and c lack context unless the authors specify which interface corresponds to panel a. Additionally, panels e and f are unlabelled, making it difficult to interpret the figures. Improved annotations and descriptions would significantly enhance figure accessibility and comprehension.

      Thank you for the constructive suggestion for enhancing accessibility and comprehension of cryo-EM/structural figures. According to this comment, we labeled structural components and indicated the protein interfaces as boxes in the overall complex figures in Figures 2 and 3. Further, in Figure 2, the locations that panels b and c show were indicated as two boxes in the close-up view in panel a.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) The data are generated using ATP read-out (CTG assay). For any inhibitor of mitochondrial function, ATP assays are highly sensitive reflecting metabolic stress, yet these do not necessarily translate into cell growth inhibition using standard Trypan blue assays and tend to overestimate the effects. Please show orthogonal more robust assays of cell growth or proliferation.

      We acknowledge the sensitivity of the ATP read-out assay in reflecting metabolic stress. While additional cell growth assays such as Trypan blue exclusion could provide further insights, we believe that the current ATP assay data robustly demonstrate the effect of the IMT and venetoclax combination on cellular metabolism, which is a critical aspect of our study. The scope of our current work focused on metabolic inhibition, and we suggest that future studies could further explore cell proliferation assays to complement these findings.

      (2) It is concluded that AML cells do not utilize glucose for ATP production. Please provide formal measurements of glycolysis/lactate upon combinatorial treatment.

      We appreciate the reviewer’s suggestion to include glycolysis and lactate measurements, which could indeed add further granularity to our metabolic analysis. However, the primary focus of our study is on mitochondrial function and oxidative phosphorylation (OXPHOS) in AML cells treated with IMT and venetoclax. We believe the data presented in Figure 3 provide strong support for the conclusion that glycolysis is not a major energy source in these cells.

      Specifically, in Figure 3C, we demonstrate that AML cells maintain ATP levels and viability when cultured in galactose, a condition that restricts ATP production through glycolysis and forces cells to rely on OXPHOS. This result strongly suggests that these AML cells are not dependent on glycolysis for ATP production. Furthermore, in Supplementary Figure S3B, we show that oxygen consumption rate (OCR) measurements remain stable in the presence of excess glucose, further supporting our conclusion that the cells do not switch to glycolysis when OXPHOS is inhibited.

      These findings collectively indicate a primary reliance on OXPHOS for energy generation in AML cells, consistent with our study’s objectives to explore mitochondrial dependency and the therapeutic potential of targeting mitochondrial transcription in AML. Future studies could certainly expand on these insights by incorporating a more detailed analysis of glycolytic flux and lactate production under combinatorial treatment, but we believe the current data are sufficient to support our main conclusions.

      (3) The transcriptome data are shown without any analysis of pathways. The conclusion from this data beyond the higher number of genes impacted in the combination arm is unclear. Please provide analysis for example GO pathways and interpret in the context of the drugs' mechanism of action.

      In response to the reviewer’s question, we have added gene ontology (GO) pathway analysis to clarify the transcriptomic impact of our combination treatment with IMT and venetoclax. Functional annotation identified significant enrichment in pathways relevant to innate immune response, mitochondrial function, and cellular signaling processes. Specifically, pathways associated with immune defense, mitochondrial signaling, and intracellular signaling were notably affected. These findings suggest that the combination treatment not only disrupts cellular energy metabolism but also potentially primes immune signaling mechanisms. This aligns with the proposed mechanism, where IMT targets mitochondrial transcription and venetoclax induces apoptosis, together enhancing sensitivity in AML cells. The enriched pathways, therefore, support the mechanism of action of both drugs, showing how the combined inhibition of BCL-2 and mitochondrial transcription creates a compounded cellular disruption that enhances the therapeutic effect.
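
      For readers unfamiliar with how GO term enrichment is typically scored, the following is a minimal sketch of a one-sided hypergeometric test. The response does not state which tool or statistic was used, so the choice of test, the function name `hypergeom_enrichment_p`, and all numbers in the usage lines are assumptions for illustration only.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, selected, overlap):
    """One-sided hypergeometric p-value for over-representation of a GO term.

    population: number of genes in the background set
    annotated:  background genes carrying the GO term
    selected:   number of differentially expressed genes
    overlap:    differentially expressed genes carrying the GO term
    Returns P(X >= overlap) when `selected` genes are drawn without replacement.
    """
    max_overlap = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, selected - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts, not taken from the study.
p_value = hypergeom_enrichment_p(population=20000, annotated=300, selected=500, overlap=25)
```

      In practice such p-values are computed for many terms at once and then corrected for multiple testing, for example with the Benjamini-Hochberg procedure.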

      (4) Please demonstrate (could be in supplement) matrix of combination to support the statement that the combination is synergistic using Bliss index. The actual Bliss values are missing.

      For the revision, we have now included a matrix of combination treatment effects with the corresponding Bliss synergy index values to substantiate our claim of synergy between IMT and venetoclax. This analysis, provided in the supplement, demonstrates that the observed effects exceed the expected additive impact of each drug alone, as calculated by the Bliss independence model. Specifically, the Bliss values confirm a synergistic interaction in venetoclax-sensitive AML cell lines, highlighting that the combined treatment significantly enhances inhibition of cell viability and apoptosis induction compared to single treatments. This data supports our interpretation of synergy and strengthens the mechanistic conclusions drawn from our findings on the combination therapy’s efficacy.
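
      As a minimal sketch of the calculation behind a Bliss synergy matrix: under Bliss independence, the expected fractional effect of two drugs acting independently is E_A + E_B - E_A * E_B, and the excess of the observed combination effect over that expectation is taken as the synergy score. The function names and the dose-response values below are hypothetical and are not data from the study.

```python
def bliss_expected(effect_a, effect_b):
    """Expected fractional effect (0 to 1) of two independently acting drugs."""
    return effect_a + effect_b - effect_a * effect_b


def bliss_excess(observed, effect_a, effect_b):
    """Observed minus expected effect; positive values suggest synergy."""
    return observed - bliss_expected(effect_a, effect_b)


def bliss_matrix(single_a, single_b, combo):
    """Bliss excess for every dose pair.

    single_a[i]: fractional inhibition at dose i of drug A alone
    single_b[j]: fractional inhibition at dose j of drug B alone
    combo[i][j]: fractional inhibition of the combination at doses (i, j)
    """
    return [
        [bliss_excess(combo[i][j], single_a[i], single_b[j]) for j in range(len(single_b))]
        for i in range(len(single_a))
    ]


# Hypothetical fractional inhibition values, for illustration only.
drug_a_alone = [0.10, 0.20]
drug_b_alone = [0.15, 0.30]
combination = [
    [0.30, 0.50],
    [0.45, 0.60],
]
excess = bliss_matrix(drug_a_alone, drug_b_alone, combination)
```

      Reporting the full matrix rather than a single summary value shows at which dose pairs the combination exceeds the additive expectation.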

      (5) Please show KG1 data (OCR), here or in Supplement.

      In response to the reviewer’s request to include OCR data for the KG-1 cell line, we would like to clarify that OCR measurements were attempted; however, they did not yield conclusive results. This is noted in the revised manuscript (Results section), where we explain that the KG-1 cell line did not provide usable OCR data, likely due to limitations in detecting reliable mitochondrial respiration in this particular line under our experimental conditions. Therefore, we were unable to include KG-1 OCR data in the main figures or the supplement.

      Reviewer #2:

      (1) It's important that the authors show that the drug's effects in AML are due to on-target inhibition. It's critical that they show that IMT actually inhibits the mito polymerase in the AML cells in the dose range employed.

      We appreciate the importance of demonstrating on-target inhibition of mitochondrial RNA polymerase by IMT1, especially in light of the detailed characterization of IMT1b, a closely related compound, as presented in Bonekamp et al., Nature 2020. The work by Bonekamp et al. established the specificity and efficacy of IMT1b in targeting mitochondrial RNA polymerase across various tumor models. Building on these findings, we designed our study to primarily evaluate the combinatorial efficacy of IMT1 with venetoclax in AML models, assuming a similar mechanism of action as described for IMT1b. While direct confirmation of on-target inhibition in AML cells by IMT1 would undoubtedly provide additional mechanistic insight, we focused on translational aspects in this study. We believe that the foundational work provided by Bonekamp et al. supports the assumption of on-target effects by IMT1, and we suggest that future studies could explicitly verify this in the context of AML.

      (2) For Fig 1, the stated synergism between Venetoclax (Vex) and IMT in p53 mutant THP1 cells is really not evident, despite what the statistical analysis says. In some ways, the more interesting conclusion is that inhibiting mitochondrial transcription does NOT potentiate the efficacy of Bcl2 inhibition in TP53 mutant AML.

      We appreciate the reviewer’s observation regarding the lack of evident synergy between IMT and venetoclax in TP53 mutant THP-1 cells. In line with this comment, we have now expanded the discussion to emphasize that, while statistical analysis suggested a potential interaction, the biological response in TP53 mutant cells was minimal. This contrasts with the strong synergy observed in TP53 wild-type cell lines, such as MV4-11 and MOLM-13. We have now highlighted that TP53 mutation status may limit the effectiveness of mitochondrial transcription inhibition in potentiating BCL-2 inhibition. This addition underscores the importance of mutation profiles, such as TP53 status, in predicting response to combination therapies in AML and is now clearly addressed in the revised discussion.

      (3) They combine IMT with Vex, but Vex plus azacytidine or decitabine is the approved therapy for AML. Any clinical trial would likely start with this backbone (like Vex+Aza). They should test combinations of IMT with Vex/Aza or Vex/Dec.

      While we recognize the importance of testing IMT in combination with clinically approved therapies like Vex+Aza, our current study was designed to explore the potential of IMT in combination with venetoclax alone. Expanding to other combinations would be an excellent direction for future research but is beyond the scope of our current investigation.

      (4) It's interesting that AML cell lines do not show any reliance on ATP generation from glycolysis, but would this still be the case when OxPhos is inhibited with IMT? Such a simple experiment would be much more interesting and could help them better understand the mechanism of IMT efficacy.

      We thank the reviewer for highlighting this point regarding the reliance of AML cell lines on glycolysis under OxPhos inhibition. In our study, we observed that AML cells predominantly rely on OxPhos, and we did test for ATP production in conditions that favored glycolysis by growing AML cells with galactose instead of glucose in the medium. As described in the manuscript, we did not observe significant ATP production or cell viability from glycolysis, even under these conditions. This finding suggests that AML cells have a low capacity to adapt to glycolytic ATP generation when OxPhos is disrupted by IMT, reinforcing the view that they are highly dependent on mitochondrial function for energy production. We agree that this adaptation—or lack thereof—is an intriguing aspect of IMT efficacy in targeting energy metabolism in AML cells, and we have clarified this point in the discussion.

      (5) OxPhos measurements need statistical analyses.

      We appreciate the reviewer’s suggestion to include statistical analyses for the OXPHOS measurements. We would like to clarify that statistical analyses were included in the initial submission. These are detailed in Figure 3 and its legend, as well as in the Statistical Analysis section, where we specify methods such as the calculation of standard error across replicates. This approach was implemented to ensure the rigor of our OCR data and its conclusions on OXPHOS inhibition in AML cells.
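
      As a brief illustration of the standard error calculation mentioned above, the following is a minimal sketch, assuming each value is one replicate OCR measurement; the function name is hypothetical and the sketch does not reproduce the analysis pipeline used in the study.

```python
from math import sqrt
from statistics import stdev


def standard_error(replicates):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    n = len(replicates)
    if n < 2:
        raise ValueError("at least two replicates are required")
    return stdev(replicates) / sqrt(n)
```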

      (6) Given that the combo-treated mice do not exhibit much leukemia in the blood through ~180 days, and yet start dying after 100 days, the authors should comment on this, given that the bone marrow has been shown to be a refuge that protects leukemia cells from various therapies.

      We thank the reviewer for highlighting the observed discrepancy between peripheral blood leukemia levels and survival in combo-treated mice. While leukemic cells were minimally detected in the blood up to approximately 180 days, treated mice began to show signs of disease progression and reduced survival around 100 days. This may suggest that residual leukemic cells persist within the bone marrow, which has been established as a sanctuary site for leukemic cells, providing protection from various therapies. The bone marrow environment likely supports a survival niche, enabling these residual cells to evade treatment effects and potentially initiate disease relapse. We have added this interpretation to the discussion to acknowledge the possibility of bone marrow as a protective refuge, which may limit the full eradication of leukemia in these models despite apparent peripheral blood clearance.

      (7) For Fig 5C, the authors should statistically compare the Combo with Vex alone.

      We have now included statistical comparisons between the combination treatment and venetoclax alone in Fig 5C to provide a clearer interpretation of the data.

      (8) The analyses of gene expression using RNAseq of harvested leukemia cells from the PDX model (Table S2), some more discussion of these results would be helpful, particularly given that neither drug is directly targeting nuclear gene expression.

      We thank the reviewer for their suggestion to discuss the RNAseq findings in more detail. In the revised manuscript, we have expanded on the functional annotation of the gene expression changes observed in leukemia cells from the PDX model following combination treatment (Table S2). The enriched pathways include innate immune involvement, mitochondrial function and immune signaling, and intracellular signaling. This suggests that while neither IMT nor venetoclax directly targets nuclear gene expression, the combined treatment induces secondary effects that alter these pathways, potentially contributing to the treatment’s efficacy in AML. This expanded discussion provides greater insight into how the drug combination impacts gene expression and cellular pathways.

      (9) We need more information on the PDX models, in terms of the classification (M1 to M6) of the patient AMLs and genetics (specific mutations, not just the genes mutated, and chromosomal alterations).

      Additional details regarding the classification and genetic background of the PDX models have been included in the manuscript to better contextualize our findings.

      (10) The authors should discuss whether or not IMT represents an improvement over other therapies intended to target Oxphos in AML (clearly, the low toxicity of IMT is a plus, at least in mice).

      We appreciate the reviewer’s suggestion to discuss IMT in comparison with other OXPHOS-targeting therapies for AML. In the revised discussion, we highlight IMT’s unique properties, particularly its low toxicity profile, which may offer advantages over other OXPHOS inhibitors. This low toxicity, demonstrated in preclinical studies, suggests that IMT might improve patient tolerability compared to existing therapies that target mitochondrial function.

      (11) The authors examined toxicity by weighing the mice and performing CBCs. Measurements of liver and kidney toxicity will be necessary for further clinical development.

      We thank the reviewer for the suggestion to further investigate liver and kidney toxicity. In our study, we assessed toxicity through regular weight monitoring and complete blood counts (CBCs) to evaluate overall health status. While additional liver and kidney toxicity measurements will indeed be important in future studies, resource limitations currently prevent us from performing these additional analyses in this model. We agree that these assessments will be essential as we progress towards clinical development, and we plan to address them in upcoming preclinical studies.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      The Reviewer asks that we provide the source of PDGF-AB/BB proteins.

      We apologize for omitting this information. We now provide the source of PDGF-AB/BB in the Methods as PeproTech. In our revised manuscript, we state on Page 7, line 142: “Cells were then treated with recombinant human PDGF-AB (40ng/ml; PeproTech, 10770584) or -BB (20ng/ml; PeproTech, 10771918) for 5 days.”

      The Reviewer asks that we adequately report our chosen irradiation parameters suggesting that we consider (PMCID: PMC5495460) for appropriate parameter reporting.

      We thank the Reviewer for this excellent suggestion. We now provide more detailed irradiation reporting, based on the shared manuscript, in Page 9, line 10, line 204.

      The Reviewer requests more details about the age range to distinguish young from old donors.

      In the Methods section of our revised manuscript, we now provide the age ranges of our donors: older donors were between 53 and 67 years of age, while younger donors were between 19 and 27 years of age. These changes are reflected on Page 6, line 128: “Human degenerated NP and AF tissues (Grade IV or V on Pfirrmann grade; 64.6 ±8.5 years old) were obtained as the surgical waste from donors with discogenic pain, with each donor providing written informed consent. Healthy NP and AF cells (23.0 ±3.7 years old) were gifted by Professor Lisbet Haglund from McGill University (Tissue Biobank #2019-4896).”

      The Reviewer wonders about the rationale for using different concentrations of PDGF-AB/BB in the degenerate cell and irradiation experiments.

      We apologize for our lack of clarity. We initially treated cells with different concentrations (20 and 40 ng/ml) of PDGF-AB/BB to establish a dose-response. From our MTT and gene expression analyses, we determined that 20 ng/ml was sufficient to elicit significant changes in cell proliferation markers, including MKI67, CCNB1 and CCND1. Increasing the concentration of either growth factor to 40 ng/ml did not significantly influence these parameters. However, for our bulk RNA-seq experiments, we reasoned that 40 ng/ml of PDGF-AB might produce clearer changes in signaling molecules, since its effects on cell growth were maximal at this concentration, while PDGF-BB was maintained at 20 ng/ml based on its efficacy in our mitogenic response.

      The Reviewer asks that we consider describing the effects of PDGF-AB/BB as mitigating or therapeutic rather than protective both in the title and throughout the manuscript.

      We agree with the Reviewer’s recommendation, and we have now changed the title to “Therapeutic effects of PDGF-AB/BB against cellular senescence in human intervertebral disc”. Moreover, we implemented this change in the revised manuscript as requested.

      The Reviewer believes that changes in the NP are more clinically evident (by imaging methods), despite degeneration often initiating from the AF (annulus fibrosus), e.g., through tears/microtears, and would like for us to reflect this in our revised manuscript.

      We agree with the Reviewer’s comment, and we thank them for this added accuracy. On this basis, we have corrected our language in the introduction, which now states on Page 4, line 68: “To date, the main focus of IVD cell studies has been on the NP, as changes in the NP are easily detected through imaging techniques like MRI, making it the most visible indicator of disc degeneration in clinical practice. In addition, NP plays a crucial role in the progression of IVD degeneration due to its susceptibility to significant structural and functional changes during aging and degeneration.”

      The Reviewer points out a prior study which examined the effects of X-ray irradiation on NF-kB signaling in young and aged IVDs (PMCID: PMC5495460) suggesting that we include this reference in our revised manuscript.

      We thank the Reviewer for this suggestion, and we now reference this elegant study in the discussion section of our revised manuscript. On page 20, line 440, we state: “In fact, it has been shown that NF-kB signaling was elevated in mouse IVDs exposed to a single 20 Gy dose of irradiation in an ex vivo culture model.”

      The Reviewer asks that our experimental methods are described in the order of the experimental workflow. For example, section 2.2 describes RNA sequencing, which is a terminal assay. Section 2.2 may be more appropriate for detailing the methods of PDGF-AB/BB treatment, along with the rationale.

      We thank the Reviewer for pointing this out and have reorganized the Methods section accordingly.

      Reviewer #2:

      The Reviewer requests more experimental details in the methodology including the rationale for such methods/conditions as well as specific culture models utilized, substrates, cell density, and media components.

      We apologize for our lack of clarity. We have now revised the Methods section based on these comments.

      The Reviewer asks about the quantitative data for β-galactosidase assay and immunofluorescence of senescence-associated proteins such as P21 and P16.

      We apologize for omitting this information. We have now included the quantification of P21- and P16-positive cells, which is presented in the revised Figure 4. For the β-galactosidase assay, we were unable to quantify the percentage of positive cells because we did not perform nuclear staining, making it difficult to accurately determine the total cell number. Instead, we provide representative images showing the full field of view at 10X magnification, acquired with an Echo microscope.

      The Reviewer requests the protein level data of PDGFRA to determine if the transcripts are being translated to protein.

      We thank the Reviewer for this suggestion. The protein expression of PDGFRA has been included in Supplementary Figure 2. We found that PDGFRA protein levels were decreased in both NP and AF cells in response to PDGF treatments. It is known that upon binding with PDGF ligands, PDGFRA undergoes rapid internalization and degradation, a mechanism that prevents overstimulation of the signaling pathway (doi: 10.1042/BST20200004). The upregulated gene expression is probably an attempt to compensate for this degradation and supports continued activation of PDGFRA signaling, emphasizing its crucial role in the response to PDGF treatment. We have therefore added the following to the discussion section on page 22, line 486: “Interestingly, while mRNA level was increased in PDGF treated NP cells, its protein level was decreased, highlighting the complexity in PDGF receptor dynamics. Upon binding with PDGF ligands, PDGFRA is known to undergo rapid internalization and degradation, a mechanism that prevents overstimulation of the signaling pathway (Rogers and Fantauzzo 2020). The upregulated gene expression is probably an attempt to compensate for this degradation and supports continued activation of PDGFRA signaling, emphasizing its crucial role in the response to PDGF treatment.”

      The Reviewer points out that our conclusion that “PDGF do not mediate their effects via the PDGFRA” is not supported by the current data asking that further discussion, interpretation, and direct comparison of the nucleus pulposus and annulus fibrosus data sets be presented to the readers.

      We thank the Reviewer for the insightful comment. On page 20, line 432, we have corrected our language to now state: “In contrast, while PDGF treatment alleviated the senescent phenotype in AF cells, it also induced changes in pathways such as response to mechanical stimuli and neurogenesis, which were distinct from those in NP cells. This indicates that the treatment enhanced IVD functionality through different mechanisms within the two compartments.”

      The Reviewer cannot appreciate the changes in S-phase between control and treated groups.

      We apologize for the poor quality of the figure in our initial submission. We have now analyzed the S-phase data and included them in our revised Figures 5C and 5F.

      The Reviewer believes that discectomies are typically not performed on patients with discogenic back pain but on patients who are undergoing surgery for a herniated disc.

      We agree with the Reviewer, and we have corrected our language in the revised manuscript. On Page 6, line 128, we now state: “Human degenerated NP and AF tissues (Grade IV or V on Pfirrmann grade; 64.6 ±8.5 years old) were obtained as the surgical waste from donors with disc herniation, with each donor providing written informed consent.”

      The Reviewer asks about the protein-protein interactions in AF cells.

      We thank the Reviewer for this suggestion, and we have now included this analysis in Figure 3.

      The Reviewer requests more details about the protocol and doses for the irradiation studies.

      In the revised manuscript, we added this information on page 10, line 204.

      The Reviewer asks whether the gene expression of PDGFRA was increased or decreased in irradiated cells compared to non-irradiated cells.

      The gene expression of PDGFRA was decreased in NP cells exposed to irradiation compared to non-irradiated cells. The data are shown in Figure 4 and are described in the text on page 17, line 411.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zhao and colleagues employ Drosophila nephrocytes as a model to investigate the effects of a high-fat diet on these podocyte-like cells. Through a highly focused analysis, they initially confirm previous research in their hands demonstrating impaired nephrocyte function and move on to observe the mislocalization of a slit diaphragm-associated protein (pyd). Employing a reporter construct, they identify the activation of the JAK/STAT signaling pathway in nephrocytes. Subsequently, the authors demonstrate the involvement of this pathway in nephrocyte function from multiple angles, using a gain-of-function construct, silencing of an inhibitor, and ectopic overexpression of a ligand. Silencing the effector Stat92E via RNAi or inhibiting JAK/STAT with Methotrexate effectively restored impaired nephrocyte function induced by a high-fat diet, while showing no impact under normal dietary conditions.

      Strengths:

      The findings establish a link between JAK/STAT activity and the impact of a high-fat diet on nephrocytes. This nicely underscores the importance of organ crosstalk for nephrocytes and supports a potential role for JAK/STAT in diabetic nephropathy, as previously suggested by other models.

      Weaknesses:

      The analysis is overly reliant on tracer endocytosis and single lines. Immunofluorescence of slit diaphragm proteins would provide a more specific assessment of the phenotypes.

      We thank the reviewer for the positive comments and for pointing out that slit diaphragm markers would provide a more specific assessment of the phenotypes. In our revised manuscript, we used Sns-mRuby3, in which mRuby3 is tagged endogenously at the C-terminus of Sns (PMID: 39195240 and PMID: 39431457), to show the slit diaphragm pattern.

      Reviewer #2 (Public Review):

      Summary:

      In their manuscript, Zhao et al. describe a link between a high-fat diet and JAK-STAT pathway activation in nephrocytes. Nephrocytes are the homologs of mammalian podocytes, and it has previously been shown that metabolic syndrome and obesity are associated with worse outcomes for chronic kidney disease. A study from 2021 (Lubojemska et al.) already confirmed a severe nephrocyte phenotype upon feeding Drosophila a high-fat diet and also linked lipid overflow, by expressing adipose triglyceride lipase in the fat body, to nephrocyte dysfunction. In this study, the authors identify a second pathway and mechanism by which lipid dysregulation impacts nephrocyte function. In detail, they show activation of JAK-STAT signaling in nephrocytes upon feeding flies a high-fat diet, which was induced by expression of Upd2 (a leptin-like hormone) in the fat body, the adipose tissue of Drosophila. Further, they show that genetic and pharmacological interventions can reduce JAK-STAT activation and thereby prevent the nephrocyte phenotype in the high-fat diet model.

      Strengths:

      The strength of this study is the combination of genetic tools and pharmacological intervention to confirm a mechanistic link between the fat body/adipose tissue and nephrocytes. Inter-organ communication is crucial in the development of several diseases, but the underlying mechanisms are only poorly understood. Using Drosophila, it is possible to investigate several players of one pathway, here JAK-STAT. This was done, by investigating the functional role of Hop, Socs36E, and Stat92E in nephrocytes and has also been combined with feeding a high-fat diet, to assess restoration of nephrocyte function by inhibiting JAK-STAT signaling. Adding a translational approach was done by inhibiting JAK-STAT signaling with methotrexate, which also resulted in attenuated nephrocyte dysfunction. Expression of the leptin-like hormone upd2 in the fat body is a good approach to studying inter-organ communication and the impact of other organs/tissue on nephrocyte function and expands their findings from nephrocyte function towards whole animal physiology.

      Weaknesses:

      Although the general findings of this study are of great interest, there are some weaknesses in the study, which should be addressed. Overall, the number of flies investigated for the majority of the experiments is very low (6 flies) and it is not clear whether the flies used, are from independent experiments to exclude problems with food/diet. For the analysis, the mean values of flies should be calculated, as one fly can be considered a biological replicate, but not all individual cells. By increasing the number of flies investigated, statistical analysis will become more solid. In addition, the morphological assessment is rather preliminary, by only using a Pyd antibody. Duf or Sns should be visualized as well, also the investigation of the different transgenic fly strains studying the importance of JAK-STAT signaling in nephrocytes needs to include a morphological assessment. Moreover, the expected effect of feeding a high-fat diet on nephrocytes needs to be shown (e.g. by lipid droplet formation) and whether upd2 is actually increased here should also be assessed. The time points of assessment vary between 1, 3, and 7 days and should be consistent throughout the study or the authors should describe why they use different time points.

      We thank the reviewer for the comments and suggestions. HFD causes an enlarged crop (Liao et al, 2021, PMID: 33171202) and accumulation of lipid droplets in the intestine. To exclude problems with different batches of food/diet, we checked the crop and the intestine during sample preparation as indicators of food consistency.

      We followed the suggestion to use per-fly mean values in the data analysis; in the revised version, each fly is treated as one biological replicate. We also added another slit diaphragm protein reporter, Sns-mRuby3, in which the mRuby3 fluorescent protein is tagged to the C-terminus of endogenous Sns. This reporter was used to show the effects of HFD, of manipulating the Jak/Stat pathway (ppl-Gal4>upd2 and dot-Gal4>UAS-Stat92E-RNAi), and of drug treatment on the slit diaphragm protein.
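
      The per-fly averaging described above can be sketched as follows. This is a minimal illustration, assuming each measurement is recorded as a (fly identifier, value) pair for a single nephrocyte; the function name and data layout are hypothetical and do not reproduce the analysis code used for the manuscript.

```python
from collections import defaultdict
from statistics import mean


def per_fly_means(measurements):
    """Collapse per-cell values into one mean per fly.

    measurements: iterable of (fly_id, value) pairs, one per nephrocyte.
    Returns a dict mapping fly_id to the mean of that fly's cells, so that
    each fly contributes a single biological replicate to the statistics.
    """
    by_fly = defaultdict(list)
    for fly_id, value in measurements:
        by_fly[fly_id].append(value)
    return {fly_id: mean(values) for fly_id, values in by_fly.items()}
```

      Statistical tests are then run on the per-fly means rather than on the individual cells, so that n equals the number of flies.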

      Lubojemska et al 2021 (PMID: 33945525) showed that HFD leads to lipid droplet accumulation in larval nephrocytes. Following the reviewer’s suggestion, we stained the adult nephrocytes with Nile red and found lipid droplet formation caused by HFD, verifying the HFD effects on lipid droplet accumulation.

      Regarding the time points, newly eclosed flies (1 day old) were treated for 7 days (transferred to fresh diet or shifted from 18 to 29 °C for 7 days to induce target gene expression). Thus, the flies were 7 days old. In the revised manuscript, we changed “1-day-old females” to “7-day-old females” in the figure legend. The exception is Figure 4, panels G and H, where we used Day 3 for UAS-hop.Tum overexpression in the flp-out clones, which differs from the HFD approach (Day 7). This is because Hop.Tum is a strong gain-of-function mutation: UAS-hop.Tum overexpression in the eye imaginal disc leads to apoptosis via upregulation of the proapoptotic gene hid (Bhawana Maurya et al, 2021, PMID: 33824299). We therefore used Day 3 for this experiment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      There are relevant issues, that should be addressed:

      Major:

      - The analysis of JAK/STAT signaling in nephrocytes is limited to nephrocyte function, despite the nice slit diaphragm phenotype shown in Figure 2A. What happens to the slit diaphragm in the other genotypes, the rescue settings in particular? Immunofluorescence of Pyd should be explored for all conditions to evaluate proper phenocopy. Tracer endocytosis is much less specific.

      We thank the reviewer for the suggestion. We generated a transgenic line, Sns-mRuby3, in which mRuby3 is tagged to the C-terminus of endogenous Sns. It has been used as a slit diaphragm reporter (PMID: 39195240 and PMID: 39431457). In addition to the tracer assays, we used the Sns-mRuby3 reporter and/or Pyd staining to visualize changes in slit diaphragm structures.

      - The interventions are restricted to single RNAi lines and reporters, raising concerns about specificity/potential off-targets. Additional lines should be tested for verification.

      Different versions of RNAi lines are available for targeting fly genes. For UAS-Socs36E-RNAi, we chose the one that was generated with a short hairpin, which is known to restrict off-target effects (Ni et al, 2011, PMID: 21460824). For UAS-Stat92E-RNAi, we added an independent RNAi line (Figure 6 - figure supplement 1 and 2).

      Minor:

      - In Figure 2C, the image of HFD shows a section that cuts through the surface at a shallower angle, making everything appear blurry. This image should be replaced.

      We replaced Figure 2C (the image of HFD) with another one.

      - What is the relevance (if any) of reduced electrodense vacuoles with a high-fat diet? An effect on endocytic trafficking/endosome architecture remains unexplored.

      Lubojemska et al (PMID: 33945525) studied the endocytic trafficking/endosome architecture of larval nephrocytes and found that HFD impaired endocytosis. We studied adult pericardial nephrocytes. It is very likely that the endocytic trafficking/endosome architecture is also affected by HFD in adult nephrocytes.

      - How do the findings presented in this manuscript correlate with a similar study by Lubojemska et al.? At least the discussion should provide more evaluation of this aspect.

      Lubojemska et al (PMID: 33945525) assayed larval nephrocytes and found that a HFD leads to ectopic accumulation of lipid droplets in the nephrocytes and decreased endocytosis. They further demonstrated that lipid droplet lipolysis and PGC1α counteract the harmful effects of a HFD. We performed Nile red staining and verified the accumulation of lipid droplets in adult pericardial nephrocytes upon HFD feeding, which agrees with the findings of Lubojemska et al. We found that a HFD activates the Jak/Stat pathway, which mediates the nephrocyte functional defects. A previous study showed that Stat1 has an inhibitory effect on PGC1α transcription (PMID: 26689548). Further study is needed to investigate the interaction between the Jak/Stat pathway and PGC1α transcription. We have added this information to the discussion.

      - Please check spelling and grammar.

      Reviewer #2 (Recommendations For The Authors):

      (1) Which cells are investigated? Please state.

      Pericardial nephrocytes were used in this study. This information has been added to the Results section.

      (2) Rephrase 'chronic kidney disease model'. Feeding for 7 days and assessment after 7 days cannot be considered chronic as flies can live more than 60 days.

      Lubojemska et al (PMID: 33945525) fed the newly hatched larvae with a HFD and used the third instar larvae for the experiments. The term “chronic kidney disease” has been used in the reference PMID: 33945525. It takes about 4 days for fly larvae to develop from the first instar to the third instar. Thus, the animals were fed on the HFD for only 4 days. In this regard, feeding for seven days might be considered as chronic.

      (3) Line 89: Curran et al., 2014). with risk increasing risk as BMI increases (Hsu et al., 2006). Please correct this sentence.

      We thank the reviewer for finding the error. In the revised version, the sentence was changed to “with increasing risk as BMI increases (Hsu et al., 2006)”.

      (4) Figure 1: The authors should explain why they use FITC-Albumin and 10kDA dextran, what are the differences, and why are both used?

      The tracers differ in size (70 kDa FITC-albumin and 10 kDa dextran). Both have been used in previous publications (Zhao et al 2024, PMID: 39431457 and Weavers et al 2009, PMID: 18971929) to show that nephrocytes can efficiently take up tracers of different sizes.

      (5) Figure 3: The JAK-STAT sensor was used on Day 1 to confirm activation of JAK-STAT signaling, which means a very fast response towards the HFD after 24hrs. How is the activation after 7 days? The nephrocyte assessment in Figures 1 and 2 is done at the later time point, how about earlier time points in HFD? One would expect an earlier phenotype as well if JAK-STAT signaling is causative.

      In Figure 3C, newly eclosed flies (1 day old) were fed a control diet or a HFD for 7 days. The legend should therefore have read “7-day-old females”. We apologize for the misleading description and have updated the caption to “7-day-old females”.

      (6) Figure 4H: I don't understand how many cells or flies are depicted and analysed? Are the dots one nephrocyte from 4 flies? If yes, the numbers need to be increased.

      In Figure 4H, we quantified 5 UAS-hop.Tum clones and 5 neighbor cells. We found only 5 clones from 4 flies. We did not quantify all the nephrocytes, since we compared each clone with its neighbor cell. To make this easier to follow, we changed the description to “n= 5 clones and 5 neighbor cells”.

      (7) Figure 4: Why are flies investigated at different ages? Day 1 vs Day 3? This should be consistent with the HFD approach and day 7. Or investigate the HFD at earlier time points as well.

      In Figure 4, the newly eclosed flies (1-day old) were shifted from 18 to 29 °C for 7 days to induce target gene expression. Thus, the flies were 7-day old. In the revised manuscript, we changed “1-day-old females” to “7-day-old females” in the figure legend. We used Day 3 for the UAS-hop.Tum overexpression in the flp-out clones, which is different from the HFD approach (Day 7). This is because Hop.Tum is a strong gain of function mutation. UAS-hop.Tum overexpression in the eye imaginal disc leads to apoptosis via up-regulating a proapoptotic gene hid (Bhawana Maurya et al, 2021, PMID: 33824299). Thus, we used Day 3 for this experiment.

      (8) Figure 5: Do the authors see upd2-GFP in the nephrocyte or at the nephrocyte? Is upd2 filtered to bind the JAK-STAT-receptor? They should show this, which is easy to do due to the GFP label.

      We thank the reviewer for the suggestion. We looked into the nephrocytes from ppl-Gal4>upd2-GFP flies and found Upd2-GFP in the nephrocytes. We further showed that ppl-Gal4 was not expressed in the nephrocytes, suggesting that Upd2-GFP is secreted from the fat body and transported to the nephrocytes. We stained the nephrocytes for Pyd and found a compromised fingerprint pattern caused by Upd2-GFP expression in the fat body. The data were added to Figure 5 - figure supplement 1.

      (9) Figure 5: What are the upd2 levels after day 1 and compared to HFD at day 7? In the Rajan et al manuscript, upd2 levels have been assessed by qPCR, this can be done here as well. Although there is a mechanistic link shown here, I think it would be interesting to test the upd2 levels at the different time points assessed.

      In the Rajan et al manuscript, they showed that the expression of upd2 was upregulated by HFD. My previous work showed that HFD changes taste perception. We performed qPCR to determine the expression of upd2 and verified that upd2 was upregulated in HFD-fed flies (Yunpo Zhao et al. 2023. PMID: 37934669). We included the reference in the revised version.

      (10) Figure 6: Does a Socs36E overexpression e.g. with the Bloomington strain 91352 also rescue the HFD phenotype, by blocking JAK-STAT signaling?

      We thank the reviewer for the suggestion. We tested the effect of Socs36E overexpression and observed that UAS-Socs36E can partially rescue the HFD-induced nephrocyte functional decline. The data were not included in the revised manuscript. Notably, apart from having an inhibitory effect on Jak/Stat signaling, Socs36E represses the MAPK pathway (Amoyel et al, 2016, PMID: 26807580).

      (11) Figure 7: What is the control for the methotrexate treatment? What is the solvent?

      We used DMSO as the solvent for methotrexate and used it as the control for the methotrexate treatment. We added the following sentences to the methods section: “Methotrexate (06563, Sigma-Aldrich, MO) was dissolved in DMSO to make a 10mM stock solution”, and “The samples incubated in Schneider’s Medium supplemented with DMSO vehicle were used as a control”.

      (12) Why did the authors use Dot-Gal4 for the Socs36E knockdown and Dot-Gal4ts for the Stat92E knockdown?

      We used Dot-Gal4ts and temperature shifting to restrict the Stat92E knockdown to adult stages.

      (13) Supplementary Figure 1: Please add the individual data to the figure as done for all other figures.

      We thank the reviewer for this comment. The individual data were added to the figure according to the suggestion.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      The authors use microscopy experiments to track the gliding motion of filaments of the cyanobacteria Fluctiforma draycotensis. They find that filament motion consists of back-and-forth trajectories along a "track", interspersed with reversals of movement direction, with no clear dependence between filament speed and length. It is also observed that longer filaments can buckle and form plectonemes. A computational model is used to rationalise these findings.

      We thank the reviewer for this accurate summary of the presented work.

      Strengths:

      Much work in this field focuses on molecular mechanisms of motility; by tracking filament dynamics this work helps to connect molecular mechanisms to environmentally and industrially relevant ecological behavior such as aggregate formation.

      The observation that filaments move on tracks is interesting and potentially ecologically significant.

      The observation of rotating membrane-bound protein complexes and tubular arrangement of slime around the filament provides important clues to the mechanism of motion.

      The observation that long filaments buckle has the potential to shed light on the nature of mechanical forces in the filaments, e.g. through the study of the length dependence of buckling.

      We thank the reviewer for listing these positive aspects of the presented work.

      Weaknesses:

      The manuscript makes the interesting statement that the distribution of speed vs filament length is uniform, which would constrain the possibilities for mechanical coupling between the filaments. However, Figure 1C does not show a uniform distribution but rather an apparent lack of correlation between speed and filament length, while Figure S3 shows a dependence that is clearly increasing with filament length. Also, although it is claimed that the computational model reproduces the key features of the experiments, no data is shown for the dependence of speed on filament length in the computational model. The statement that is made about the model "all or most cells contribute to propulsive force generation, as seen from a uniform distribution of mean speed across different filament lengths", seems to be contradictory, since if each cell contributes to the force one might expect that speed would increase with filament length.

      We agree that the data shows in general a lack of correlation, rather than strictly being uniform. In the revised manuscript, we intend to collect more data from observations on glass to better understand the relation between filament length and speed.

      In considering longer filaments, one also needs to consider the increased drag created by each additional cell - in other words, overall friction will either increase or be constant as filament length increases. Therefore, if only one cell (or few cells) are generating motility forces, then adding more cells in longer filaments would decrease speed.

      Since the current data does not show any decrease in speed with increasing filament length, we stand by the argument that the data supports that all (or most) cells in a filament are involved in force generation for motility. We will revise the manuscript to make this point - and our arguments about assuming multiple / most cells in a filament contributing to motility - clear.
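The scaling argument above can be illustrated with a minimal sketch. It assumes overdamped motion, a drag that grows linearly with the number of cells, and an identical force from every active cell. The function name and the unit values of `force_per_cell` and `drag_per_cell` are illustrative assumptions, not measured quantities and not part of the manuscript's model.

```python
def filament_speed(n_cells, n_active, force_per_cell=1.0, drag_per_cell=1.0):
    """Steady-state speed of an overdamped filament: total force / total drag.

    Assumption (hypothetical toy): each cell adds the same drag, and each
    active cell adds the same propulsive force.
    """
    if not 0 <= n_active <= n_cells:
        raise ValueError("n_active must lie between 0 and n_cells")
    return (n_active * force_per_cell) / (n_cells * drag_per_cell)


lengths = [10, 50, 100, 500]  # filament lengths in cells (illustrative)

# Every cell pushes: force and drag both scale with length, speed stays flat.
all_active = [filament_speed(n, n) for n in lengths]

# A single cell pushes: drag grows with length, speed falls as 1 / n_cells.
one_active = [filament_speed(n, 1) for n in lengths]

for n, v_all, v_one in zip(lengths, all_active, one_active):
    print(f"{n:4d} cells: all active -> {v_all:.3f}, one active -> {v_one:.4f}")
```

Under these assumptions, a speed that does not fall with length is what one would expect when all (or a fixed fraction of) cells contribute, whereas a single driving cell would give a clear decrease.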

      The computational model misses perhaps the most interesting aspect of the experimental results which is the coupling between rotation, slime generation, and motion. While the dependence of synchronization and reversal efficiency on internal model parameters are explored (Figure 2D), these model parameters cannot be connected with biological reality. The model predictions seem somewhat simplistic: that less coupling leads to more erratic reversal and that the number of reversals matches the expected number (which appears to be simply consistent with a filament moving backwards and forwards on a track at constant speed).

      We agree that the coupling between rotation, slime generation and motion is interesting and important when studying the specific mechanism leading to filament motion. However, we believe it is even more fundamental to consider the intercellular coordination that is needed to realise this motion. Individual filaments are a collection of independent cells. This raises the question of how they can coordinate their thrust generation in such a way that the whole filament can both move and reverse direction of motion as a single unit. With the presented model, we want to start addressing precisely this point.

      The model allows us to qualitatively understand the relation between coupling strength and reversals (erratic vs. coordinated motion of the filament). It also provides a hint about the possibility of de-coordination, which we then look for and identify in longer filaments.

      While the model’s results seem obvious in hindsight, the analysis of the model allows phrasing the question of cell-to-cell coordination, which so far has not been brought up when considering the inherently multi-cell process of filament motility.

      Filament buckling is not analysed in quantitative detail, which seems to be a missed opportunity to connect with the computational model, eg by predicting the length dependence of buckling.

      Please note that Figure S10 provides an analysis of filament length and number of buckling instances observed. This suggests that buckling happens only in filaments above a certain length.

      We do agree that further analyses of buckling - both experimentally and through modelling would be interesting. This study, however, focussed on cell-to-cell coupling / coordination during filament motility. We have identified the possibility of de-coordination through the use of a simple 1D model of motion, and found evidence of such de-coordination in experiments. Notice that the buckling we report does not depend on the filament hitting an external object. It is a direct result of a filament activity which, in this context, serves as evidence of cellular de-coordination.

      Now that we have observed buckling and plectoneme formation, these processes need to be analysed with additional experiments and modelling. The appropriate model for this process needs to be 3D, and should ideally include torques arising from filament rotation. Experimentally, we need to identify means of influencing filament length and motion and see if we can measure buckling frequency and position across different filament lengths. These works are ongoing and will have to be summarised in a separate, future publication.

      Reviewer #2 (Public review):

      Summary:

      The authors combined time-lapse microscopy with biophysical modeling to study the mechanisms and timescales of gliding and reversals in filamentous cyanobacterium Fluctiforma draycotensis. They observed the highly coordinated behavior of protein complexes moving in a helical fashion on cells' surfaces and along individual filaments as well as their de-coordination, which induces buckling in long filaments.

      We thank the reviewer for this accurate summary of the presented work.

      Strengths:

      The authors provided concrete experimental evidence of cellular coordination and de-coordination of motility between cells along individual filaments. The evidence is comprised of individual trajectories of filaments that glide and reverse on surfaces as well as the helical trajectories of membrane-bound protein complexes that move on individual filaments and are implicated in generating propulsive forces.

      We thank the reviewer for listing these positive aspects of the presented work.

      Limitations:

      The biophysical model is one-dimensional and thus does not capture the buckling observed in long filaments. I expect that the buckling contains useful information since it reflects the competition between bending rigidity, the speed at which cell synchronization occurs, and the strength of the propulsion forces.

      Cell-to-cell coordination is a more fundamental phenomenon than the buckling and twisting of longer filaments, in that the latter is a consequence of limits of the former. In this sense, we are focussing here on something that we think is the necessary first step to understand filament gliding. The 3D motion of filaments (bending, plectoneme formation) is fascinating and can have important consequences for collective behaviour and macroscopic structure formation. As a consequence of cellular coupling, however, it is beyond the scope of the present paper.

      Please also see our response above. We believe that the detailed analysis of buckling and plectoneme formation requires (and merits) dedicated experiments and modelling which go beyond the focus of the current study (on cellular coordination) and will constitute a separate analysis that stands on its own. We are currently working in that direction.

      Future directions:

      The study highlights the need to identify molecular and mechanical signaling pathways of cellular coordination. In analogy to the many works on the mechanisms and functions of multi-ciliary coordination, elucidating coordination in cyanobacteria may reveal a variety of dynamic strategies in different filamentous cyanobacteria.

      We thank the reviewer for highlighting this point again and seeing the value in combining molecular and dynamical approaches.

      Reviewer #3 (Public review):

      Summary:

      The authors present new observations related to the gliding motility of the multicellular filamentous cyanobacteria Fluctiforma draycotensis. The bacteria move forward by rotating about their long axis, which causes points on the cell surface to move along helical paths. As filaments glide forward they form visible tracks. Filaments preferentially move within the tracks. The authors devise a simple model in which each cell in a filament exerts a force that either pushes forward or backwards. Mechanical interactions between cells cause neighboring cells to align the forces they exert. The model qualitatively reproduces the tendency of filaments to move in a concerted direction and reverse at the end of tracks.

      We thank the reviewer for this accurate summary of the presented work.

      Strengths:

      The observations of the helical motion of the filament are compelling. The biophysical model used to describe cell-cell coordination of locomotion is clear and reasonable. The qualitative consistency between theory and observation suggests that this model captures some essential qualities of the true system.

      The authors suggest that molecular studies should be directly coupled to the analysis and modeling of motion. I agree.

      We thank the reviewer for listing these positive aspects of the presented work and highlighting the need for combining molecular and biophysical approaches.

      Weaknesses:

      There is very little quantitative comparison between theory and experiment. It seems plausible that mechanisms other than mechano-sensing could lead to equations similar to those in the proposed model. As there is no comparison of model parameters to measurements or similar experiments, it is not certain that the mechanisms proposed here are an accurate description of reality. Rather the model appears to be a promising hypothesis.

      We agree with the referee that the model we put forward is one of several possible. We note, however, that the assumption of mechanosensing by each cell - as done in this model - results in capturing both the alignment of cells within a filament (with some flexibility) and reversal dynamics. We have explored an even more minimal 1D model, where the cell’s direction of force generation is treated as an Ising-like spin and coupled between nearest neighbours (without assuming any specific physico-chemical basis). We found that this model was not fully able to capture both phenomena. In that model, we found that alignment required high levels of coupling (which is hard to justify except for mechanical coupling) and reversals were not readily explainable (and required additional assumptions). These points led us to the current, mechanically motivated model.
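To make the idea of an "Ising-like spin" description concrete, the following is a generic toy sketch of a nearest-neighbour spin chain with Metropolis updates. It is not the authors' model or code: the energy function, the update rule, the `temperature` parameter, and all numerical values are assumptions chosen only to show how a coupling constant can favour alignment of each cell's force direction (+1 or -1) with that of its neighbours.

```python
import math
import random


def chain_energy(spins, coupling):
    """Nearest-neighbour energy: aligned neighbours lower the energy."""
    return -coupling * sum(a * b for a, b in zip(spins, spins[1:]))


def metropolis_sweep(spins, coupling, rng, temperature=1.0):
    """One sweep of single-spin-flip Metropolis updates, applied in place."""
    n = len(spins)
    for _ in range(n):
        i = rng.randrange(n)
        left = spins[i - 1] if i > 0 else 0       # open (free) chain ends
        right = spins[i + 1] if i < n - 1 else 0
        # Energy change if cell i flips its force direction.
        delta_e = 2.0 * coupling * spins[i] * (left + right)
        if delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature):
            spins[i] = -spins[i]


def net_direction(spins):
    """Mean force direction: +1 or -1 means fully aligned, 0 means balanced."""
    return sum(spins) / len(spins)


rng = random.Random(0)  # fixed seed so the run is reproducible
spins = [rng.choice((-1, 1)) for _ in range(50)]
for _ in range(200):
    metropolis_sweep(spins, coupling=2.0, rng=rng)
print("net direction after 200 sweeps:", net_direction(spins))
```

A chain like this has no built-in mechanism for reversals at a boundary, which is consistent with the response's remark that additional assumptions were needed to explain reversals in the spin picture.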

      The parameterisation of the current model would require measuring cellular forces. To this end, a recent study has attempted to measure some of the physical parameters in a different filamentous cyanobacteria [1] and in our revision we will re-evaluate model parameters and dynamics in light of that study. We will also attempt to directly verify the presence of mechano-sensing by obstructing the movement of filaments.

      Summary from the Reviewing Editor:

      The authors present a simple one-dimensional biophysical model to describe the gliding motion and the observed statistics of trajectory reversals. However, the model does not capture some important experimental findings, such as the buckling occurring in long filaments, and the coupling between rotation, slime generation, and motion. More effort is recommended to integrate the information gathered on these different aspects to provide a more unified understanding of filament motility. In particular, the referees suggest performing a more quantitative analysis of the buckling in long filaments. Finally, it is also recommended to discuss the results in the context of previous literature, in order to better explain their relevance. Please find below the detailed individual recommendations of the three reviewers.

      We thank the editor for this accurate summary of the presented work and for highlighting the key points raised by the reviewers. We have provided below point-by-point replies to these.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The relevance of the study organism Fluctiforma draycotensis is not clearly explained, and the results are not discussed in the context of previous literature. The motivation would be clearer if the manuscript explained why this model organism was chosen and how the results compare with those previously observed for this or other organisms.

      We have extended the introduction and discussion sections to make it clearer why we have worked with this organism and how the findings from this work relate to previous ones. In brief, Fluctiforma draycotensis is a useful organism to work with as it not only displays significant motility but it also displays intriguing collective behaviour at different scales. Previous works on gliding motility in filamentous cyanobacteria have mostly focussed on the model organism Nostoc punctiforme, which only displays motility after differentiation into hormogonia [1]. There have also been studies in a range of different filamentous species, including those of the non-monophyletic genus, Phormidium, but these studies mostly looked at effects of genetic deletions on motility [2] or utilised electron microscopy to identify proteins (or surface features) involved in motility [3-5]. It must be noted that motility is also described and studied in non-filamentous cyanobacteria, but the dynamics of motion and molecular mechanisms there are different to filamentous cyanobacteria [6,7]. These previous studies are now cited / summarised in the revised introduction and discussion sections.

      The inferred tracks, probably associated with secreted slime, play a key role since it is supposed that the tracks provide the external force that keeps the filaments straight. Movie S3, in phase contrast, provides convincing evidence for the tracks, but they cannot be seen in the fluorescence images presented in the main text. Clearer evidence of them should be shown in the main text. An especially important aspect of the tracks is where they start and end since the computational model assumes that reversal happens due to forces generated by reaching the end of a track. Therefore it seems important to comment on what produces the tracks, to check whether reversals actually happen at the end of a track, etc. Perhaps tracks could be stained with Concanavalin-A?

      To confirm that reversals happen on track ends, we have now performed an analysis on agar, where we can see tracks on phase microscopy. This analysis confirms that, on agar, reversals indeed happen on track ends. We added this analysis, along with images showing tracks clearly as a new Fig in the main text (see new Fig. 1).

      Further confirming the reversal at track ends, we note that filaments on circular tracks do not reverse over durations longer than the ‘expected reversal interval’ of a filament on a straight track (see details in response to Reviewer 2).

      Regarding what produces the tracks on agar, we are still analysing this using different methods and these results will be part of a future study. Fluorescent staining can be used to visualise slime tubes using TIRF microscopy, as shown in Fig. S8, however, visualising tracks on agar using low magnification microscopy has been difficult due to background fluorescence from agar.

      We would also like to clarify that the model does not incorporate any assumptions regarding the track-filament interaction, other than that the track ends behave akin to a physical boundary for the filament. The observed reversal at track ends and “what” produces the track are distinct aspects of filament motion. We do not think that the model’s assumption of filament reversal at the end of the track requires understanding of the mechanism of slime production.

      Reviewer #3 (Recommendations for the authors):

      The manuscript combines three distinct topics: (1) the difference in locomotion on glass vs agar, (2) the development of a biophysical model, and (3) the helical motion of filament. It is not clear what insight one can gain from any one of these topics about the two others. The manuscript would be strengthened by more clearly connecting these three aspects of the work. A stronger comparison of theory to observation would be very useful. Some suggestions:

      (1) The observation that it is only the longest filaments that buckle is interesting. It should be possible to predict the critical length from the biophysical model. Doing so could allow fits of some model parameters.

      (2) What model parameters change between glass and agar? Can you explain these qualitative differences in motility by changing one model parameter?

      (3) Is it possible to exert a force on one end of a filament to see if it is really mechano-sensing that couples their motion?

      We thank the reviewer for this comment and agree with them that a better connection between model and experiment should be sought. We believe that the new analyses, presented below in response to the 2nd suggestion of the reviewer, provide such a connection in the context of reversal frequency. As stated below, we think that the 1st suggestion falls outside of the scope of the current work, but should form the basis of a future study.

      Regarding suggestion (1) - addressing buckling:

      We agree with the reviewer that using a model to predict a critical buckling length would be useful. We note, however, that the presented study focussed on cell-to-cell coupling / coordination during filament motility using a 1D, bead-chain model. The buckling observations served, in this context, as evidence of cellular de-coordination. Now that we have observed buckling (and plectoneme formation), these processes need to be analysed with further experiments and modelling. The appropriate model for studying buckling would have to be at least 2D (ideally 3D) and consider elastic forces and torques relating to filament bending, rotation, and twisting. Experimentally, we need to identify means of influencing filament length and motion and undertake further measurements of buckling frequency and position across different filament lengths. These investigations are ongoing and will be summarised in a separate, future publication.

      Regarding suggestion (2) - addressing differences in motility on agar vs. glass:

      We believe that the two key differences between agar and glass experiments are the occasional detachment of filaments from the substrate on glass and the lack of confining tracks on glass. These differences might arise from the interactions between the filament, the slime, and the surface. As both slime and agar contain polysaccharides, the slime-agar interaction can be expected to be different from the slime-glass interaction. Additionally, in the agar experiments, the filaments are confined between the agar and a glass slide, while they are not confined on the glass, leaving them free to lift up from the glass surface. We expect these factors to alter reversal frequency between the two conditions. To explore this possibility, we have now extended the analysis of experimental data from glass and show that (see details below):

      (i) dwell times are similar between agar and glass, and

      (ii) reversal frequency distribution is different between glass and agar, and remains constant across filament length on glass.

      We were able to explore these experimental findings with new model simulations, by removing the assumption of an “external bounding frame”. We then analysed reversal frequency against model parameters, as detailed below.

      “The movement of the filaments on glass. We have extended our analysis of motility on glass resulting in the following noted features. Firstly, the median speed shows a weak positive correlation with filament length on glass (see original Fig S3B vs. updated Fig. S3A). This is slightly different to agar, where we do not observe any strong correlation in either direction (see original, Fig. 1 vs. updated Fig 2). Both the cases of positive, and no correlation, support our original hypothesis that the propulsion force is generated by multiple cells within the filament.

      Secondly, the filaments on glass display ‘stopping’ events that are not followed by a reversal, but are instead followed by a continuation in the original direction of motion, which we term ‘stop-go’ events, in contrast to the reversals. The dwell times associated with reversals and ‘stop-go’ events are similarly distributed (see original Fig S3A vs. updated Fig S3B). Furthermore, the dwell time distributions are similar between agar and glass (compare old Fig. 1C vs. new Fig 2C and new Fig. S3B). This suggests that the reversal process is the same on both agar and glass.

      Thirdly, we find that the frequencies of both reversal and stop-go events on glass are uncorrelated with the filament length (see new Fig. S4A) and there are approximately twice as many reversals as stop-go events. In contrast, the filaments on agar reverse with a frequency that is inversely proportional to the filament length (which is in turn proportional to the track length) (see original Fig. S1). The distribution of reversal frequencies on agar is broader and flatter than the distribution on glass (see new Fig. S4B). These findings are in line with the idea that tracks on agar (which are defined by filament length) dictate reversal frequency, resulting in the strong correlations we observe between reversal frequency, track length, and filament length. On glass, filament movement is not constrained by tracks, and we have a specific reversal frequency independent of filament length.”

      “Model can capture movement of filaments on glass and provides hypotheses regarding constancy of reversal frequency with length. We believe the model parameters controlling cellular memory (ω<sub>max</sub>) and strength of cellular coupling (K<sub>ω</sub>) describe the internal behaviour of a filament and therefore should not change depending on the substrate. Thus, we expect the model to be able to capture movement on glass just by removal of any ‘confining tracks’, i.e. external forces, from the simulations. Indeed, we find that the model displays both stop-go and reversal events when simulated without any external force and can capture the dwell time distribution under this condition (compare new Figs. S12,S13 with S3).

      In terms of reversal frequency, however, the model shows a reduction in reversal frequency with filament length (see new Fig. S15). This is in contrast to the experimental data. We find, however, that model results also show a reduction in reversal frequency with increasing ω<sub>max</sub> and K<sub>ω</sub> (see new Fig. S14 and S15). This effect is stronger with ω<sub>max</sub>, while it quickly saturates with K<sub>ω</sub> (see new Fig. S14). Therefore, one possibility of reconciling the model and experiment results in terms of constant reversal frequency with filament length would be to assume that ω<sub>max</sub> is decreasing with filament length (see new Fig. S16). Testing this hypothesis - or adding additional mechanisms into the model - will constitute the basis of future studies.”

      Regarding suggestion (3) - role of mechanosensing:

      We have tried several experiments to evaluate mechanosensing. First, we have used a micropipette or a thin wire placed on the agar, to create a physical barrier in the way of the filaments. The micropipette approach was not quite feasible in our current setup. The wire approach was possible to implement, but the wire caused a significant undulation / perturbation on agar. Possibly relating to this, filaments tended to continue moving alongside the wire barrier. Therefore, these experiments were inconclusive at this stage with regards to mechanosensing a physical barrier. As an alternative, we have attempted trapping gliding filaments using an optical trap with a far red laser that should not affect the physiology of the cells. This did not cause an immediate reversal in filament motion. However, this could be due to the optical trap strength being below the threshold value for mechanosensing. The force per unit length generated by filamentous cyanobacteria has been calculated via a model of self-buckling rods, giving a value of ≈1nN/μm [8]. In comparison, the optical trap generates forces on the scale of pN. Thus, the trap force is several orders of magnitude lower than the propulsive force generated by a filament, given filament lengths in the range of ten to several hundreds μm. We conclude that the lack of observed response may be due to the optical trap force being too weak.
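The order-of-magnitude comparison above can be checked with a short calculation. The force per unit length (about 1 nN/μm) and the filament length range are taken from the response; the trap force of exactly 1 pN is an assumption standing in for "forces on the scale of pN", and the specific lengths evaluated are illustrative.

```python
import math

FORCE_PER_LENGTH_N_PER_UM = 1e-9  # ~1 nN/um, the value quoted in the response [8]
TRAP_FORCE_N = 1e-12              # assumed 1 pN, i.e. "on the scale of pN"


def propulsive_force(length_um):
    """Total propulsive force (N) for a filament of the given length (um)."""
    return FORCE_PER_LENGTH_N_PER_UM * length_um


def orders_above_trap(length_um):
    """How many orders of magnitude the filament force exceeds the trap force."""
    return math.log10(propulsive_force(length_um) / TRAP_FORCE_N)


for length_um in (10, 100, 300):  # illustrative lengths within the stated range
    force_nn = propulsive_force(length_um) * 1e9
    print(f"{length_um:4d} um: {force_nn:6.0f} nN, "
          f"{orders_above_trap(length_um):.1f} orders above the trap force")
```

With these numbers the filament force exceeds the assumed trap force by roughly four to five and a half orders of magnitude, in agreement with the statement that the trap is several orders of magnitude too weak.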

      Thus, the experiments we can perform using our current available methods and equipment are not able to prove either the presence or the absence of mechanosensing in the filament. We plan to perform further experiments in this direction, involving new and/or improved experimental setups, such as use of Atomic Force Microscopy.

      We would like to note that there is an additional observation that supports the idea of reversals being mediated by mechanosensing at the end of a track, instead of the locations of the track ends being caused by the intrinsic reversal frequency of the filament. In a few instances (N = 4), filaments on agar ended up on a circular track (see Movie S4 for an example). These filaments did not reverse over durations a few times longer than the ‘expected reversal interval’ of a filament on a straight track.

      Should $N$ following eq 7 and in eq 9 be $N_f$?

      We have corrected this typo.

      It would be useful to include references to what is known about mechanosensing in cyanobacteria.

      We agree with the reviewer, and we have now updated the discussion section to include this information. Mechanosensing has not yet been shown directly in any cyanobacteria, but several species are shown to harbor genes that are implicated (by homology) to be involved in mechanosensing. In particular, analysis of cyanobacterial genomes predicts the presence of a significant number of homologues of the Escherichia coli mechanosensory ion channels MscS and MscL [9]. We have also identified similar MscS protein sequences in F. draycotensis. These channels open when the membrane tension increases, allowing the cell to protect itself from swelling and rupturing when subject to extreme osmotic shock [10,11].

      We also note that F. draycotensis, like other filamentous cyanobacteria, has genes associated with type IV pili, which may be involved in surface-based motility [1]. Type IV pili have been shown to be mechanosensitive. For example, in cells of Pseudomonas aeruginosa that ‘twitch’ on a surface using type IV pili, application of mechanical shear stress results in increased production of an intracellular signalling molecule involved in promoting biofilm production. The pilus retraction motor has been shown to be involved in this shear-sensing response [12]. Additionally, twitching P. aeruginosa cells often reverse in response to collisions with other cells. Reversal is also caused by collisions with inert glass microfibres, which suggests that the pili-based motility can be affected by a mechanical stimulus [13].

      References

      (1) D. D. Risser, Hormogonium Development and Motility in Filamentous Cyanobacteria. Appl Environ Microbiol 89, e0039223 (2023).

      (2) T. Lamparter et al., The involvement of type IV pili and the phytochrome CphA in gliding motility, lateral motility and photophobotaxis of the cyanobacterium Phormidium lacuna. PLoS One 17, e0249509 (2022).

      (3) E. Hoiczyk, Gliding motility in cyanobacteria: observations and possible explanations. Arch Microbiol 174, 11-17 (2000).

      (4) D. G. Adams, D. Ashworth, B. Nelmes, Fibrillar Array in the Cell Wall of a Gliding Filamentous Cyanobacterium. Journal of Bacteriology 181 (1999).

      (5) L. N. Halfen, R. W. Castenholz, Gliding in a blue-green alga: a possible mechanism. Nature 225, 1163-1165 (1970).

      (6) S. N. Menon, P. Varuni, F. Bunbury, D. Bhaya, G. I. Menon, Phototaxis in Cyanobacteria: From Mutants to Models of Collective Behavior. mBio 12, e0239821 (2021).

      (7) F. D. Conradi, C. W. Mullineaux, A. Wilde, The Role of the Cyanobacterial Type IV Pilus Machinery in Finding and Maintaining a Favourable Environment. Life (Basel) 10 (2020).

      (8) M. Kurjahn, A. Deka, A. Girot, L. Abbaspour, S. Klumpp, M. Lorenz, O. Bäumchen, S. Karpitschka, Quantifying gliding forces of filamentous cyanobacteria by self-buckling. eLife 12:RP87450 (2024).

      (9) S.C. Johnson, J. Veres, H. R. Malcolm, Exploring the diversity of mechanosensitive channels in bacterial genomes. Eur Biophys J 50, 25–36 (2021).

      (10) S.I. Sukharev, W.J. Sigurdson, C. Kung, F. Sachs, Energetic and spatial parameters for gating of the bacterial large conductance mechanosensitive channel, MscL. Journal of General Physiology, 113(4), 525-540 (1999).

      (11) N. Levina, S. Tötemeyer, N.R. Stokes, P. Louis, M.A. Jones, I.R. Booth, Protection of Escherichia coli cells against extreme turgor by activation of MscS and MscL mechanosensitive channels: identification of genes required for MscS activity. The EMBO Journal (1999).

      (12) V.D. Gordon, L. Wang, Bacterial mechanosensing: the force will be with you, always. Journal of cell science 132(7):jcs227694 (2019).

      (13) M.J. Kühn, L. Talà, Y.F. Inclan, R. Patino, X. Pierrat, I. Vos, Z. Al-Mayyah, H. Macmillan, J. Negrete Jr, J.N. Engel, A. Persat, Mechanotaxis directs Pseudomonas aeruginosa twitching motility. Proceedings of the National Academy of Sciences. 118(30):e2101759118 (2021).

    1. Author response:

      We sincerely thank the editor and all three reviewers for their constructive comments. We deeply appreciate the reviewers’ efforts in highlighting both the strengths and the weaknesses of our study. To enhance the quality and clarity of our work, we plan to address the concerns raised in the public reviews through the following actions:

      (1) Improving the tone and language of the manuscript

      We will revise the manuscript thoroughly, incorporating additional explanations and clarifications where necessary, and improving the tone and language to enhance readability and precision. In particular, we will pay careful attention to the terms “positional information,” “positional value,” and “positional cue,” and we plan to explain them in a historical context.

      (2) Extending analysis to regular blastemas

      To validate the applicability of our proposed model beyond the accessory limb model (ALM), we will examine the gene expression patterns of key signaling molecules in regular blastemas generated by limb amputation. This will allow us to test whether the mechanisms we describe are also active during normal limb regeneration.

      (3) Increasing sample sizes in critical experiments

      To ensure reproducibility and statistical reliability, we will increase the number of biological replicates in key experiments within the limits set by our animal ethics approval. Additionally, we will collect data that clearly define the dorsal/ventral axis within the structures, as far as possible. We will also revise the manuscript to pay closer attention to the anterior/posterior/dorsal/ventral axes in the existing data, ensuring that they are clearly described.

      (4) Adding quantitative gene expression data

      To support and reinforce our in situ hybridization results, we will include additional quantitative gene expression analyses (e.g., qRT-PCR), thereby strengthening the conclusions drawn from our expression data.

      We are grateful for the reviewers’ insights and are confident that these revisions will significantly strengthen our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews:

      We sincerely thank the reviewers for their thoughtful review and feedback. We believe that our work will provide valuable insights into how MRSA evolves under bacteriophage predation and stimulate efforts to use genetic trade-offs to combat drug resistance. We have substantially revised the paper and performed several additional experiments to address the reviewers' questions and concerns.

      Summary:

      (1) Testing for genetic trade-offs in additional S. aureus strains

      We obtained 30 clinical isolates of the S. aureus USA300 strain that were isolated between 2008 and 2011 (see Table S1). We first tested the ΦStaph1N, Evo2, and ΦNM1γ6 phages against this expanded strain panel and found that Evo2 showed strong activity against all 30 strains (Table S4). We then tested whether Evo2 infection could elicit trade-offs in β-lactam resistance for a subset of these strains. We found that Evo2 infection caused a ~10-100-fold reduction in their oxacillin MIC. These data are now incorporated into panel C of the revised Figure 2.

      (2) Testing additional staphylococcal phages

      We isolated a phage called SATA8505 from the environment. Similar to ΦStaph1N and Evo2, SATA8505 belongs to the Kayvirus genus and infects the MRSA strains MRSA252, MW2, and LAC. Phage-resistant MRSA recovered following SATA8505 infection also showed a strong reduction in oxacillin resistance (Figure S5). Furthermore, we confirmed that resistance against ΦNM1γ6, which belongs to the Dubowvirus genus, does not elicit trade-offs in β-lactam resistance (Figure S4). Sequencing analysis of ΦNM1γ6-resistant LAC strains showed a different mutation, in fmhC, which was not observed with the ΦStaph1N and Evo2 phages (Table 1). We have added these new data to the main text and supplemental figures and tables. Future work will focus on a comprehensive analysis of a wide range of phage families.

      (3) Testing additional antibiotics

      We also expanded our trade-off analysis to include a wider range of antibiotic classes (Table S3). Overall, the loss of resistance appears to be confined to β-lactams.

      (4) Genetic analysis of ORF141

      To determine the function of ORF141, which is mutated in Evo2, we attempted to clone wild-type ORF141 into a staphylococcal plasmid and perform complementation assays with Evo2. Unfortunately, obtaining the plasmid-borne wild-type ORF141 has proven difficult, as all clones developed frameshifts or deletions in the open reading frame. We posit that the gene product of ORF141 is toxic to the bacteria. We are currently working on placing the gene under more stringent expression conditions but feel that these efforts fall outside the scope of this paper.

      (5) Testing the effect of single mutants  

      Our genomic analysis showed that phage-resistant MRSA evolved multiple mutations following phage infection, making it difficult to determine the contribution of each mutation alone. For example, phage-resistant MW2 and LAC evolved nonsense mutations in the transcriptional regulators mgrA, arlR, and sarA. To test whether these mutations alone were sufficient to confer resistance, we obtained MRSA strains with single-gene knockouts of mgrA, arlR, and sarA and tested their ability to resist phage. We observed that deletion of mgrA in MW2 resulted in a modest reduction in phage sensitivity (Figure S7). However, we did not observe any changes in the other mutant strains. These results suggest that phage resistance in these strains is likely caused by a combination of mutations. Determining the mechanisms of these mutations is the focus of our future work.

      (6) Transcriptomics of phage-resistant MRSA strains

      To further assess the effects of the phage resistance mutations, we performed bulk RNA-seq on phage-resistant MW2 and LAC strains and compared their gene expression to that of the respective wild-type strains. We picked these strains because our genomic data showed that they had evolved mutations in known transcriptional regulators (e.g. mgrA). Our analysis shows that both strains significantly modulate their gene expression (Figure 4). Notably, both strains upregulate the cell wall-associated protein ebh, while downregulating several genes involved in quorum sensing, virulence, and secretion. We have included these new data in Figure 4 and Table S5 and added an entire section in the manuscript discussing these results and their implications.

      (7) Co-treatment of MRSA with phage and β-lactam

      We performed checkerboard experiments on MRSA strains with phage and β-lactam gradients (Figure 6). We found that under most conditions, MRSA cells were only able to recover under low phage and β-lactam concentrations. Notably, these recovered cells were still phage resistant and β-lactam sensitive. However, under one condition where MW2 was treated with ΦStaph1N and β-lactam, we found that some recovered cells still had high levels of β-lactam resistance, showing a distinct mutational profile. We discuss these results in detail in the main text.
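      For readers unfamiliar with the checkerboard format, the layout can be sketched as a grid in which every phage dose is crossed with every antibiotic concentration. The snippet below is a generic illustration only: the two-fold dilution scheme, the top doses, the 8 × 12 plate layout, and all function and variable names are assumptions for the sake of the example and are not taken from the experiments reported here.

```python
from itertools import product
from typing import List, Tuple


def twofold_series(top: float, steps: int) -> List[float]:
    """Two-fold serial dilution starting at `top`, followed by a zero (no-agent) level."""
    return [top / 2 ** i for i in range(steps)] + [0.0]


def checkerboard(phage_levels: List[float],
                 drug_levels: List[float]) -> List[Tuple[float, float]]:
    """Every (phage, drug) combination, one per well."""
    return list(product(phage_levels, drug_levels))


# Hypothetical gradients: phage as multiplicity of infection, oxacillin in ug/mL.
phage_moi = twofold_series(top=1.0, steps=7)    # 8 levels including the no-phage row
oxacillin = twofold_series(top=64.0, steps=11)  # 12 levels including the no-drug column

wells = checkerboard(phage_moi, oxacillin)
print(len(wells), "wells")  # 8 rows x 12 columns
```

      Growth in each well is then read out after incubation, so that recovery can be mapped against both selective pressures at once.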

      Reviewer # 1:

      Strengths:

      Phage-mediated re-sensitization to antibiotics has been reported previously but the underlying mutational analyses have not been described. These studies suggest that phages and antibiotics may target similar pathways in bacteria.

      We thank Reviewer 1 for this assessment. We hope that the data provided in this work will help stimulate further inquiries into this area and help in the development of better phage-based therapies to combat MRSA.

      Weaknesses:

      One limitation is the lack of mechanistic investigations linking particular mutations to the phenotypes reported here. This limits the impact of the work.

      We acknowledge the limitations of our initial analysis. We note (and cite) that separate studies have already linked mutations in femA, mgrA, arlR, and sarA with reduced β-lactam resistance and virulence phenotypes in MRSA, but not to phage resistance. For the other mutations, we could not find literature linking them to our observed phenotypes. We analyzed the effects of single gene knockouts of mgrA, arlR, and sarA on MRSA’s phage resistance. However, as shown above, the results only showed modest effects on phage resistance in the MW2 strain (see Figure S7 and lines 309-317). We therefore believe that mutations in single genes are not sufficient to cause the trade-offs in phage/β-lactam resistance. Because each MRSA strain evolved multiple mutations (e.g. MW2 evolved 6 or more mutations), we feel that determining the effects of all possible permutations of those mutations was beyond the scope of the paper.

      However, to bridge the mutational data with our phenotypic observations, we performed RNA-seq and compared the transcriptomes of untreated and phage-treated MRSA strains (see Figure 4, Table S5, and lines 337-391). Our results show that phage-treated MRSA strains significantly modulate their transcript levels. Indeed, some of the changes in gene expression can explain the phenotypic observations (e.g. overexpression of ebh can lead to reduced clumping). Further, the results showed some unexpected patterns, such as the downregulation of quorum sensing genes or genes involved in type VII secretion.

      Another limitation of this work is the use of lab strains and a single pair of phages. However, while incorporation of clinical isolates would increase the translational relevance of this work it is unlikely to change the conclusions.

      We thank the reviewer for this suggestion. We would like to clarify that MW2, MRSA252, and LAC are pathogenic clinical isolates that were isolated between 1997 and the 2000s. However, we acknowledge that, because these 3 strains have been propagated for many generations, they might have acquired laboratory adaptations. We therefore obtained 30 USA300 clinical strains that were isolated in more recent years (~2008-2011) and tested our phages against them. We note that these clinical isolates (generously provided by Dr. Petra Levin’s lab) were preserved with minimal passaging to reduce the effects of laboratory adaptation. We found that the Evo2 phage was able to elicit oxacillin trade-offs in those strains as well (see Table S1, Table S7, Fig 2C, and lines 210 – 225).

      For the phages, we had to work with phage(s) that could infect all three MRSA strains. That is why in our initial tests, we focused on ΦStaph1N and Evo2, both members of the Kayvirus genus. Now in our revised work, we extend our analysis to ΦNM1γ6, a member of the Dubowvirus genus, that also infects the LAC strain, but not MW2 and MRSA252. We find that ΦNM1γ6 is unable to drive trade-offs in β-lactam resistance (see lines 229 – 238). Next, we analyzed the effects of SATA8505, also a member of the Kayvirus genus. Here, we observed that SATA8505 can elicit trade-offs in β-lactam resistance (see Figure S5 and lines 238 – 246). These results suggest that not all staphylococcal phages can elicit these trade-offs and call for more comprehensive analyses of different types of phages.

      Reviewer #1 (Recommendations for the authors):

      Specific questions:

      (1) The Evo2 isolate is an evolved version of phage Staph1N with more potent lytic activity. Is this reflected in more pronounced antibiotic sensitivity?

      We did not observe that Evo2-treated MRSA cells showed more sensitivity towards β-lactams. However, we did observe that Evo2 was able to elicit these trade-offs at lower multiplicities of infection (MOI) (see lines 173 – 176 and Figure S2). Further, we did observe that Evo2 caused a greater trade-off in virulence phenotypes (hemolysis and cell agglutination) (see lines 416 – 419, lines 433 – 435, and Figure 5).

      In our revisions, we also tested Evo2-treated MRSA against a wide range of antibiotics. We did not observe significant changes in MICs against those agents.   

      (2) Are there mutations in the SCCmec cassette or the MecA gene after selection against ΦStaph1N?

      We did not observe any mutations in known resistance genes SCCmec or blaZ. Furthermore, we did not see any differential expression of those genes in our transcriptomic data (see lines 344 and 346).  

      (3) The authors report that phage ΦNM1γ6 does not induce antibiotic sensitivity changes despite being effective against bacterial strain LAC. Were mutational sequencing studies performed with the resistant isolates that emerged against this strain? Can the authors hypothesize why these did not impact the virulence or resistance of LAC despite effective killing? How does this align with their models for ΦStaph1N?

      We thank the reviewer for that insightful question. In our revised manuscript, we found that ΦNM1γ6 elicits a point mutation in the fmhC gene, which is involved in cell wall maintenance (see lines 326 – 335). To our knowledge, this point mutation has not been linked to phage resistance or drug sensitivity in MRSA. Notably, this mutation was not observed with ΦStaph1N or Evo2. We therefore speculate that ΦNM1γ6 binds to a different receptor molecule on the MRSA cell wall.

      (4) If I understand correctly, the authors attribute these effects of phage predation on antibiotic sensitivity and virulence to orthogonal selection pressures. A good test of this model would be to examine the mutations that emerge in antibiotic/phage co-treatment. This should be done.

      We thank the reviewer for this suggestion. As described in the summary section above, we performed checkerboard experiments on MRSA strains with phage and β-lactam gradients (see lines 440 – 494 and Figure 6). We found that under most conditions, MRSA cells were only able to recover under low phage and β-lactam concentrations. Notably, these recovered cells were still phage resistant and β-lactam sensitive. However, under one condition where MW2 was treated with ΦStaph1N and β-lactam, we found that some recovered cells still had high levels of β-lactam resistance and only limited phage resistance, showing a distinct mutational profile (Figure S6). Under these conditions, we think that the selective pressure exerted by ΦStaph1N is “overcome” by the selective pressure of the high oxacillin concentration, a point that we discuss in the main text.

      Reviewer #2 (Public review):

      Summary:

      The work presented in the manuscript by Tran et al deals with bacterial evolution in the presence of bacteriophage. Here, the authors have taken three methicillin-resistant S. aureus strains that are also resistant to beta-lactams. Eventually, upon being exposed to phage, these strains develop beta-lactam sensitivity. Besides this, the strains also show other changes in their phenotype such as reduced binding to fibrinogen and hemolysis.

      Strengths:

      The experiments carried out are convincing to suggest such in vitro development of sensitivity to the antibiotics. Authors were also able to "evolve" phage in a similar fashion thus showing enhanced virulence against the bacterium. In the end, authors carry out DNA sequencing of both evolved bacteria and phage and show mutations occurring in various genes. Overall, the experiments that have been carried out are convincing.

      We thank Reviewer 2 for their positive comments.

      Weaknesses:

      Although more experiments are not needed, additional experiments could add more information. For example, the phage gene showing the HTH motif could be reintroduced in the bacterial genome and such a strain can then be assayed with wildtype phage infection to see enhanced virulence as suggested. At least one such experiment proves the discoveries regarding the identification of mutations and their outcome.

      We thank the reviewer for this suggestion. We attempted to clone ORF141 into an expression plasmid and perform complementation experiments with Evo2 phage; however, all transformants that were isolated had premature stop-codons and frameshifts in the wild-type ORF141 insert that would disrupt protein function. We therefore think that the gene product of ORF141 might be toxic to the cells. We are currently working on placing the gene under more stringent transcriptional control but feel that these efforts fall outside of the scope of this paper.  

      Secondly, I also feel that authors looked for beta-lactam sensitivity and they found it. I am sure that if they look for rifampicin resistance in these strains, they will find that too. In this case, I cannot say that the evolution was directed to beta-lactam sensitivity; this is perhaps just one trait that was observed. This is the only weakness I find in the work. Nevertheless, I find the experiments convincing enough; more experiments only add value to the work.  

      We thank the reviewer for their comments. Because both phages and β-lactams interface with the bacterial cell wall, we posited that phage resistance would reduce resistance to cell wall-targeting antibiotics. In our revisions, we have expanded our analysis to include a much wider range of antibiotic classes, including rifampicin, mupirocin, erythromycin, and other cell wall disruptors, such as daptomycin and teicoplanin. We did not observe any significant changes to the MICs of these other antibiotics (see Table S3 and lines 191-199). It therefore appears that the effects of these trade-offs are confined to β-lactams.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      Summary:

      This study uses in vivo multimodal high-resolution imaging to track how microglia and neutrophils respond to light-induced retinal injury from soon after injury to 2 months post-injury. The in vivo imaging finding was subsequently verified by ex vivo study. The results suggest that despite the highly active microglia at the injury site, neutrophils were not recruited in response to acute light-induced retinal injury.

      Strengths:

      An extremely thorough examination of the cellular-level immune activity at the injury site. In vivo imaging observations being verified using ex vivo techniques is a strong plus.

      Thank you!

      Weaknesses:

      This paper is extremely long, and in the perspective of this reviewer, needs to be better organized. Update: Modifications have been made throughout, which has made the manuscript easier to follow.

      Thank you!

      Study weakness: though the findings prompt more questions and future studies, the findings discussed in this paper are potentially important for understanding how immune cells respond differently to different severity levels of injury. The study also demonstrated an imaging technology which may help us better understand cellular activity in living tissue during earlier time points.

      We agree that AOSLO has much to offer and this represents some of the earliest reports of its kind.  

      Comments on revisions:

      I appreciate the thorough clarification and re-organization by the authors, and the messages in the manuscript are now more apparent. I recommend also briefly discussing limitations/future improvements in the discussion or conclusion.

      We have added a section to the discussion entitled “Limitations and future improvements”, please see lines 665 – 677.

      Reviewer #3 (Public review):

      Summary

      This work investigated the immune response in the murine retina after focal laser lesions. These lesions are made with close to 2 orders of magnitude lower laser power than the more prevalent choroidal neovascularization model of laser ablation. Histology and OCT together show that the laser insult is localized to the photoreceptors and spares the inner retina, the vasculature and the pigment epithelium. As early as 1-day after injury, a loss of cell bodies in the outer nuclear layer is observed. This is accompanied by strong microglial proliferation to the site of injury in the outer retina where microglia do not typically reside. The injury did not seem to result in the extravasation of neutrophils from the capillary network, constituting one of the main findings of the paper. The demonstrated paradigm of studying the immune response and potentially retinal remodeling in the future in vivo is valuable and would appeal to a broad audience in visual neuroscience.

      Strengths

      Adaptive optics imaging of murine retina is cutting edge and enables non-destructive visualization of fluorescently labeled cells in the milieu of retinal injury. As may be obvious, this in vivo approach is a benefit for studying fast and dynamic immune processes on a local time scale - minutes and hours, and also for the longer days-to-months follow-up of retinal remodeling as demonstrated in the article. In certain cases, the in vivo findings are corroborated with histology.

      Thank you!

      The analysis is sound and accompanied by stunning video and static imagery. A few different sets of mouse models are used: a) two different mouse lines, each with a fluorescent tag for neutrophils and microglia, and b) two different models of inflammation, endotoxin-induced uveitis (EIU) and laser ablation, which are used to study differences in the immune interaction.

      Thank you!

      One of the major advances in this article is the development of the laser ablation model for 'mild' retinal damage as an alternative to the more severe neovascularization models. This model would potentially allow for controlling the size, depth and severity of the laser injury opening interesting avenues for future study.

      Thank you!

      The time-course, 2D and 3D spatial activation pattern of microglial activation are striking and provide an unprecedented view of the retinal response to mild injury.

      We agree that this more complete spatial and temporal evaluation made possible by in vivo imaging is novel.

      Weaknesses

      Generalization of the (lack of) neutrophil response to photoreceptor loss - there is ample evidence in the literature that neutrophils are heavily recruited in response to severe retinal damage that includes photoreceptor loss. Why the same was not observed here in this article remains an open question. One could hypothesize that neutrophil recruitment might indeed occur under conditions that are more in line with the more extreme damage models, for example, with a stronger and global ablation (substantially more photoreceptor loss over a larger area). This parameter space is too unwieldy and large to address the question conclusively in the current article, i.e. how much photoreceptor loss leads to neutrophil recruitment? By the same token, the strong and general conclusion in the title - Photoreceptor loss does not recruit neutrophils - cannot be made until an exhaustive exploration is made of the same parameter space. A scaling back may help here, to reflect the specific, mild form of laser damage explored here, for instance - Mild photoreceptor loss does not recruit neutrophils despite...

      We are striving for clarity and accuracy in our title without adding too many qualifiers.  At present, we feel that the title as submitted is consistent and aligned with the central finding of our manuscript.  The nuance that the reviewer points to is elaborated in the body of the manuscript and we hope the general readership appreciates the same level of detail as appreciated by reviewer #3.

      EIU model - The EIU model was used as a positive control for neutrophil extravasation. Prior work with flow cytometry has shown a substantial increase in neutrophil counts in the EIU model. Yet, in all, the entire article shows exactly 2 examples in vivo and 3 ex vivo (Figure 7) of extravasated neutrophils from the EIU model (n = 2 mice). The general conclusion made about neutrophil recruitment (or lack thereof) is built partly upon this positive control experiment. But these limited examples, especially in the case where literature reports a preponderance of extravasated neutrophils, raise a question on the paradigm(s) used to evaluate this effect in the mild laser damage model.

      This is a helpful suggestion. We agree that readers should see more evidence of the positive control. We have therefore included two more supplementary files showing that there is a strong neutrophil response to EIU. In Figure 7 – supplementary figure 1, we show many Ly-6G-positive neutrophils in the retina seen with histology at the 24-hour time point. In Figure 7 – video 3, we show a massive presence of Catchup-positive neutrophils in vivo at 24 hours as well. This aligns with our positive control and also with the literature.

      Overall, the strengths outweigh the weaknesses, provided the conclusions/interpretations are reconsidered.

      With the added clarification about the magnitude of the neutrophil response in EIU, we feel that the conclusions presented in the manuscript as-is are valid and appropriate.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      The authors are applauded for embracing the reviewers' feedback and making substantial revisions. Some minor comments below:

      The weakness noted in the public review encourages the authors to reconsider the interpretations drawn based on the results. One would have expected to see far more examples of extravasated neutrophils from the EIU model. That this was not seen weakens the neutrophil recruitment claim substantially. Even without this claim, the methods, laser damage model, time-course and spatial activation pattern of microglial activation are all striking and unprecedented. So, as stated in the public review, the strengths do indeed outweigh the weaknesses once the neutrophil claim is softened.

      We address this in the response above. A strong neutrophil response was observed to EIU. This was confirmed with both histology and in vivo imaging.

      This was alluded to by Reviewer 1 in the prior review - at times, there is an overemphasis on imaging technology that distracts from the scientific questions. The imaging is undoubtedly cutting-edge but also documented in prior work by the authors. Any efforts to reduce or balance the emphasis would help with the general flow.

      Given that these discoveries are made possible partly through new technology, we prefer to keep the details of the innovation in the current manuscript. Given the exceptionally large readership of eLife, we feel some description of the AOSLO imaging is warranted in the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors explored how galanin affects whole-brain activity in larval zebrafish using wide-field Ca2+ imaging, genetic modifications, and drugs that increase brain activity. The authors conclude that galanin has a sedative effect on the brain under normal conditions and during seizures, mainly through the galanin receptor 1a (galr1a). However, acute "stressors(?)" like pentylenetetrazole (PTZ) reduce galanin's effects, leading to increased brain activity and more seizures. The authors claim that galanin can reduce seizure severity while increasing seizure occurrence, speculated to occur through different receptor subtypes. This study confirms galanin's complex role in brain activity, supporting its potential impact on epilepsy.

      Strengths:

      The overall strength of the study lies primarily in its methodological approach using whole-brain Calcium imaging facilitated by the transparency of zebrafish larvae. Additionally, the use of transgenic zebrafish models is an advantage, as it enables genetic manipulations to investigate specific aspects of galanin signaling. This combination of advanced imaging and genetic tools allows for addressing galanin's role in regulating brain activity.

      Weaknesses:

      The weaknesses of the study also stem from the methodological approach, particularly the use of whole-brain Calcium imaging as a measure of brain activity. While epilepsy and seizures involve network interactions, they typically do not originate across the entire brain simultaneously. Seizures often begin in specific regions or even within specific populations of neurons within those regions. Therefore, a whole-brain approach, especially with Calcium imaging and its inherent limitations, may not fully capture the localized nature of seizure initiation and propagation, potentially limiting the understanding of Galanin's role in epilepsy.

      Furthermore, Galanin's effects may vary across different brain areas, likely influenced by the predominant receptor types expressed in those regions. Additionally, the use of PTZ as a "stressor" is questionable since PTZ induces seizures rather than conventional stress. Referring to seizures induced by PTZ as "stress" might be a misinterpretation intended to fit the proposed model of stress regulation by receptors other than Galanin receptor 1 (GalR1).

      The description of the EAAT2 mutants is missing crucial details. EAAT2 plays a significant role in the uptake of glutamate from the synaptic cleft, thereby regulating excitatory neurotransmission and preventing excitotoxicity. Authors suggest that in EAAT2 knockout (KO) mice galanin expression is upregulated 15-fold compared to wild-type (WT) mice, which could be interpreted as galanin playing a role in the hypoactivity observed in these animals.

      Indeed, our observation of the unexpected hypoactivity in EAAT2a mutants, described in our description of this mutant (Hotz et al., 2022), prompted us to initiate this study formulating the hypothesis that the observed upregulation of galanin is a neuroprotective response to epilepsy.

      However, the study does not explore the misregulation of other genes that could be contributing to the observed phenotype. For instance, if AMPA receptors are significantly downregulated, or if there are alterations in other genes critical for brain activity, these changes could be more important than the upregulation of galanin. The lack of wider gene expression analysis leaves open the possibility that the observed hypoactivity could be due to factors other than, or in addition to, galanin upregulation.

      We have performed a transcriptome analysis that we are still evaluating. We can already state that AMPA receptor genes are not significantly altered in the mutant.

      Moreover, the observation that in double KO mice for both EAAT2 and galanin, there was little difference in seizure susceptibility compared to EAAT2 KO mice alone further supports the idea that galanin upregulation might not be the reason for the observed phenotype. This indicates that other regulatory mechanisms or gene expressions might be playing a more pivotal role in the manifestation of hypoactivity in EAAT2 mutants.

      We agree that upregulation of galanin transcripts is at best one of a suite of regulatory mechanisms that lead to hypoactivity in EAAT2 zebrafish mutants.

      These methodological shortcomings and conceptual inconsistencies undermine the perceived strengths of the study and hinder understanding of Galanin's role in epilepsy and stress regulation.

      Reviewer #2 (Public Review):

      Summary:

      This study is an investigation of galanin and galanin receptor signaling on whole-brain activity in the context of recurrent seizure activity or under homeostatic basal conditions. The authors primarily use calcium imaging to observe whole-brain neuronal activity accompanied by galanin qPCR to determine how manipulations of galanin or the galr1a receptor affect the activity of the whole-brain under non-ictal or seizure event conditions. The authors' Eaat2a-/- model (introduced in their Glia 2022 paper, PMID 34716961) that shows recurrent seizure activity alongside suppression of neuronal activity and locomotion in the time periods lacking seizures is used in this paper in comparison to the well-known pentylenetetrazole (PTZ) pharmacological model of epilepsy in zebrafish. Given the literature cited in their Introduction, the authors reasonably hypothesize that galanin will exert a net inhibitory effect on brain activity in models of epilepsy and at homeostatic baseline, but were surprised to find that this hypothesis was only moderately supported in their Eaat2a-/- model. In contrast, under PTZ challenge, fish with galanin overexpression showed increased seizure number and reduced duration while fish with galanin KO showed reduced seizure number and increased duration. These results would have been greatly enriched by the inclusion of behavioral analyses of seizure activity and locomotion (similar to the authors' 2022 Glia paper and/or PMIDs 15730879, 24002024). In addition, the authors have not accounted for sex as a biological variable, though they did note that sex sorting zebrafish larvae precludes sex selection at the younger ages used. It would be helpful to include smaller experiments taken from pilot experiments in older, sex-balanced groups of the relevant zebrafish to increase confidence in the findings' robustness across sexes. 
A possible major caveat is that all of the various genetic manipulations are non-conditional as performed, meaning that developmental impacts of galanin overexpression or galanin or galr1a knockout on the observed results have not been controlled for and may have had a confounding influence on the authors' findings. Overall, this study is important and solid (yet limited), and carries clear value for understanding the multifaceted functions that neuronal galanin can have under homeostatic and disease conditions.

      Strengths:

      - The authors convincingly show that galanin is upregulated across multiple contexts that feature seizure activity or hyperexcitability in zebrafish, and appears to reduce neuronal activity overall, with key identified exceptions (PTZ model).

      - The authors use both genetic and pharmacological models to answer their question, and through this diverse approach, find serendipitous results that suggest novel underexplored functions of galanin and its receptors in basal and disease conditions. Their question is well-informed by the cited literature, though the authors should cite and consider their findings in the context of Mazarati et al., 1998 (PMID: 9822761). The authors' Discussion places their findings in context, allowing for multiple interpretations and suggesting some convincing explanations.

      - Sample sizes are robust and the methods used are well-characterized, with a few exceptions (as the paper is currently written).

      - Use of a glutamatergic signaling-based genetic model of epilepsy (Eaat2a-/-) is likely the most appropriate selection to test how galanin signaling can alter seizure activity, as galanin is known to reduce glutamatergic release as an inhibitory mechanism in rodent hippocampal neurons via GalR1a (alongside GIRK activation effects). Given that PTZ instead acts through GABAergic signaling pathways, it is reasonable and useful to note that their glutamate-based genetic model showed different effects than did their GABAergic-based model of seizure activity.

      Weaknesses:

      - The authors do not include behavioral assessments of seizure or locomotor activity that would be expected in this paper given their characterizations of their Eaat2a-/- model in the Glia 2022 paper that showed these behavioral data for this zebrafish model. These data would inform the reader of the behavioral phenotypes to expect under the various conditions and would likely further support the authors' findings if obtained and reported.

      We agree that a thorough behavioral assessment would have strengthened the study, but we deemed it outside of the scope of this study.

      - No assessment of sex as a biological variable is included, though it is understood that these specific studied ages of the larvae may preclude sex sorting for experimental balancing as stated by the authors.

      The study was done on larval zebrafish (5 days post fertilization). The first signs of sexual differentiation become apparent at about 17 days post fertilization (reviewed in Ye and Chen, 2020). Hence sex is not a biological variable at the stage studied.

      - The reported results may have been influenced by the loss or overexpression of galanin or loss of galr1a during developmental stages. The authors did attempt to use the hsp70l system to overexpress galanin, but noted that the heat shock induction step led to reduced brain activity on its own (Supplementary Figure 1). Their hsp70l:gal model shows galanin overexpression anyways (8x fold) regardless of heat induction, so this model is still useful as a way to overexpress galanin, but it should be noted that this galanin overexpression is not restricted to post-developmental timepoints and is present during development.

      The developmental perspective is an important point to consider. Due to the rapid development of the zebrafish, it is not trivial to untangle this. In the zebrafish we first observe epileptic seizures as early as 3 days post fertilization (dpf), when the brain is clearly not yet well developed (e.g. behavioral responses to light are still minimal). Even the 5 dpf stage, at which most of our experiments were conducted, can by no means be considered post-developmental.

      Reviewer #3 (Public Review):

      Summary:

      The neuropeptide galanin is primarily expressed in the hypothalamus and has been shown to play critical roles in homeostatic functions such as arousal, sleep, stress, and brain disorders such as epilepsy. Previous work in rodents using galanin analogs and receptor-specific knockout has provided convincing evidence for the anti-convulsant effects of galanin.

      In the present study, the authors sought to determine the relationship between galanin expression and whole-brain activity. The authors took advantage of the transparent nature of larval zebrafish to perform whole-brain neural activity measurements via widefield calcium imaging. Two models of seizures were used (eaat2a-/- and pentylenetetrazol; PTZ). In the eaat2a-/- model, spontaneous seizures occur and the authors found that galanin transcript levels were significantly increased and associated with a reduced frequency of calcium events. Similarly, two hours after PTZ galanin transcript levels roughly doubled and the frequency and amplitude of calcium events were reduced. The authors also used a heat shock protein line (hsp70I:gal) where galanin transcript levels are induced by activation of heat shock protein, but this line also shows higher basal transcript levels of galanin. Again, the higher level of galanin in hsp70I:gal larval zebrafish resulted in a reduction of calcium events and a reduction in the amplitude of events. In contrast, galanin knockout (gal-/-) increased calcium activity, indicated by an increased number of calcium events, but a reduction in amplitude and duration. Knockout of the galanin receptor subtype galr1a via crispants also increased the frequency of calcium events.

      In subsequent experiments, eaat2a-/- mutants were crossed with hsp70I:gal or gal-/- to increase or decrease galanin expression, respectively. These experiments showed modest effects, with eaat2a-/- x gal-/- knockouts showing an increased normalized area under the curve and seizure amplitude.

      Lastly, the authors attempted to study the relationship between galanin and brain activity during a PTZ challenge. The hsp70I:gal larva showed an increased number of seizures and reduced seizure duration during PTZ. In contrast, gal-/- mutants showed an increased normalized area under the curve and a stark reduction in the number of detected seizures, a reduction in seizure amplitude, but an increase in seizure duration. The authors then ruled out the role of Galr1a in modulating this effect during PTZ, since the number of seizures was unaffected, whereas the amplitude and duration of seizures were increased.

      Strengths:

      (1) The gain- and loss-of function galanin manipulations provided convincing evidence that galanin influences brain activity (via calcium imaging) during interictal and/or seizure-free periods. In particular, the relationship between galanin transcript levels and brain activity in Figures 1 & 2 was convincing.

      (2) The authors use two models of epilepsy (eaat2a-/- and PTZ).

      (3) Focus on the galanin receptor subtype galr1a provided good evidence for the important role of this receptor in controlling brain activity during interictal and/or seizure-free periods.

      Weaknesses:

      (1) Although the relationship between galanin and brain activity during interictal or seizure-free periods was clear, the manuscript currently lacks mechanistic insight in the role of galanin during seizure-like activity induced by PTZ.

      We completely agree and concede that this study constitutes only a first attempt to understand the (at least for us) perplexing complexity of galanin function on the brain.

      (2) Calcium imaging is the primary data for the paper, but there are no representative time-series images or movies of GCaMP signal in the various mutants used.

      We have now added various movies in supplementary data.

      (3) For Figure 3, the authors suggest that hsp70I:gal x eaat2a-/- mutants would further increase galanin transcript levels, which were hypothesized to further reduce brain activity. However, the authors failed to measure galanin transcript levels in this cross to show that galanin is actually increased more than the eaat2a-/- mutant or the hsp70I:gal mutant alone.

      After a couple of unsuccessful mating attempts with our older mutants, we finally decided not to wait for a new generation to grow up, deeming the experiment not crucial (but still nice to have).

      (4) Similarly, transcript levels of galanin are not provided in Figure 2 for Gal-/- mutants and galr1a KOs. Transcript levels would help validate the knockout and any potential compensatory effects of subtype-specific knockout.

      To validate the gal-/- mutant line, we decided to show loss of protein expression (Suppl. Figure 2), which we deem more relevant to argue for loss of function. Galanin transcript levels in galr1a KOs were also added to the same figure. However, validation of the galr1a KO could not be performed because transcript levels were close to the detection limit and no antibodies were available.

      (5) The authors very heavily rely on calcium imaging of different mutant lines. Additional methods could strengthen the data, translational relevance, and interpretation (e.g., acute pharmacology using galanin agonists or antagonists, brain or cell recordings, biochemistry, etc).

      Again, we agree and concede that a number of additional approaches are needed to gain more insight into the complex role of galanin in regulating overall brain activity. These include, among others, behavioral assays, multiple single-cell recordings, and pharmacological interventions.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Minor issues:

      (1) "Sedative" effect of galanin is somewhat vague and seems overapplied without the inclusion of behavioral data showing sedation effects. I would replace "sedative" with something clearer, like the phrase "net inhibitory effect" or similar.

      We have modified the wording as deemed appropriate.

      (2) Include new data that is sufficiently powered to detect or rule out the effects of sex as a biological variable within the various experiments.

      At this stage sex is not a biological variable. Sex determination starts at a late larval stage around 14 dpf. Our analysis is based on 5 dpf larvae.

      (3) Attempt to perform some experiments with galanin/galr1a manipulations that have been induced after the majority of development without using heat shock induction if possible (unknown how feasible this is in current model systems).

      In the current model this is not feasible, but it is an excellent suggestion for future studies that would then also address more long-term effects in the model.

      (4) Figure 2 should include qPCR results for galanin or galr1a mRNA expression to match Figure 1C, F, and Figure 2C and to confirm reductions in the respective RNA transcript levels of gal or galr1a. It could be useful to perform qPCR for galanin in all galr1aKO mice to ascertain whether compensatory elevations in galanin occur in response to galr1aKO.

      (5) Axes should be made with bolder lines and bolder/larger fonts for readability and consistency throughout.

      Indeed, an excellent suggestion. We have adjusted the axes, significantly improving the readability of the graphs.

      (6) The bottom of the image for Figure 2 appears to have been cut off by mistake (page 5).

      (7) The ending of the legend text for Figure 3 appears to have been cut off by mistake (page 6).

      Both regrettable mistakes have been corrected (already in the initial posted version).

      Reviewer #3 (Recommendations For The Authors):

      (1) The introduction or first paragraph of the results should be revised to more directly state the hypotheses. Several critical details were only clear after reading the discussion.

      We added some words to the introduction, hoping that the critical points are now more apparent to the reader.

      (2) Galanin is known to be rapidly depleted by seizures (Mazarati et al., 1998; Journal of Neuroscience, PMID #9822761) but this paper did not appear to be cited or considered. Could the rapid depletion of galanin during seizures help explain the confusing effects of galanin manipulations during PTZ?

      We have added a sentence and the reference to the discussion.

      (3) Figure 1 panels are incorrect. For example, Panel 'F' is used twice and the figure legend is also incorrect due to the labeling errors. In-text references to the figure should also be updated accordingly.

      (4) In Figure 2 N-P, the delta F/F threshold wording is partially cropped. The figure should be updated.

      Thank you for pointing out this mistake. Both figures have now been updated (already in the initial posted version).

      (5) The naming and labeling of groups in the manuscript and figures should be updated to more accurately reflect the fish used for each experiment. As it currently stands, I found the labeling confusing and sometimes misleading. For example, Figure 3 'controls' are actually eaat2a-/- mutants, whereas the other group is hsp70I:gal x eaat2a-/- crosses or gal-/- x eaat2a-/- crosses. In other Figures, 'controls' are eaat2a+/+ larva, or wild-type siblings (sometimes unclear).

      We have made appropriate changes to the manuscript to make this point clearer to the reader, especially when the controls are eaat2a mutants.

      (6) Figure 4J and 4K only show 5 data points, when the authors clearly indicate that 6 fish had seizures. Continuation of this data in Figure 4L shows 6 data points.

      Indeed, the six data points in Figures 4J and 4K are hard to see because they overlap almost completely. At higher magnification all six data points become distinguishable. We will try some different plotting approaches for the revision.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1 (Public review):

      Summary:

      Gene transfer agent (GTA) from Bartonella is a fascinating chimeric GTA that evolved from the domestication of two phages. Not much is known about how the expression of the BaGTA is regulated. In this manuscript, Korotaev et al noted the structural similarity between BrrG (a protein encoded by the ror locus of BaGTA) to a well-known transcriptional anti-termination factor, 21Q, from phage P21. This sparked the investigation into the possibility that BaGTA cluster is also regulated by anti-termination. Using a suite of cell biology, genetics, and genome-wide techniques (ChIP-seq), Korotaev et al convincingly showed that this is most likely the case. The findings offer the first insight into the regulation of GTA cluster (and GTA-mediated gene transfer) particularly in this pathogen Bartonella. Note that anti-termination is a well-known/studied mechanism of transcriptional control. Anti-termination is a very common mechanism for gene expression control of prophages, phages, bacterial gene clusters, and other GTAs, so in this sense, the impact of the findings in this study here is limited to Bartonella.

      Strengths:

      Convincing results that overall support the main claim of the manuscript.

      Weaknesses:

      A few important controls are missing.

      We sincerely appreciate reviewer #1's positive assessment of our manuscript. In response to the concern regarding control samples/experiments, we have addressed this issue in our revision, by providing data of the replicates of our experiments. We acknowledge that antitermination is a well-established mechanism of expression control in bacteria, including bacterial gene clusters, phages, prophages, and at least one other GTA. As reviewer #2 also noted, our study presents a unique example of phage co-domestication, where antitermination integrates both phage remnants at the regulatory level. We have emphasized this original aspect more clearly in the revised manuscript.

      Reviewer 1 (Recommendations for the authors):

      (1) Provide RMSD and DALI scores to show how similar the AlphaFold-predicted structures of BrrG are to other anti-termination factors. This should be done for Fig1B and also for Suppl. Fig 1 to support the claim that BrrG, GafA, GafZ, Q21 share structural features.

      In the revised manuscript we provide RMSD and DALI scores in Suppl. Fig. 1A. In Suppl. Fig. 1B we further include a heatmap of similarity values.

      (2) Throughout the manuscript, flow cytometry data of gfp expression was used and shown as single replicate. Korotaev et al wrote in the legends that error bars are shown (that is not true for e.g. Figs. 3, 4, and 5). It is difficult for reviewers/readers to gauge how reliable are their experiments.

      In the revised manuscript we show all replicates for the flow cytometry histograms.

      For Fig. 2C, all replicates are provided in Suppl. Fig. 3.

      For Fig. 3B, all replicates are provided in Suppl. Fig. 4.

      For Fig. 4B, all replicates are provided in Suppl. Fig. 5.

      For Fig. 5B, all replicates are provided in Suppl. Fig. 6.

      (3) I am unsure how ChIP-seq in Fig. 2A was performed (with anti-FLAG or anti-HA antibodies? I cannot tell from the Materials & Methods). More importantly, I did not see the control for this ChIP-seq experiment. If a FLAG-tagged BrrG was used for ChIP-seq, then a WT non-tagged version should be used as a negative control (not sequencing INPUT DNA), this is especially important for anti-terminator that can co-travel with RNA polymerase. Please also report the number of replicates for ChIP-seq experiments.

      Fig. 2A presents the coverage plot from the ChIP-Seq of ∆brrG +pPtet:3xFLAG-brrG (N’ in green). As anticipated by the referee, we had used ∆brrG +pTet:brrG (untagged) as control (grey). Each strain was tested in a single replicate. The C-terminal tag produced results similar to the untagged version, suggesting it is non-functional. All tested tags are shown in Supplementary Figure 2.

      (4) Korotaev et al mentioned that BrrG binds to DNA (as well as to RNA polymerase). With the availability of existing ChIP-seq data, the authors should be able to locate the DNA-binding element of BrrG, this additional information will be useful to the community.

      We identified a putative binding site of BrrG using our ChIP-Seq data. The putative binding site is indicated in Fig. 2D of the revised manuscript.

      (5) Mutational experiments to break the potential hairpin structure are required to strengthen the claim that this putative hairpin is the potential transcriptional terminator.

      We did not claim that the identified hairpin is a confirmed terminator, but proposed it as a candidate. We agree with the referee that the suggested experiment would be necessary to definitively establish its function. However, our main objective was to show that BrrG acts as a processive antiterminator, which we demonstrated by replacing the putative terminator with a well-characterized synthetic one that BrrG successfully bypassed. Therefore, we chose not to perform the proposed experiment and have accordingly softened our conclusions regarding the hairpin’s potential terminator function.

      Reviewer 2 (Public review):

      Summary:

      In this study, the authors identified and characterized a regulatory mechanism based on transcriptional anti-termination that connects the two gene clusters, capsid and run-off replication (ROR) locus, of the bipartite Bartonella gene transfer agent (GTA). Among genes essential for GTA functionality identified in a previous transposon sequencing project, they found a potential antiterminator of phage origin within the ROR locus. They employed fluorescence reporter and gene transfer assays of overexpression and knockout strains in combination with ChIP-seq and promoter-fusions to convincingly show that this protein indeed acts as an antiterminator counteracting attenuation of the capsid gene cluster expression.

      Impact on the field:

      The results provide valuable insights into the evolution of the chimeric BaGTA, a unique example of phage co-domestication by bacteria. A similar system found in the other broadly studied Rhodobacterales/Caulobacterales GTA family suggests that antitermination could be a general mechanism for GTA control.

      Strengths:

      Results of the selected and carefully designed experiments support the main conclusions.

      Weaknesses:

      It remains open why overexpression of the antiterminator does not increase the gene transfer frequency.

      We are grateful for reviewer #2's thoughtful and encouraging feedback on our manuscript. The reviewer raises an important question about why overexpression of the antiterminator does not increase gene transfer frequency. While we acknowledge this point, we consider it beyond the scope of the current study. Our findings clearly demonstrate that the antiterminator induces capsid component expression in a large proportion of cells. However, the fact that this expression plateaus at high levels rather than exhibiting a transient peak, as seen in the wild type, suggests that antiterminators do not regulate GTA particle release via lysis. We are actively investigating this further through additional experiments, which we plan to publish separately from this study.

      Reviewer 2 (Recommendations for the authors):

      (1) The authors wrote "GTAs are not self-transmitting because the DNA packaging capacity of a GTA particle is too small to package the entire gene cluster encoding it" (page 3). I thought that at least the Bartonella capsid gene cluster should be self-transmissible within the 14 kb packaged DNA (https://doi.org/10.1371/journal.pgen.1003393, https://doi.org/10.1371/journal.pgen.1000546). This was also concluded by Lang et al (https://doi.org/10.1146/annurev-virology-101416-041624). In this case the presented results would have important implications. As the gene cluster and the anti-terminator required for its expression are separated on the chromosome, it would not be possible to transfer an active GTA gene cluster, although the DNA coding for the genes required for making the packaging agent itself, theoretically fits into a BaGTA particle. Could the authors comment on that? I think it would be helpful to add the sizes of the different gene clusters and the distance between them in Fig. 2A. The ROR amplified region spans 500kb, is the capsid gene cluster within this region?

      We thank the reviewer for bringing up this interesting point. The ror gene cluster, which encodes the antiterminator BrrG, is approximately 9.2 kb in size and could feasibly be packaged in its entirety into a GTA particle. In contrast, the bgt cluster (capsid cluster) is approximately 20 kb in size, exceeding the packaging limit of GTA particles, and is separated from the ror cluster by approximately 35 kb. Consequently, if the ror cluster is transferred via a GTA particle into a recipient host that does not encode the bgt gene cluster, the ror cluster would not be expressed.

      We added the sizes of the gene clusters to Fig. 1A.

      (2) Another side-note regarding the introduction: On page three the authors write: "GTAs encode bacteriophage-like particles and in contrast to phages transfer random pieces of host bacterial DNA". While packaging is not specific, certain biases in the packaging frequency are observed in both studied GTA families. For Bartonella this is ROR. In the two GTA-producing strains D. shibae and C. crescentus origin and terminus of replication are not packaged and certain regions are overrepresented (https://doi.org/10.1093/gbe/evy005, https://doi.org/10.1371/journal.pbio.3001790). Furthermore, D. shibae plasmids are not packaged but chromids are. I think the term "random" does not properly describe these observations. I would suggest using "not specific" instead.

      We thank the reviewer for this suggestion and adjusted the wording on p. 3 accordingly.

      (3) Page 5: Remove "To address this". It is not needed as you already state "To test this hypothesis" in the previous sentence.

      We adjusted the wording on p. 5 accordingly.

      (4) I think the manuscript would greatly benefit from a summary figure to visualize the Q-like antiterminator-dependent regulatory circuit for GTA control and its four components described on pages 15 and 16.

      We thank the reviewer for this valuable suggestion. We included a summary figure (Fig. 6) in the discussion section of the revised manuscript.

      (5) Page 17: It might be worth noting that GafA is highly conserved along GTAs in Rhodobacterales (https://doi.org/10.3389/fmicb.2021.662907) and so is probably regulatory integration into the ctrA network (https://doi.org/10.3389/fmicb.2019.00803). It's an old mechanism. It would be also interesting to know if it is a common feature of the two archetypical GTAs that the regulator is not part of the cluster itself.

      We agree with the reviewer’s comments and have revised the wording to state that GafA is highly conserved.

    1. Author response:

      The following is the authors’ response to the previous reviews

      General Response to Reviewers:

      We thank the Reviewers for their comments, which continue to substantially improve the quality and clarity of the manuscript, and therefore help us to strengthen its message while acknowledging alternative explanations.

      All three reviewers raised the concern that we have not proven that Rab3A is acting on a presynaptic mechanism to increase mEPSC amplitude after TTX treatment of mouse cortical cultures.  The reviewers’ main point is that we have not shown a lack of upregulation of postsynaptic receptors in mouse cortical cultures. We want to stress that we agree that postsynaptic receptors are upregulated after activity block in neuronal cultures.  However, the reviewers are not acknowledging that we have previously presented strong evidence at the mammalian NMJ that there is no increase in AChR after activity blockade, and therefore the requirement for Rab3A in the homeostatic increase in quantal amplitude points to a presynaptic contribution. We agree that we should restrict our firmest conclusions to the data in the current study, but in the Discussion we are proposing interpretations. We have added the following new text:

      “The impetus for our current study was two previous studies in which we examined homeostatic regulation of quantal amplitude at the NMJ.  An advantage of studying the NMJ is that synaptic ACh receptors are easily identified with fluorescently labeled alpha-bungarotoxin, which allows for very accurate quantification of postsynaptic receptor density. We were able to detect a known change due to mixing 2 colors of alpha-BTX to within 1% (Wang et al., 2005).  Using this model synapse, we showed that there was no increase in synaptic AChRs after TTX treatment, whereas miniature endplate current increased 35% (Wang et al., 2005). We further showed that the presynaptic protein Rab3A was necessary for full upregulation of mEPC amplitude (Wang et al., 2011). These data strongly suggested Rab3A contributed to homeostatic upregulation of quantal amplitude via a presynaptic mechanism.  With the current study showing that Rab3A is required for the homeostatic increase in mEPSC amplitude in cortical cultures, one interpretation is that in both situations, Rab3A is required for an increase in the presynaptic quantum.”

      The point we are making is that the current manuscript is an extension of that work, and that our interpretation of the findings regarding the variability of upregulation of postsynaptic receptors in our mouse cortical cultures further supports the idea that there is a Rab3A-dependent presynaptic contribution to homeostatic increases in quantal amplitude.

      Public Reviews:

      Reviewer #1 (Public review):

      Koesters and colleagues investigated the role of the small GTPase Rab3A in homeostatic scaling of miniature synaptic transmission in primary mouse cortical cultures using electrophysiology and immunohistochemistry. The major finding is that TTX incubation for 48 hours does not induce an increase in the amplitude of excitatory synaptic miniature events in neuronal cortical cultures derived from Rab3A KO and Rab3A Earlybird mutant mice. NASPM application had comparable effects on mEPSC amplitude in control and after TTX, implying that Ca2+-permeable glutamate receptors are unlikely modulated during synaptic scaling. Immunohistochemical analysis revealed no significant changes in GluA2 puncta size, intensity, and integral after TTX treatment in control and Rab3A KO cultures. Finally, they provide evidence that loss of Rab3A in neurons, but not astrocytes, blocks homeostatic scaling. Based on these data, the authors propose a model in which neuronal Rab3A is required for homeostatic scaling of synaptic transmission, potentially through GluA2-independent mechanisms.

      The major finding - impaired homeostatic up-scaling after TTX treatment in Rab3A KO and Rab3A Earlybird mutant neurons - is supported by data of high quality. However, the paper falls short of providing any evidence or direction regarding potential mechanisms. The data on GluA2 modulation after TTX incubation are likely statistically underpowered, and do not allow drawing solid conclusions, such as GluA2-independent mechanisms of up-scaling.

      The study should be of interest to the field because it implicates a presynaptic molecule in homeostatic scaling, which is generally thought to involve postsynaptic neurotransmitter receptor modulation. However, it remains unclear how Rab3A participates in homeostatic plasticity.

      Major (remaining) point:

      (1) Direct quantitative comparison between electrophysiology and GluA2 imaging data is complicated by many factors, such as different signal-to-noise ratios. Hence, comparing the variability of the increase in mini amplitude vs. GluA2 fluorescence area is not valid. Thus, I recommend removing the sentence "We found that the increase in postsynaptic AMPAR levels was more variable than that of mEPSC amplitudes, suggesting other factors may contribute to the homeostatic increase in synaptic strength." from the abstract.

      We have not removed the statement, but altered it to soften the conclusion. It now reads, “We found that the increase in postsynaptic AMPAR levels in wild type cultures was more variable than that of mEPSC amplitudes, which might be explained by a presynaptic contribution, but we cannot rule out variability in the measurement.”.

      Similarly, the data do not directly support the conclusion of GluA2-independent mechanisms of homeostatic scaling. Statements like "We conclude that these data support the idea that there is another contributor to the TTX- induced increase in quantal size." should be thus revised or removed.

      This particular statement appears only in the previous response to reviewers. We deleted the sentences that start “The simplest explanation Rab3A regulates a presynaptic contributor…” and “Imaging of immunofluorescence more variable…”. We also deleted “our data suggest…consistently leads to an increase in mEPSC amplitude and sometimes leads to…”. We added “…the lack of a robust increase in receptor levels leaves open the possibility that there is a presynaptic contributor to quantal size in mouse cortical cultures. However, the variability could arise from technical factors associated with the immunofluorescence method, and the mechanism of Rab3A-dependent plasticity could be presynaptic for the NMJ and postsynaptic for cortical neurons.”

      Reviewer #2 (Public review):

      I thank the authors for their efforts in the revision. In general, I believe the main conclusion that Rab3A is required for TTX-induced homeostatic synaptic plasticity is well-supported by the data presented, and this is an important addition to the repertoire of molecular players involved in homeostatic compensations. I also acknowledge that the authors are more cautious in making conclusions based on the current evidence, and the structure and logic have been much improved.

      The only major concern I have still falls on the interpretation of the mismatch between GluA2 cluster size and mEPSC amplitude. The authors argue that they are only trying to say that changes in the cluster size are more variable than those in the mEPSC amplitude, and they provide multiple explanations for this mismatch. It seems incongruous to state that the simplest explanation is a presynaptic factor when you have all these alternative factors that very likely have contributed to the results. Further, the authors speculate in the discussion that Rab3A does not regulate postsynaptic GluA2 but instead regulates a presynaptic contributor. Do the authors mean that, in their model, the mEPSC amplitude increases can be attributed to two factors- postsynaptic GluA2 regulation and a presynaptic contribution (which is regulated by Rab3A)? If so, and Rab3A does not affect GluA2 whatsoever, shouldn't we see GluA2 increase even in the absence of Rab3A? The data in Table 1 seems to indicate otherwise.

      The main body of this comment is addressed in the General Response to Reviewers. In addition, we deleted the text “current data, coupled with our previous findings at the mouse neuromuscular junction, support the idea that there are additional sources contributing to the homeostatic increase in quantal size.” We added new text, so the sentence now reads: “Increased receptors likely contribute to increases in mEPSC amplitudes in mouse cortical cultures, but because we do not have a significant increase in GluA2 receptors in our experiments, it is impossible to conclude that the increase is lacking in cultures from Rab3A<sup>-/-</sup> neurons.”

      I also question the way the data are presented in Figure 5. The authors first compare 3 cultures and then 5 cultures altogether, if these experiments are all aimed to answer the same research question, then they should be pooled together. Interestingly, the additional two cultures both show increases in GluA2 clusters, which makes the decrease in culture #3 even more perplexing, for which the authors comment in line 261 that this is due to other factors. Shouldn't this be an indicator that something unusual has happened in this culture?

      Data in this figure is sufficient to support that GluA2 increases are variable across cultures, which hardly adds anything new to the paper or to the field. 

      A major goal of performing the immunofluorescence measurements in the same cultures for which we had electrophysiological results was to address the common impression that the homeostatic effect itself is highly variable, as the reviewer notes in the comment “…GluA2 increases are variable across cultures…” Presumably, if GluA2 increases are the mechanism of the mEPSC amplitude increases, then variable GluA2 increases should correlate with variable mEPSC amplitude increases, but that is not what we observed. We are left with the explanation that the immunofluorescence method itself is very variable. We have added the point to the Discussion, which reads, “the variability could arise from technical factors associated with the immunofluorescence method, and the mechanism of Rab3A-dependent homeostatic plasticity could be presynaptic for the NMJ and postsynaptic for cortical neurons.”

      Finally, the implication of “Shouldn’t this be an indicator that something unusual has happened in this culture?”, if it is not due to culture-to-culture variability in the homeostatic response itself, is that there was a technical problem with accurately measuring receptor levels. We have no reason to suspect anything was amiss in this set of coverslips (the values for controls and for TTX-treated were not outside the range of values in other experiments). In any of the coverslips, there may be variability in the amount of primary anti-GluA2 antibody, as this was added directly to the culture rather than prepared as a diluted solution and added to all the coverslips. But to remove this one experiment because it did not give the expected result is to allow bias to direct our data selection.

      The authors further cite a study with comparable sample sizes, which shows a similar mismatch based on p values (Xu and Pozzo-Miller 2007), yet the effect sizes in this study actually match quite well (both ~160%). P values cannot be used to show whether two effects match, but effect sizes can. Therefore, the statement in lines 411-413 "... consistently leads to an increase in mEPSC amplitudes, and sometimes leads to an increase in synaptic GluA2 receptor cluster size" is not very convincing, and can hardly be used to support "the idea that there are additional sources contributing to the homeostatic increase in quantal size.”

      We have the same situation; our effect sizes match (19.7% increase for mEPSC amplitude; 18.1% increase for GluA2 receptor cluster size, see Table 1), but in our case, the p value for receptors does not reach statistical significance. Our point here is that there is published evidence that the variability in receptor measurements is greater than the variability in electrophysiological measurements. But we have softened this point, removing the sentences containing “…consistently leads and sometimes...” and “…additional sources contributing…”.
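      The distinction drawn here between effect size and p value can be illustrated with a minimal sketch. All numbers below are hypothetical and chosen only for illustration; they are not data from this study, and the Welch t statistic is used as a stand-in for whatever test was actually applied. Two measurements show the same 20% mean increase, yet the noisier one yields a much smaller test statistic and would therefore be less likely to reach significance.

```python
from math import sqrt
from statistics import mean, stdev


def percent_increase(control, treated):
    """Effect size expressed as percent change of the treated mean over control."""
    return 100.0 * (mean(treated) - mean(control)) / mean(control)


def welch_t(control, treated):
    """Welch's t statistic (unequal variances); larger means stronger evidence."""
    se = sqrt(stdev(control) ** 2 / len(control) + stdev(treated) ** 2 / len(treated))
    return (mean(treated) - mean(control)) / se


# Hypothetical cell means: a low-variability readout (e.g. an electrophysiological one)
tight_con = [9.0, 10.0, 11.0, 10.0, 9.5, 10.5]
tight_ttx = [11.0, 12.0, 13.0, 12.0, 11.5, 12.5]

# Hypothetical cell means: a high-variability readout (e.g. an imaging one)
wide_con = [4.0, 10.0, 16.0, 10.0, 7.0, 13.0]
wide_ttx = [6.0, 12.0, 18.0, 12.0, 9.0, 15.0]

for label, con, ttx in [("tight", tight_con, tight_ttx), ("wide", wide_con, wide_ttx)]:
    print(label, round(percent_increase(con, ttx), 1), round(welch_t(con, ttx), 2))
```

      In this toy case both readouts report an identical percent increase, so a comparison of p values alone would wrongly suggest that the two effects differ.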

      I would suggest simply showing mEPSC and immunostaining data from all cultures in this experiment as additional evidence for homeostatic synaptic plasticity in WT cultures, and leave out the argument for "mismatch". The presynaptic location of Rab3A is sufficient to speculate a presynaptic regulation of this form of homeostatic compensation.

      We have removed all uses of the word “mismatch,” but feel the presentation of the 3 matched experiments, 23-24 cells (Figure 5A, D), and the additional 2 experiments for a total of 5 cultures, 48-49 cells (Figure 5C, F), is important in order to demonstrate that the lack of statistically significant receptor response is due neither to a variable homeostatic response in the mEPSC amplitudes, nor to a small number of cultures.

      Minor concerns:

      (1) Line 214, I see the authors cite literature to argue that GluA2 can form homomers and can conduct currents. While GluA2 subunits edited at the Q/R site (they are in nature) can form homomers with very low efficiency in exogenous systems such as HEK293 cells (as done in the cited studies), it's unlikely for this to happen in neurons (they can hardly traffic to synapses if possible at all).

      We were unable to identify a key reference that characterized GluA2 homomers vs. heteromers in native cortical neurons, but we have rewritten the section in the manuscript to acknowledge the low conductance of homomers:

      “…to assess whether GluA2 receptor expression, which will identify GluA2 homomers and GluA2 heteromers (the former unlikely to contribute to mEPSCs given their low conductance relative to heteromers (Swanson et al., 1997; Mansour et al., 2001)…”

      (2) Lines 221-222, the authors may have misinterpreted the results in Turrigiano 1998. This study does not show that the increase in receptors is most dramatic in the apical dendrite, in fact, this is the only region they have tested. The results in Figures 3b-c show that the effect size is independent of the distance from soma.

      Figure 3 in Turrigiano et al., shows that the increase in glutamate responsiveness is higher at the cell body than along the primary dendrite. We have revised our description to indicate that an increase in responsiveness on the primary dendrite has been demonstrated in Turrigiano et al. 1998.

      “We focused on the primary dendrite of pyramidal neurons as a way to reduce variability that might arise from being at widely ranging distances from the cell body, or, from inadvertently sampling dendritic regions arising from inhibitory neurons. In addition, it has been shown that there is a clear increase in response to glutamate in this region (Turrigiano et al., 1998).”

      “…synaptic receptors on the primary dendrite, where a clear increase in sensitivity to exogenously applied glutamate was demonstrated (see Figure 3 in (Turrigiano et al., 1998)).”

      (3) Lines 309-310 (and other places mentioning TNFa), the addition of TNFa to this experiment seems out of place. The authors have not performed any experiment to validate the presence/absence of TNFa in their system (citing only 1 study from another lab is insufficient). Although it's convincing that glia Rab3A is not required for homeostatic plasticity here, the data does not suggest Rab3A's role (or the lack of) for TNFa in this process.

      We have modified the paragraph in the Discussion that addresses the glial results, to describe more clearly the data that supported an astrocytic TNF-alpha mechanism: “TNF-alpha accumulates after activity blockade, and directly applied to neuronal cultures, can cause an increase in GluA1 receptors, providing a potential mechanism by which activity blockade leads to the homeostatic upregulation of postsynaptic receptors (Beattie et al., 2002; Stellwagen et al., 2005; Stellwagen and Malenka, 2006).”

      We have also acknowledged that we cannot rule out TNF-alpha coming from neurons in the cortical cultures: “…suggesting the possibility that neuronal Rab3A can act via a non-TNF-alpha mechanism to contribute to homeostatic regulation of quantal amplitude, although we have not ruled out a neuronal Rab3A-mediated TNF-alpha pathway in cortical cultures.”

      Reviewer #3 (Public review):

      This manuscript presents a number of interesting findings that have the potential to increase our understanding of the mechanism underlying homeostatic synaptic plasticity (HSP). The data broadly support that Rab3A plays a role in HSP, although the site and mechanism of action remain uncertain.

      The authors clearly demonstrate that Rab3A plays a role in HSP at excitatory synapses, with substantially less plasticity occurring in the Rab3A KO neurons. There is also no apparent HSP in the Earlybird Rab3A mutation, although baseline synaptic strength is already elevated. In this context, it is unclear if the plasticity is absent, already induced by this mutation, or just occluded by a ceiling effect due to the synapses already being strengthened. Occlusion may also occur in the mixed cultures when Rab3A is missing from neurons but not astrocytes. The authors do appropriately discuss these options. The authors have solid data showing that Rab3A is unlikely to be active in astrocytes. Finally, they attempt to study the linkage between changes in synaptic strength and AMPA receptor trafficking during HSP, and conclude that trafficking may not be solely responsible for the changes in synaptic strength during HSP.

      Strengths:

      This work adds another player into the mechanisms underlying an important form of synaptic plasticity. The plasticity is likely only reduced, suggesting Rab3A is only partially required and perhaps multiple mechanisms contribute. The authors speculate about some possible novel mechanisms, including whether Rab3A is active pre-synaptically to regulate quantal amplitude.

      As Rab3A is primarily known as a pre-synaptic molecule, this possibility is intriguing. However, it is based on the partial dissociation of AMPAR trafficking and synaptic response and lacks strong support. On average, they saw a similar magnitude of change in mEPSC amplitude and GluA2 cluster area and integral, but the GluA2 data was not significant due to higher variability. It is difficult to determine if this is due to biology or methodology - the imaging method involves assessing puncta pairs (GluA2/VGlut1) clearly associated with a MAP2 labeled dendrite. This is a small subset of synapses, with usually less than 20 synapses per neuron analyzed, which would be expected to be more variable than mEPSC recordings averaged across several hundred events. However, when they reduce the mEPSC number of events to similar numbers as the imaging, the mEPSC amplitudes are still less variable than the imaging data. The reason for this remains unclear. The pool of sampled synapses is still different between the methods and recent data has shown that synapses have variable responses during HSP. Further, there could be variability in the subunit composition of newly inserted AMPARs, and only assessing GluA2 could mask this (see below). It is intriguing that pre-synaptic changes might contribute to HSP, especially given the likely localization of Rab3A. But it remains difficult to distinguish if the apparent difference in imaging and electrophysiology is a methodological issue rather than a biological one. Stronger data, especially positive data on changes in release, will be necessary to conclude that pre-synaptic factors are required for HSP, beyond the established changes in post-synaptic receptor trafficking.

      Regarding the concern that the lack of increase in receptors is due to a technical issue, please see General Response to Reviewers, above. We have also softened our conclusions throughout, acknowledging we cannot rule out a technical issue.

      Other questions arise from the NASPM experiments, used to justify looking at GluA2 (and not GluA1) in the immunostaining. First, there is a strong frequency effect that is unclear in origin. One would expect NASPM to merely block some fraction of the post-synaptic current, and not affect pre-synaptic release or block whole synapses. But the change in frequency seems to argue (as the authors do) that some synapses only have CP-AMPARs, while the rest of the synapses have few or none. Another possibility is that there are pre-synaptic NASPM-sensitive receptors that influence release probability. Further, the amplitude data show a strong trend towards smaller amplitude following NASPM treatment (Fig 3B). The p value for both control and TTX neurons was 0.08 - it is very difficult to argue that there is no effect. The decrease on average is larger in the TTX neurons, and some cells show a strong effect. It is possible there is some heterogeneity between neurons on whether GluA1/A2 heteromers or GluA1 homomers are added during HSP. This would impact the conclusions about the GluA2 imaging as compared to the mEPSC amplitude data.

      The key finding in Figure 3 is that NASPM did not eliminate the statistically significant increase in mEPSC amplitude after TTX treatment (Fig 3A). Whether or not NASPM-sensitive receptors contribute to mEPSC amplitude is a separate question (Fig 3B). We are open to the possibility that NASPM reduces mEPSC amplitude in both control and TTX-treated cells (p = 0.08 for both), but that does not change our conclusion that NASPM has no effect on the TTX-induced increase in mEPSC amplitude. The mechanism underlying the decrease in mEPSC frequency following NASPM is interesting, but does not alter our conclusions regarding the role of Rab3A in homeostatic synaptic plasticity of mEPSC amplitude. In addition, the Reviewer does not acknowledge Supplemental Figure #1, which shows a similar lack of correspondence between homeostatic increases in mEPSC amplitude and GluA1 receptors in two cultures where matched data were obtained. Therefore, we do not think our lack of a robust increase in receptors can be explained by our failing to look at the relevant receptor.

      To understand the role of Rab3A in HSP will require addressing two main issues:

      (1) Is Rab3A acting pre-synaptically, post-synaptically or both? The authors provide good evidence that Rab3A is acting within neurons and not astrocytes. But where it is acting (pre or post) would aid substantially in understanding its role. The general view in the field has been that HSP is regulated post-synaptically via regulation of AMPAR trafficking, and considerable evidence supports this view. More concrete support for the authors' suggestion of a pre-synaptic site of control would be helpful.

      We agree that definitive evidence for a presynaptic role of Rab3A in homeostatic plasticity of mEPSC amplitudes in mouse cortical cultures requires demonstrating that loss of Rab3A in postsynaptic neurons does not disrupt the plasticity, whereas loss in presynaptic neurons does. Without these data, we can only speculate that the Rab3A-dependence of homeostatic plasticity of quantal size in cortical neurons may be similar to that of the neuromuscular junction, where it cannot be receptors. We have added to the Discussion that the mechanism of Rab3A regulation of homeostatic plasticity of quantal amplitude could differ between cortical neurons and the neuromuscular junction (lines 448-450 in markup). Establishing a way to co-culture Rab3A-/- and Rab3A+/+ neurons in ratios that would allow us to record from a Rab3A-/- neuron that has mainly Rab3A+/+ inputs (or vice versa) is not impossible, but requires either transfection or transgenic expression with markers that identify the relevant genotype, and will be the subject of future experiments.

      (2) Rab3A is also found at inhibitory synapses. It would be very informative to know if HSP at inhibitory synapses is similarly affected. This is particularly relevant as at inhibitory synapses, one expects a removal of GABARs or a decrease in GABA release (ie the opposite of whatever is happening at excitatory synapses). If both processes are regulated by Rab3A, this might suggest a role for this protein more upstream in the signaling; an effect only at excitatory synapses would argue for a more specific role just at those synapses.

      We agree with the Reviewer, that it is important to determine the generality of Rab3A function in homeostatic plasticity. Establishing the homeostatic effect on mIPSCs and then examining them in Rab3A-/- cultures is a large undertaking and will be the subject of future experiments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor (remaining) points:

      (1) The figure referenced in the first response to the reviewers (Figure 5G) does not exist.

      We meant Figure 5F, which has been corrected in the current response.

      (2) I recommend showing the data without binning (despite some overlap).

      The box plot function in Origin does not allow plotting without binning, but we can make the bin size so small that, for all intents and purposes, there is close to 1 sample in each bin. When we do this, the majority of data points overlap in a straight vertical line. Previously described concerns were regarding the gaps in the data, but it should be noted that these are cell means and we are not depicting the distributions of mEPSC amplitudes within a recording or across multiple recordings.

      (3) Please auto-scale all axes from 0 (e.g., Fig 1E, F).

      We have rescaled all mEPSC amplitude axes in box plots to go from 0 (Figures 1, 2 and 6).

      (4) Typo in Figure legend 3: "NASPM (20 um)" => uM

      Fixed.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 140, frequencies are reported in Hz while other places are in sec-1, while these are essentially the same, they should be kept consistent in writing.

      All mEPSC frequencies have been changed to sec<sup>-1</sup>, except we have left “Hz” for repetitive stimulation and filtering.

      (2) Paragraph starting from line 163 (as well as other places where multiple groups are compared, such as the occlusion discussion), the authors assessed whether there was a change in baseline between WT and mutant group by doing pairwise tests, this is not the right test. A two-way ANOVA, or at least a multivariant test would be more appropriate.

      We have performed a two-way ANOVA, with genotype as one factor and treatment as the other factor. The p values in Figures 1 and 2 have been revised to reflect p values from the post-hoc Tukey test on the specific interactions (for each particular genotype, TTX vs CON effects). The difference between the two untreated WT strains was not significant in the post-hoc Tukey test, and we have revised the text. The difference between the untreated WT from the Rab3A+/Ebd colony and the untreated Rab3AEbd/Ebd mutant was still significant in the post-hoc Tukey test, and this has replaced the Kruskal-Wallis test. The two-way ANOVA was also applied to the neuron-glia experiments and p values in Figure 6 adjusted accordingly.
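      For readers unfamiliar with the design, the partition behind a genotype-by-treatment two-way ANOVA can be sketched as follows. This is a minimal illustration for a balanced design using invented numbers, not the analysis actually run for the manuscript; the function name `two_way_anova_balanced` and the data are assumptions made for the example. It returns F ratios only, because converting them to p values and running the Tukey post-hoc comparisons needs the F and studentized range distributions, which a statistics package would supply.

```python
from statistics import mean


def two_way_anova_balanced(cells):
    """F ratios for a balanced two-factor design.

    cells maps (genotype, treatment) -> list of values, with equal n in every cell.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if any(len(v) != n for v in cells.values()):
        raise ValueError("this sketch assumes equal sample sizes in every cell")

    grand = mean([x for v in cells.values() for x in v])
    a_mean = {a: mean([x for b in b_levels for x in cells[(a, b)]]) for a in a_levels}
    b_mean = {b: mean([x for a in a_levels for x in cells[(a, b)]]) for b in b_levels}
    cell_mean = {k: mean(v) for k, v in cells.items()}

    # Sums of squares for each main effect, the interaction, and the residual
    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels
        for b in b_levels
    )
    ss_err = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_a = len(a_levels) - 1
    df_b = len(b_levels) - 1
    df_ab = df_a * df_b
    df_err = len(a_levels) * len(b_levels) * (n - 1)
    ms_err = ss_err / df_err

    return {
        "genotype": (ss_a / df_a) / ms_err,
        "treatment": (ss_b / df_b) / ms_err,
        "interaction": (ss_ab / df_ab) / ms_err,
    }


# Hypothetical cell means (pA): TTX raises amplitude in WT but not in KO
data = {
    ("WT", "CON"): [10.0, 11.0, 12.0],
    ("WT", "TTX"): [13.0, 14.0, 15.0],
    ("KO", "CON"): [10.0, 11.0, 12.0],
    ("KO", "TTX"): [10.0, 11.0, 12.0],
}
f_ratios = two_way_anova_balanced(data)
print(f_ratios)
```

      The interaction term is the one that asks whether the TTX effect differs between genotypes, which is why it is more informative here than a series of pairwise tests.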

      (3) Relevant to the second point under minor concerns, I suggest this sentence be removed, as reducing variability and avoiding inhibitory projections are reasons good enough to restrict the analysis to the apical dendrites.

      We have revised the description of the Turrigiano et al., 1998 finding from their Figure 3 and feel it still strengthens the justification for choosing to analyze only synapses on the apical dendrite.

      Reviewer #3 (Recommendations for the authors):

      Minor points:

      The comments on lines 256-7 could seem misleading - the NASPM results wouldn't rule out contribution of those other subunits, only non-GluA2 containing combinations of those subunits. I would suggest revising this statement. Also, NASPM does likely have an effect, just not one that changes much with TTX treatment.

      At new line 213 (markup) we have added the modifier “homomeric” to clarify our point that the lack of NASPM effect on the increase in mEPSC amplitude after TTX indicates that the increase is not due to more homomeric Ca<sup>2+</sup>-permeable receptors. We have always stated that NASPM reduces mEPSC amplitude, but it is in both control and treated cultures.

      Strong conclusions based on a single culture (lines 314-5) seem unwarranted.

      We have softened this statement with a “suggesting that” substituted for the previous “Therefore,” but stand by our point that the mEPSC amplitude data support a homeostatic effect of TTX in Culture #3, so the lack of increase in GluA2 cluster size needs an explanation other than variability in the homeostatic effect itself.

      Saying (line 554) something is 'the only remaining possibility' also seems unwarranted.

      We have softened this statement to read, “A remaining possibility…”.

      Beattie EC, Stellwagen D, Morishita W, Bresnahan JC, Ha BK, Von Zastrow M, Beattie MS, Malenka RC (2002) Control of synaptic strength by glial TNFalpha. Science 295:2282-2285.

      Mansour M, Nagarajan N, Nehring RB, Clements JD, Rosenmund C (2001) Heteromeric AMPA receptors assemble with a preferred subunit stoichiometry and spatial arrangement. Neuron 32:841-853.

      Stellwagen D, Malenka RC (2006) Synaptic scaling mediated by glial TNF-alpha. Nature 440:1054-1059.

      Stellwagen D, Beattie EC, Seo JY, Malenka RC (2005) Differential regulation of AMPA receptor and GABA receptor trafficking by tumor necrosis factor-alpha. J Neurosci 25:3219-3228.

      Swanson GT, Kamboj SK, Cull-Candy SG (1997) Single-channel properties of recombinant AMPA receptors depend on RNA editing, splice variation, and subunit composition. J Neurosci 17:58-69.

      Turrigiano GG, Leslie KR, Desai NS, Rutherford LC, Nelson SB (1998) Activity-dependent scaling of quantal amplitude in neocortical neurons. Nature 391:892-896.

      Wang X, Wang Q, Yang S, Bucan M, Rich MM, Engisch KL (2011) Impaired activity-dependent plasticity of quantal amplitude at the neuromuscular junction of Rab3A deletion and Rab3A earlybird mutant mice. J Neurosci 31:3580-3588.

      Wang X, Li Y, Engisch KL, Nakanishi ST, Dodson SE, Miller GW, Cope TC, Pinter MJ, Rich MM (2005) Activity-dependent presynaptic regulation of quantal size at the mammalian neuromuscular junction in vivo. J Neurosci 25:343-351.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study uncovers a protective role of the ubiquitin-conjugating enzyme variant Uev1A in mitigating cell death caused by over-expressed oncogenic Ras in polyploid Drosophila nurse cells and by RasK12 in diploid human tumor cell lines. The authors previously showed that overexpression of oncogenic Ras induces death in nurse cells, and now they perform a deficiency screen for modifiers. They identified Uev1A as a suppressor of this Ras-induced cell death. Using genetics and biochemistry, the authors found that Uev1A collaborates with the APC/C E3 ubiquitin ligase complex to promote proteasomal degradation of Cyclin A. This function of Uev1A appears to extend to diploid cells, where its human homologs UBE2V1 and UBE2V2 suppress oncogenic Ras-dependent phenotypes in human colorectal cancer cells in vitro and in xenografts in mice.

      Strengths:

      (1) Most of the data is supported by a sufficient sample size and appropriate statistics.

      (2) Good mix of genetics and biochemistry.

      (3) Generation of new transgenes and Drosophila alleles that will be beneficial for the community.

      We greatly appreciate these comments.

      Weaknesses:

      (1) Phenotypes are based on artificial overexpression. It is not clear whether these results are relevant to normal physiology.

      Downregulation of Uev1A, Ben, and Cdc27 together significantly increased the incidence of dying nurse cells in normal ovaries (Figure 2-figure supplement 4), indicating that the mechanism we uncovered also protects nurse cells from death during normal oogenesis.

      (2) The phenotype of "degenerating ovaries" is very broad, and the study is not focused on phenotypes at the cellular level. Furthermore, no information is provided in the Materials and Methods on how degenerating ovaries are scored, despite this being the most important assay in the study.

      Thanks for pointing out this issue. We quantified the phenotype of nurse cell death using “degrading/total egg chambers per ovary”, not “degenerating ovaries” (see all quantification data in our manuscript). Notably, this phenotype ranges from mild to severe. In normal nurse cells, nuclei exhibit a large, round morphology in DAPI staining (see the first panel in Figure 1D). During early death, nurse cell nuclei become disorganized and begin to condense and fragment (see the third panel in Figure 2-figure supplement 2E). In late-stage death, the nuclei are completely fragmented into small, condensed spherical structures (see the second panel in Figure 1D), making cellular-level phenotypic quantification impossible. Since all nurse cells within the same egg chamber are interconnected, their death process is synchronous. Thus, quantifying the phenotype at the egg-chamber level is more practical than at the cellular level. To improve clarity, we will provide a detailed description of the phenotype and integrate this explanation into the main text of the revised manuscript.

      (3) In Figure 5, the authors want to conclude that uev1a is a tumor-suppressor, and so they over-express ubev1/2 in human cancer cell lines that have RasK12 and find reduced proliferation, colony formation, and xenograft size. However, genes that act as tumor suppressors have loss-of-function phenotypes that allow for increased cell division. The Drosophila uev1a mutant is viable and fertile, suggesting that it is not a tumor suppressor in flies. Additionally, they do not deplete human ubev1/2 from human cancer cell lines and assess whether this increases cell division, colony formation, and xenograph growth.

      We apologize for our misleading description. In Figure 5, we aimed to demonstrate that UBE2V1/2, like Uev1A in Drosophila “nos>Ras<sup>G12V</sup>+bam-RNAi” germline tumors (Figure 4), suppress oncogenic KRAS-driven overgrowth in diploid human cancer cells. Importantly, this function of Uev1A and UBE2V1/2 is dependent on Ras-driven tumors; there is no evidence that they act as broad tumor suppressors in the absence of oncogenic Ras. Drosophila uev1a mutants were lethal, not viable (see Lines 131-133), and germline-specific knockdown of uev1a (nos>uev1a-RNAi) caused female sterility without inducing tumors. These findings suggest that Uev1A lacks tumor-suppressive activity in the Drosophila female germline in the absence of Ras-driven tumors. We will revise the manuscript to prevent misinterpretation. Furthermore, we will investigate whether depletion of UBE2V1, UBE2V2, or both promotes oncogenic KRAS-driven overgrowth in human cancer cells.

      (4) A critical part of the model does not make sense. CycA is a key part of their model, but they do not show CycA protein expression in WT egg chambers or in their over-expression models (nos.RasV12 or bam>RasV12). Based on Lilly and Spradling 1996, Cyclin A is not expressed in germ cells in region 2-3 of the germarium; whether CycA is expressed in nurse cells in later egg chambers is not shown but is critical to document comprehensively.

      We appreciate this critical comment. CycA is a key cyclin that partners with Cdk1 to promote cell division (Edgar and Lehner, 1996). Notably, nurse cells are post-mitotic endocycling cells (Hammond and Laird, 1985) and typically do not express CycA (Lilly and Spradling, 1996) (see the last sentence, page 2518, paragraph 3). However, their death induced by oncogenic Ras<sup>G12V</sup> is significantly suppressed by monoallelic deletion of either cycA or cdk1 (Zhang et al., 2024). Conversely, ectopic CycA expression in nurse cells triggers their death (Figure 2C, 2D). These findings suggest that polyploid nurse cells exhibit high sensitivity to aberrant division-promoting stress, which may represent a distinct form of cellular stress unique to polyploid cells. To further test our model, we will compare CycA expression levels in normal nurse cells versus those undergoing oncogenic Ras<sup>G12V</sup>-induced cell death.

      (5) The authors should provide more information about the knowledge base of uev1a and its homologs in the introduction.

      Thanks for this suggestion. We will include this information in the introduction of the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors performed a genetic screen using deficiency lines and identified Uev1a as a factor that protects nurse cells from Ras<sup>G12V</sup>-induced cell death. According to a previous study from the same lab, this cell death is caused by aberrant mitotic stress due to CycA upregulation (Zhang et al.). This paper further reveals that Uev1a forms a complex with APC/C to promote proteasome-mediated degradation of CycA.

      In addition to polyploid nurse cells, the authors also examined the effect of Ras<sup>G12V</sup>-overexpression in diploid germline cells, where Ras<sup>G12V</sup>-overexpression triggers active proliferation, not cell death. Uev1a was found to suppress its overgrowth as well.

      Finally, the authors show that the overexpression of the human homologs, UBE2V1 and UBE2V2, suppresses tumor growth in human colorectal cancer xenografts and cell lines. Notably, the expression of these genes correlates with the survival of colorectal cancer patients carrying the Ras mutation.

      Strength:

      This paper presents a significant finding that UBE2V1/2 may serve as a potential therapy for cancers harboring Ras mutations. The authors propose a fascinating mechanism in which Uev1a forms a complex with APC/C to inhibit aberrant cell cycle progression.

      We greatly appreciate these comments.

      Weakness:

      The quantification of some crucial experiments lacks sufficient clarity.

      Thanks for highlighting this issue. We will provide the requested details regarding these quantification data in the revised manuscript.

      References

      Edgar, B.A., and Lehner, C.F. (1996). Developmental control of cell cycle regulators: a fly's perspective. Science 274, 1646-1652.

      Hammond, M.P., and Laird, C.D. (1985). Chromosome structure and DNA replication in nurse and follicle cells of Drosophila melanogaster. Chromosoma 91, 267-278.

      Lilly, M.A., and Spradling, A.C. (1996). The Drosophila endocycle is controlled by Cyclin E and lacks a checkpoint ensuring S-phase completion. Genes Dev 10, 2514-2526.

      Zhang, Q., Wang, Y., Bu, Z., Zhang, Y., Zhang, Q., Li, L., Yan, L., Wang, Y., and Zhao, S. (2024). Ras promotes germline stem cell division in Drosophila ovaries. Stem Cell Reports 19, 1205-1216.

    1. Author response:

      We sincerely thank the reviewers and editors for their thoughtful and constructive evaluation of our manuscript and their recognition of its technical strengths, including advanced spatio-temporal Ca2+ imaging, image processing, and the rational design of selective AVP receptor ligands. We appreciate their acknowledgement that our study contributes to the understanding of glucose-dependent AVP effects in pancreatic islet physiology. Their comments will guide us to refine the scope of our work, which focuses on how α and β cells respond to AVP under varying glucose and hormonal conditions, rather than on linear correlations between the function and transcript levels in individual cells or metabolic profiles in individual cells. Most of the reviewers' concerns and proposed remedies reflect a reductionist framework, which we believe cannot fully account for emergent behavior within the islet collective. As we and others have shown, islet cells do not behave in isolation; their responses often depend on the state of the entire cell population(1, 2). This means that even under identical experimental conditions, responses can differ depending on the islet’s current state. These patterns are not random, but reflect how the islet integrates signals dynamically(3, 4).

      To take advantage of both the systems and molecular side, we do plan to address several of the reviewers' suggestions with new experiments and analyses:

      First, we will add hormone, specifically glucagon, secretion assays to support our conclusions on α cell responses and possible paracrine effects. Second, we will include a targeted transcript analysis of V1bR using RNAscope and extend the pharmacological characterization of downstream signaling using selective agonists and inhibitors. Third, we will clarify the rationale for using forskolin, and add new experiments using GLP-1 analogues to selectively increase cAMP in β cells, allowing us to examine direct AVP effects. And fourth, we will reinforce the presence of emergent behavior and the point that variability in islet responses is not experimental noise, but a hallmark of the collective, non-linear behavior of the islet cell collective, which should later drive a rethinking of experimental designs and the interpretation of pharmacological responses. In conclusion, we believe that our study provides new insights into AVP modulation in pancreatic islets and highlights the importance of context-dependent responses in α and β cells. We are grateful for the opportunity to revise our manuscript and look forward to strengthening it further through the review process.

      (1) Jin E, Briggs JK, Benninger RKP, Merrins MJ. Glucokinase activity controls peripherally-located subpopulations of β-cells that lead islet Ca2+ oscillations. eLife Sciences Publications, Ltd; 2025.

      (2) Korošak D, Jusup M, Podobnik B, Stožer A, Dolenšek J, Holme P, et al. Autopoietic Influence Hierarchies in Pancreatic β Cells. Phys Rev Lett. 2021;127(16):168101.

      (3) Ball P. How life works: a user's guide to the new biology. Chicago: The University of Chicago Press; 2023. 541 p.

      (4) Fancher S, Mugler A. Fundamental Limits to Collective Concentration Sensing in Cell Populations. Phys Rev Lett. 2017;118(7):078101.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable study investigates prey capture by archer fish, showing that even though the visuomotor behavior unfolds very rapidly (within 40-70 ms), it is not hardwired; it can adapt to different simulated physics and different prey shapes. Although there was agreement that the model system, experimental design, and main hypothesis are certainly interesting, opinions were divided on whether the evidence supporting the central claims is incomplete. A more rigorous definition and assessment of "reflex speed", more detailed evidence of stimulus control, and a more detailed analysis of individual subjects could potentially increase confidence in the main conclusions.

      Thank you very much. There are several points that we had to make absolutely sure are well understood. (1) Explaining in the best possible way the experiment with a fly sliding on top of a glass plate. Here, the virtual ballistic landing point can be calculated using simple high school physics. It turns out that this is where the fish turn to, even though the fly is not falling at all. Once this is understood it becomes clear that we can precisely measure latency and accuracy of the C-start turns. In the new version we explain this essential aspect in more detail and add an extra Figure (new Figure 2). This may, perhaps, help readers to notice this important background (previously covered in Fig. 1C). (2) The full experimental evidence that the VR method works is presented in more detail and all measurements necessary will be clear after the new Figure 2. They will however not be clear if this Figure is ignored. (3) We have rewritten the manuscript to make it easier to understand what we wanted to show, why we needed VR to proceed and why the archerfish highspeed decision lent itself so readily to tackle the problem. (4) We emphasize the importance of speed-accuracy tradeoffs in standard decision-making and also include data on the absence of such a relation in the archerfish highspeed decisions.

      So, in summary, we have emphasized what we wanted to show and what we did not want to show, we have rewritten the text to make it easier for future readers and we have tried to add more guidance to the figures. We do hope very much that the beauty of the quite unexpected findings is more easily visible to those who take the trouble of actually reading the paper.  
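
      The "simple high school physics" behind the virtual ballistic landing point can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes frictionless free fall with no initial vertical speed, and the function name, parameter names, and coordinate conventions are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def ballistic_landing_point(x0, y0, vx, vy, height, g=G):
    """Return the (x, y) point on the water surface where an object would land.

    Assumptions (not taken from the paper): no air resistance; the object
    starts above the horizontal position (x0, y0) at `height` metres over
    the water, with horizontal velocity (vx, vy) in m/s and zero vertical
    speed.
    """
    if height < 0:
        raise ValueError("height must be non-negative")
    fall_time = math.sqrt(2.0 * height / g)  # solves height = g * t**2 / 2
    return (x0 + vx * fall_time, y0 + vy * fall_time)
```

      For an object sliding on a glass plate, the same calculation gives the virtual landing point that a falling object with the same starting position, height, and velocity would reach.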

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors test whether the archerfish can modulate the fast response to a falling target. By manipulating the trajectory of the target, they claim that the fish can modulate the fast response. While it is clear from the result that the fish can modulate the fast response, the experimental support for the argument that the fish can do it for a reflex-like behavior is inadequate.

      Please note that we have not simply tested whether archerfish can 'modulate the fast response'. We quantitatively test specific hypotheses on the rules used by the fish. For this the accuracy of the decisions is analyzed with respect to specific points that can be calculated precisely in each of the experiments. These points are shown on the figures and in the movies that were meant to illustrate this important aspect. We had to make sure that the way we calculate the predicted point(s) is made as clear as possible in the text. We added more text and separated the fundamentally important aspects in a separate Figure 2 to make it more difficult to overlook the fundamental aspects that lay the foundation for everything that follows.

      Strengths:

      Overall, the question that the authors raised in the manuscript is interesting.

      Thank you and we do hope very much that, with our revision, you will see the beauty of the findings.

      Weaknesses:

      (1) The argument that the fish can modulate reflex-like behavior relies on the claim that the archerfish makes the decision in 40 ms. There is little support for the 40 ms reaction time. The reaction time for the same behavior in Schlegel 2008, is 60-70 ms, and in Tsvilling 2012 about 75 ms, if we take the half height of the maximum as the estimated reaction time in both cases. If we take the peak (or average) of the distribution as an estimation of reaction time, the reaction time is even longer. This number is critical for the analysis the authors perform since if the reaction time is longer, maybe this is not a reflex as claimed. In addition, mentioning the 40 ms in the abstract is overselling the result. The title is also not supported by the results.

      Although the minimum latency is indeed 40 ms (it can be slightly less: e.g., see the evidence in the paper, for instance the plots in the new Fig. 4) the paper's statements are not dependent on a specific number. Even if minimum latency was 100 ms (which it is not) the speed of the response and the absence of a speed-accuracy relation (now shown directly in Fig. 4) is what is of importance. To show this we have completely rewritten large parts of the manuscript.

      (2) A critical technical issue of the stimulus delivery is not clear. The frame rate is 120 FPS and the target horizontal speed can be up to 1.775 m/s. This produces a target jumping on the screen 15 mm in each frame. This is not a continuous motion. Thus, the similarity between the natural system where the target experiences ballistic trajectory and the experiment here is not clear. Ideally, another type of stimulus delivery system is needed for a project of this kind that requires fast-moving targets (e.g. Reiser, J. Neurosci.Meth. 2008). In addition, the screen is rectangular and not circular, so in some directions, the target vanishes earlier than others. It must produce a bias in the fish response but there is no analysis of this type.

      Please note that the new Fig. 3 (former Fig. 2) reports all the evidence that is needed to just show this and in a way that could in no way have been better. We have rewritten the text to explain what needs to be shown experimentally in order to be able to proceed, what critical tests were done and what results were obtained. We also add a short comment on another unsuccessful attempt that we have tried before.
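
      The per-frame displacement quoted in point (2) follows directly from the frame rate and the target speed. A minimal check of that arithmetic, with a hypothetical function name, might look like this:

```python
def displacement_per_frame_mm(speed_m_per_s, frames_per_second):
    """Distance in millimetres that a target moves between two frames."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return speed_m_per_s / frames_per_second * 1000.0


# Values quoted in the reviewer's comment: 1.775 m/s displayed at 120 FPS.
step_mm = displacement_per_frame_mm(1.775, 120)
print(f"{step_mm:.1f} mm per frame")
```

      The result is just under 15 mm, consistent with the figure given by the reviewer.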

      (3) The results here rely on the ability to measure the error of response in the case of a virtual experiment. It is not clear how this is done since the virtual target does not fall. How do the authors validate that the fish indeed perceives the virtual target as the falling target? Since the deflection is at a later stage of the virtual trajectory, it is not clear what is the actual physics that governs the world of the experiment. Overall, the experimental setup is not well designed.

      Understanding this aspect is essential. If the glass plate experiment is not thoroughly understood (new Fig. 2 with new text to emphasize that this is absolutely essential) nothing that follows makes any sense, including what is meant by the statement that the decision could be hardwired to ballistic motion.

      Reviewer #2 (Public review):

      Summary:

      This manuscript studies prey capture by archer fish, which observe the initial values of motion of aerial prey they made fall by spitting on them, and then rapidly turn to reach the ballistic landing point on the water surface. The question raised by the article is whether this incredibly fast decision-making process is hardwired and thus unmodifiable or can be adjusted by experience to follow a new rule, namely that the landing point is deflected by a certain amount from the expected ballistic landing point. The results show that the fish learn the new rule and use it afterward in a variety of novel situations that include height, side, and speed of the prey, and which preserve the speed of the fish's decision. Moreover, a remarkable finding presented in this work is the fact that fish that have learned to use the new rule can relearn to use the ballistic landing point for an object based on its shape (a triangle) while keeping simultaneously the 'deflected rule' for an object differing in shape (a disc); in other words, fish can master simultaneously two decision-making rules based on the different shape of objects.

      Strengths:

      The manuscript relies on a sophisticated and clever experimental design that allows changing the apparent landing point of a virtual prey using a virtual reality system. Several robust controls are provided to demonstrate the reliability and usefulness of the experimental setup.

      Overall, I very much like the idea conveyed by the authors that even stimuli triggering apparently hardwired responses can be relearned in order to be associated with a different response, thus showing the impressive flexibility of circuits that are sometimes considered mediating pure reflexive responses.

      Thank you so much for this precise assessment of what we have shown!

      This is the case - as an additional example - of the main component of the Nasanov pheromone of bees (geraniol), which triggers immediate reflexive attraction and appetitive responses, and which can, nevertheless, be learned by bees in association with an electric shock so that bees end up exhibiting avoidance and the aversive response of sting extension to this odorant (1), which is a fully unnatural situation, and which shows that associative aversive learning is strong enough to override preprogrammed responding, thus reflecting an impressive behavioral flexibility.

      That's very interesting, thanks and we are very happy to mention this important study in the revised version.

      Weaknesses:

      As a general remark, there is some information that I missed and that is mandatory in the analysis of behavioral changes.

      Firstly, the variability in the performances displayed. The authors mentioned that the results reported come from 6 fish (which is a low sample size). How were the individual performances in terms of consistency? Were all fish equally good in adjusting/learning the new rule? How did errors vary according to individual identity? It seems to me that this kind of information should be available as the authors reported that individual fish could be recognized and tracked (see lines 620-635) and is essential for appreciating the flexibility of the system under study.

      Secondly, the speed of the learning process is not properly explained. Admittedly, fish learn in an impressive way the new rule and even two rules simultaneously; yet, how long did they need to achieve this? In the article, Figure 2 mentions that at least 6 training stages (each defined as a block of 60 evaluated turn decisions, which actually shows that the standard term 'Training Block' would be more appropriate) were required for the fish to learn the 'deflected rule'. While this means 360 trials (turning starts), I was left with the question of how long this process lasted. How many hours, days, and weeks were needed for the fish to learn? And as mentioned above, were all fish equally fast in learning? I would appreciate explaining this very important point because learning dynamics is relevant to understanding the flexibility of the system.

      First, it is very important to keep in mind the question that we wanted to clarify: Does the system have the potential to re-tune the decisions to other non-ballistic relations between the input variables and the output? This would have been established if one fish was found capable of doing that. We have rewritten the introduction and discussion to specifically say what our aim was. We feel that the paper is already extremely long and difficult to understand (even after we tried very hard in this revision to explain everything in detail and as well as we could); it required establishing a method whose success was really unexpected and finding a degree of plasticity that we did not expect at all. We have also added a section in the discussion stating what we can and cannot say given the number of fish examined. For instance, we do not know if there are differences in the speed at which the different individuals mastered the new rules and if social learning could play a role to speed up the acquisition. That is a brilliant idea and we are very interested in checking this - but we wanted to stick with the (quite ambitious) goal of the present study.

      Reference:

      (1) Roussel, E., Padie, S. & Giurfa, M. Aversive learning overcomes appetitive innate responding in honeybees. Anim Cogn 15, 135-141, doi:10.1007/s10071-011-0426-1 (2012).

      Thanks for this reference!

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor comments:

      (1) What is the difference between Reinel, J. Exp. Bio. 2016 and the current study?

      Clearly in that study all objects were strictly falling ballistically, and latency and accuracy of the turn decisions were determined when the initial motion was not only horizontal but had an additional vertical component of speed. The question of that study was if the need to account for an additional variable (vertical speed) in the decision would affect its latency or accuracy. The study showed that also then archerfish rapidly turn to the later impact point. It also showed that accuracy and latency were not changed by the added degree of freedom.

      (2) How do Figures 2 F and G demonstrate that an accurate start is possible?

      See above.

      (3) Figure 4 is hard to follow, it is not clear what is presented and how it supports the claim that the new rule is represented in a way that allows immediate generalization.

      Yes, this is not at all an easy experiment. Briefly, fish were re-trained at only one height level and then are tested at other levels. The strategy is as in the experiments Schuster et al. 2004 Current Biology, Vol. 14, 1565–1568, Figure 5. We have changed text and Figure (new Figure 5) to show how the predictions were reached.

      Reviewer #2 (Recommendations for the authors):

      Minor remarks

      Lines 88-90: I was surprised to see that in this section, the authors did not mention the speed-accuracy trade-off which has inspired numerous experiments in animal behavior (1). This could be used to back their point, namely, that speed comes with an apparent cost of a loss in accuracy.

      Yes, that is a crucial aspect that was completely missing even though it demonstrates a key aspect of 'standard' versus some 'highspeed' decisions! We definitely had to include it and also to show, directly under the conditions of our present experiments (in the new Fig. 4) the absence of a significant speed-accuracy relation for the archerfish highspeed decisions! Thank you very much for emphasizing this crucial aspect!

      Lines 182-184: Specify that this situation corresponds to the hatched bar in Figure (this can be specified in the caption of the figure, where the bar is not mentioned).

      Thanks!

      Lines 187-188: here and elsewhere (e.g. lines 224-225, etc), the error made by the fish is presented in cm (see Figure 2 where the inset shows how the error was computed). I wonder if it would not be more appropriate to present it in terms of the angular difference between the trajectory made by the fish and the food delivery location.

      Angles could also be used, but because of the large variation in initial distances (we wanted to make sure that the fish had to capture a rule, allowing them to respond from various distances) another measure was used that we found somewhat more natural: it is simply how close a fish would get to the landing point if it continued in the direction assumed after the turn. Although we describe how we defined accuracy we did not discuss why this measure was used in this (and many previous studies). We are very happy to add this. Please also note that running all tests based on angular errors (which we also have done throughout to ensure that the conclusions are independent of an arbitrary measure of the error) leads to no different conclusion. We have added a brief explanation in the text and in the new Fig. 2.
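
      The two error measures discussed here can be compared in a short sketch. It is an illustration only, not the authors' analysis code: the function name and the 2D coordinate convention are hypothetical, and it assumes the fish moves only forward along a straight line after the turn.

```python
import math


def turn_errors(fish_xy, heading_xy, landing_xy):
    """Return (distance_error, angular_error_deg) for one turn decision.

    distance_error: closest approach to the landing point if the fish kept
        moving straight along its heading (same units as the inputs).
    angular_error_deg: unsigned angle between the heading and the bearing
        from the fish to the landing point.
    """
    hx, hy = heading_xy
    norm = math.hypot(hx, hy)
    if norm == 0:
        raise ValueError("heading must be a non-zero vector")
    ux, uy = hx / norm, hy / norm  # unit vector along the heading
    dx = landing_xy[0] - fish_xy[0]
    dy = landing_xy[1] - fish_xy[1]
    along = dx * ux + dy * uy   # progress towards the point along the heading
    across = dx * uy - dy * ux  # signed perpendicular offset from the path
    # Assumption: the fish only moves forward, so a landing point behind it
    # is closest at the starting position.
    distance_error = abs(across) if along >= 0 else math.hypot(dx, dy)
    angular_error_deg = math.degrees(math.atan2(abs(across), along))
    return distance_error, angular_error_deg
```

      Because the same angular error translates into a larger miss at larger starting distances, the two measures are related but not interchangeable, which is one reason to check conclusions against both.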

      Lines 299-323: Is it my impression or did fish have more trouble in generalizing their learned rule to the condition untrained larger height (see for instance red curves in Figures 4 D, E, G)? Could the authors elaborate on this point?

      We changed the code to make this more clear. The red curves (before marked A to highlight impact point option A) correspond to the errors to the ballistic impact point without deflection, so what would have to be compared are the black curves (marked P to highlight the virtual impact point that should be chosen had the fish immediately generalized to the untrained conditions). We have rewritten the text and the labels in the Figure (now Figure 5) to illustrate the predictions and to name them in more helpful ways and so that they can't be confused with panel labels. At any rate, what needs to be compared, to check the idea, are the black curves, and these are not statistically different between both heights (p=0.525, Mann-Whitney). Interestingly, none of the black curves from all panels (D-G) differ (p>0.3).

      Line 559: if we are speaking here about luminance contrast, it should read 'Michelson Contrast' rather than 'Michelsen Contrast'.

      Absolutely, thanks!

      References

      (1) Chittka, L., Skorupski, P. & Raine, N. E. Speed-accuracy tradeoffs in animal decision making. Trends Ecol Evol 24, 400-407, doi:10.1016/j.tree.2009.02.010 (2009).

      An excellent paper that helps to stress our main question.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Pakula et al. explore the impact of reactive oxygen species (ROS) on neonatal cerebellar regeneration, providing evidence that ROS activates regeneration through Nestin-expressing progenitors (NEPs). Using scRNA-seq analysis of FACS-isolated NEPs, the authors characterize injury-induced changes, including an enrichment in ROS metabolic processes within the cerebellar microenvironment. Biochemical analyses confirm a rapid increase in ROS levels following irradiation, and forced catalase expression, which reduces ROS levels, impairs external granule layer (EGL) replenishment post-injury.

      Strengths:

      Overall, the study robustly supports its main conclusion and provides valuable insights into ROS as a regenerative signal in the neonatal cerebellum.

      Comments on revisions:

      The authors have addressed most of the previous comments. However, they should clarify the following response:

      *"For reasons we have not explored, the phenotype is most prominent in these lobules, that is why they were originally chosen. We edited the following sentence (lines 578-579):

      First, we analyzed the replenishment of the EGL by BgL-NEPs in vermis lobules 3-5, since our previous work showed that these lobules have a prominent defect."*

      It has been reported that the anterior part of the cerebellum may have a lower regenerative capacity compared to the posterior lobe. To avoid potential ambiguity, the authors should clarify that "the phenotype" and "prominent defect" refer to more severe EGL depletion at an earlier stage after IR rather than a poorer regenerative outcome. Additionally, they should provide a reference to support their statement or indicate if it is based on unpublished observations.

      Our comment does not refer to a more severe EGL depletion at an earlier stage. There is instead poorer regeneration of the anterior region. The irradiation approach used provides consistent cell killing of GCPs across the cerebellum. This can be seen in Fig. 1c, e, g, i in our previous publication (Wojcinski, et al. (2017) Cerebellar granule cell replenishment post-injury by adaptive reprogramming of Nestin+ progenitors. Nature Neuroscience, 20:1361-1370). Also, Fig 2e, g, k, m in the paper shows that by P5 and P8, posterior lobule 8 recovers better than anterior lobules 1-5.

      Reviewer #2 (Public review):

      Summary:

      The authors have previously shown that the mouse neonatal cerebellum can regenerate damage to granule cell progenitors in the external granular layer, through reprogramming of gliogenic nestin-expressing progenitors (NEPs). The mechanisms of this reprogramming remain largely unknown. Here the authors used scRNAseq and ATACseq of purified neonatal NEPs from P1-P5 and showed that ROS signatures were transiently upregulated in gliogenic NEPs vs neurogenic NEPs 24 hours post injury (P2). To assess the role of ROS, mice transgenic for global catalase activity were assessed to reduce ROS. Inhibition of ROS significantly decreased gliogenic NEP reprogramming and diminished cerebellar growth post-injury. Further, inhibition of microglia across this same time period prevented one of the first steps of repair - the migration of NEPs into the external granule layer. This work is the first demonstration that the tissue microenvironment of the damaged neonatal cerebellum is a major regulator of neonatal cerebellar regeneration. Increased ROS is seen in other CNS damage models, including adults, thus there may be some shared mechanisms across age and regions, although interestingly neonatal cerebellar astrocytes do not upregulate GFAP as seen in adult CNS damage models. Another intriguing finding is that global inhibition of ROS did not alter normal cerebellar development.

      Strengths:

      This paper presents a beautiful example of using single cell data to generate biologically relevant, testable hypotheses of mechanisms driving important biological processes. The scRNAseq and ATACseq analyses are rigorously conducted and conclusive. Data is very clearly presented and easily interpreted, supporting the hypothesis next tested by reducing ROS in irradiated brains.

      Analysis of whole tissue and FACS-sorted NEPs in transgenic mice where human catalase was globally expressed in mitochondria were rigorously controlled and conclusively show that ROS upregulation was indeed decreased post injury and very clearly the regenerative response was inhibited. The authors are to be commended on the very careful analyses which are very well presented and again, easy to follow with all appropriate data shown to support their conclusions.

      Weaknesses:

      The authors also present data to show that microglia are required for an early step of mobilizing gliogenic NEPs into the damaged EGL. While the data that PLX5622 administration from P0-P5 or even P0-P8 clearly shows that there is an immediate reduction of NEPs mobilized to the damaged EGL, there is no subsequent reduction of cerebellar growth such that by P30, the treated and untreated irradiated cerebella are equivalent in size. There is speculation in the discussion about why this might be the case. Additional experiments and tools are required to assess mechanisms. Regardless, the data still implicate microglia in the neonatal regenerative response, and this finding remains an important advance.

      As stated previously, the suggested follow up experiments while relevant are extensive and considered beyond the scope of the current paper.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Pakula et al. explore the impact of reactive oxygen species (ROS) on neonatal cerebellar regeneration, providing evidence that ROS activates regeneration through Nestin-expressing progenitors (NEPs). Using scRNA-seq analysis of FACS-isolated NEPs, the authors characterize injury-induced changes, including an enrichment in ROS metabolic processes within the cerebellar microenvironment. Biochemical analyses confirm a rapid increase in ROS levels following irradiation, and forced catalase expression, which reduces ROS levels, impairs external granule layer (EGL) replenishment post-injury.

      Strengths:

      Overall, the study robustly supports its main conclusion and provides valuable insights into ROS as a regenerative signal in the neonatal cerebellum.

      Weaknesses:

      (1) The diversity of cell types recovered from scRNA-seq libraries of sorted Nes-CFP cells is unexpected, especially the inclusion of minor types such as microglia, meninges, and ependymal cells. The authors should validate whether Nes and CFP mRNAs are enriched in the sorted cells; if not, they should discuss the potential pitfalls in sampling bias or artifacts that may have affected the dataset, impacting interpretation.

      In our previous work, we thoroughly assessed the transgene using RNA in situ hybridization for Cfp, immunofluorescent analysis for CFP and scRNA-seq analysis for Cfp transcripts (Bayin et al., Science Adv. 2021, Fig. S1-2)(1), and characterized the diversity within the NEP populations of the cerebellum. Our present scRNA-seq data also confirms that Nes transcripts are expressed in all the NEP subtypes. A feature plot for Nes expression has been added to the revised manuscript (Fig 1E), as well as a sentence explaining the results. Of note, since this data was generated from FACS-isolated CFP+ cells, the perdurance of the protein allows for the detection of immediate progeny of Nes-expressing cells, even in cells where Nes is not expressed once cells are differentiated. Finally, oligodendrocyte progenitors, perivascular cells, some rare microglia and ependymal cells have been demonstrated to express Nes in the central nervous system; therefore, detecting small groups of these cells is expected (2-4). We have added the following sentence (lines 391-394):

      “Detection of Nes mRNA confirmed that the transgene reflects endogenous Nes expression in progenitors of many lineages, and also that the perdurance of CFP protein in immediate progeny of Nes-expressing cells allowed the isolation of these cells by FACS (Figure 1E)”.

      (2) The authors should de-emphasize the claim that ROS signaling and related gene upregulation occur exclusively in gliogenic NEPs. Genes such as Cdkn1a, Phlda3, Ass1, and Bax are identified as differentially expressed in neurogenic NEPs and granule cell progenitors (GCPs), with Ass1 absent in GCPs. According to Table S4, gene ontology (GO) terms related to ROS metabolic processes are also enriched in gliogenic NEPs, neurogenic NEPs, and GCPs.

      As the reviewer requested, we have de-emphasized that ROS signaling is preferentially upregulated in gliogenic NEPs, since we agree with the reviewer that there is some evidence for similar transcriptional signatures in neurogenic NEPs and GCPs. We added the following (lines 429-531):

      “Some of the DNA damage and apoptosis related genes that were upregulated in IR gliogenic-NEPs (Cdkn1a, Phlda3, Bax) were also upregulated in the IR neurogenic-NEPs and GCPs at P2 (Supplementary Figure 2B-E).”

      And we edited the last few sentences of the section to state (lines 453-459):

      “Interestingly, we did not observe significant enrichment for GO terms associated with cellular stress response in the GCPs that survived the irradiation compared to controls, despite significant enrichment for ROS signaling related GO-terms (Table S4). Collectively, these results indicate that injury induces significant and overlapping transcriptional changes in NEPs and GCPs. The gliogenic- and neurogenic-NEP subtypes transiently upregulate stress response genes upon GCP death, and an overall increase in ROS signaling is observed in the injured cerebella.”

      (3) The authors need to justify the selection of only the anterior lobe for EGL replenishment and microglia quantification.

      We thank the reviewers for asking for this clarification. Our previous publications on regeneration of the EGL by NEPs have all involved quantification of these lobules, so we think it is important to stay with the same lobules. For reasons we have not explored, the phenotype is most prominent in these lobules, which is why they were originally chosen. We edited the following sentence (lines 578-579):

      “First, we analyzed the replenishment of the EGL by BgL-NEPs in vermis lobules 3-5, since our previous work showed that these lobules have a prominent defect.”

      (4) Figure 1K: The figure presents linkages between genes and GO terms as a network but does not depict a gene network. The terminology should be corrected accordingly.

      We have corrected the terminology and added the following (lines 487-489):

      “Finally, linkages between the genes in differentially open regions identified by ATAC-seq and the associated GO-terms revealed an active transcriptional network involved in regulating cell death and apoptosis (Figure 1K).”

      (5) Figure 1H and S2: The x-axis appears to display raw p-values rather than log10(p.value) as indicated. The x-axis should ideally show -log10(p.adjust), beginning at zero. The current format may misleadingly suggest that the ROS GO term has the lowest p-values.

      Apologies for the mistake. The data represents raw p-values and the x-axis has been corrected.
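      The axis convention the reviewer asks for in point (5) can be sketched as follows. This is a minimal illustration, not the authors' plotting code: the GO term names and p-values are hypothetical, and Benjamini-Hochberg is assumed as the adjustment method because the reviewer's `p.adjust` does not name one.

```python
import math

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def neg_log10(values):
    """Transform p-values so that stronger enrichment gives a longer bar."""
    return [-math.log10(v) for v in values]

# Hypothetical GO terms and raw p-values, for illustration only.
go_pvalues = {"term A": 0.001, "term B": 0.01, "term C": 0.04, "term D": 0.5}

padj = bh_adjust(list(go_pvalues.values()))
scores = neg_log10(padj)

for term, score in zip(go_pvalues, scores):
    print(f"{term}: -log10(p.adjust) = {score:.2f}")
```

      Because adjusted p-values never exceed 1, every transformed value is non-negative, so the axis can start at zero and the most significant term has the largest value rather than the smallest.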

      (6) Genes such as Ppara, Egln3, Foxo3, Jun, and Nos1ap were identified by bulk ATAC-seq based on proximity to peaks, not by scRNA-seq. Without additional expression data, caution is needed when presenting these genes as direct evidence of ROS involvement in NEPs.

      We modified the text to discuss the discrepancies between the analyses. While some of this could be due to the lower detection limits in the scRNA-seq, it also highlights that chromatin accessibility is not a direct readout for expression levels and further analysis is needed. Nevertheless, both scRNA-seq and ATAC-seq have identified similar mechanisms, and our mutant analysis confirmed our hypothesis that an increase in ROS levels underlies repair, further increasing the confidence in our analyses. Further investigation is needed to understand the downstream mechanisms. We added the following sentence (lines 478-481):

      “However, not all genes in the accessible areas were differentially expressed in the scRNA-seq data. While some of this could be due to the detection limits of scRNA-seq, further analysis is required to assess the mechanisms of how the differentially accessible chromatin affects transcription.”

      (7) The authors should annotate cell identities for the different clusters in Table S2.

      All cell types have been annotated in Table S2.

      (8) Reiterative clustering analysis reveals distinct subpopulations among gliogenic and neurogenic NEPs. Could the authors clarify the identities of these subclusters? Can we distinguish the gliogenic NEPs in the Bergmann glia layer from those in the white matter?

      Thank you for this clarification. As shown in our previous studies, we cannot distinguish between the gliogenic NEPs in the Bergmann glia layer and those in the white matter based on scRNA-seq, but expression of the Bergmann glia marker Gdf10 suggests that a large proportion of the cells in the Hopx+ clusters are in the Bergmann glia layer. The distinctions within the major subpopulations that we characterized (Hopx-, Ascl1-expressing NEPs and GCPs) are driven by their proliferative/maturation status, as we previously observed. We have included a detailed annotation of all the clusters in Table S2, as requested, and a UMAP for mKi67 expression in Fig 1E. We have clarified this in the following sentence (lines 383-385):

      “These groups of cells were further subdivided into molecularly distinct clusters based on marker genes and their cell cycle profiles or developmental stages (Figure 1D, Table S2).”

      (9) In the Methods section, the authors mention filtering out genes with fewer than 10 counts. They should specify if these genes were used as background for enrichment analysis. Background gene selection is critical, as it influences the functional enrichment of gene sets in the list.

      As requested, the approach used has been added to the Methods section of the revised paper. Briefly, the background genes used by the goseq function are the same genes used for the probability weight function (nullp). The mm8 genome annotation was used in the nullp function, and all annotated genes were used as background genes to compute GO term enrichment. The following was added (lines 307-308):

      “The background genes used to compute the GO term enrichment includes all genes with gene symbol annotations within mm8.”
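      The reviewer's point (9), that background selection influences functional enrichment, can be illustrated with a small sketch. It uses a plain hypergeometric test rather than the length-bias-corrected test that goseq applies, and all gene counts below are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(background_size, term_size, list_size, overlap):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from background_size genes, term_size of which carry the GO term."""
    total = comb(background_size, list_size)
    upper = min(term_size, list_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 200 genes of interest, 10 of them in one GO term.
# Background 1: all annotated genes (20,000; the term has 100 members).
p_all_annotated = hypergeom_enrichment_p(20000, 100, 200, 10)
# Background 2: detected genes only (8,000; 80 term members remain).
p_detected_only = hypergeom_enrichment_p(8000, 80, 200, 10)

print(f"all annotated genes as background: p = {p_all_annotated:.3g}")
print(f"detected genes as background:      p = {p_detected_only:.3g}")
```

      With the same gene list and the same overlap, the larger background yields the smaller p-value, which is why the choice of background needs to be reported.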

      (10) Figure S1C: The authors could consider using bar plots to better illustrate cell composition differences across conditions and replicates.

      As suggested, we have included bar plots in Fig. S1D-F.

      (11) Figures 4-6: It remains unclear how the white matter microglia contribute to the recruitment of BgL-NEPs to the EGL, as the mCAT-mediated microglia loss data are all confined to the white matter.

      We have thought about the question and had initially quantified the microglia in the white matter and the rest of the lobules (excluding the EGL) separately. However, there are very few microglia outside the white matter in each section, thus it is not possible to obtain reliable statistical data on such a small population. We therefore did not include the cells in the analysis. We have added this point in the main text (line 548).

      “As a possible explanation for how white matter microglia could influence NEP behaviors, given the small size of the lobules and how the cytoarchitecture is disrupted after injury, we think it is possible that secreted factors from the white matter microglia could reach the BgL NEPs. Alternatively, there could be a relay system through an intermediate cell type closer to the microglia.” We have added these ideas to the Discussion of the revised paper (lines 735-738).

      Reviewer #2 (Public review):

      Summary:

      The authors have previously shown that the mouse neonatal cerebellum can regenerate damage to granule cell progenitors in the external granular layer, through reprogramming of gliogenic nestin-expressing progenitors (NEPs). The mechanisms of this reprogramming remain largely unknown. Here the authors used scRNAseq and ATACseq of purified neonatal NEPs from P1-P5 and showed that ROS signatures were transiently upregulated in gliogenic NEPs vs. neurogenic NEPs 24 hours post injury (P2). To assess the role of ROS, mice transgenic for global catalase activity were assessed to reduce ROS. Inhibition of ROS significantly decreased gliogenic NEP reprogramming and diminished cerebellar growth post-injury. Further, inhibition of microglia across this same time period prevented one of the first steps of repair - the migration of NEPs into the external granule layer. This work is the first demonstration that the tissue microenvironment of the damaged neonatal cerebellum is a major regulator of neonatal cerebellar regeneration. Increased ROS is seen in other CNS damage models including adults, thus there may be some shared mechanisms across age and regions, although interestingly neonatal cerebellar astrocytes do not upregulate GFAP as seen in adult CNS damage models. Another intriguing finding is that global inhibition of ROS did not alter normal cerebellar development.

      Strengths:

      This paper presents a beautiful example of using single cell data to generate biologically relevant, testable hypotheses of mechanisms driving important biological processes. The scRNAseq and ATACseq analyses are rigorously conducted and conclusive. Data are very clearly presented and easily interpreted, supporting the hypothesis next tested by reducing ROS in irradiated brains.

      Analyses of whole tissue and FACS-sorted NEPs in transgenic mice in which human catalase was globally expressed in mitochondria were rigorously controlled and conclusively show that ROS upregulation was indeed decreased post-injury and that the regenerative response was very clearly inhibited. The authors are to be commended on the very careful analyses, which are very well presented and, again, easy to follow, with all appropriate data shown to support their conclusions.

      Weaknesses:

      The authors also present data to show that microglia are required for an early step of mobilizing gliogenic NEPs into the damaged EGL. While the data that PLX5622 administration from P0-P5 or even P0-P8 clearly shows that there is an immediate reduction of NEPs mobilized to the damaged EGL, there is no subsequent reduction of cerebellar growth such that by P30, the treated and untreated irradiated cerebella are equivalent in size. There is speculation in the discussion about why this might be the case, but there is no explanation for why further, longer treatment was not attempted, nor were there any additional analyses of other regenerative steps in the treated animals. The data still implicate microglia in the neonatal regenerative response, but how remains uncertain.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      This is an exemplary manuscript.

      The methods and data are very well described and presented.

      I actually have very little to ask the authors except for an explanation of why PLX treatment was discontinued after P5 or P8 and what other steps of NEP reprogramming were assessed in these animals? Was NEP expansion still decreased at P8 even in the presence of PLX at this stage? Also - was there any analysis attempted combining mCAT and PLX?

      We agree with the reviewer that a follow-up study that goes into a deeper analysis of the role of microglia in GCP regeneration and any interaction with ROS signaling would be interesting. However, it would require a set of tools that we do not currently have. We did not have enough PLX5622 to perform additional experiments or extend the length of treatment. Plexxikon informed us in 2021 that they were no longer manufacturing PLX5622 because they were focusing on new analogs for in vivo use, and thus we had to use what we had left over from a completed preclinical cancer study. We nevertheless think it is important to publish our preliminary results to spark further experiments by other groups.

      References

      (1) Bayin NS, Mizrak D, Stephen ND, Lao Z, Sims PA, Joyner AL. Injury induced ASCL1 expression orchestrates a transitory cell state required for repair of the neonatal cerebellum. Sci Adv. 2021;7(50):eabj1598.

      (2) Cawsey T, Duflou J, Weickert CS, Gorrie CA. Nestin-Positive Ependymal Cells Are Increased in the Human Spinal Cord after Traumatic Central Nervous System Injury. J Neurotrauma. 2015;32(18):1393-402.

      (3) Gallo V, Armstrong RC. Developmental and growth factor-induced regulation of nestin in oligodendrocyte lineage cells. J Neurosci. 1995;15(1 Pt 1):394-406.

      (4) Huang Y, Xu Z, Xiong S, Sun F, Qin G, Hu G, et al. Repopulated microglia are solely derived from the proliferation of residual microglia after acute depletion. Nat Neurosci. 2018;21(4):530-40.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Jin et al. investigated how the bacterial DNA damage (SOS) response and its regulator protein RecA affect the development of drug resistance under short-term exposure to beta-lactam antibiotics. Canonically, the SOS response is triggered by DNA damage, which results in the induction of error-prone DNA repair mechanisms. These error-prone repair pathways can increase mutagenesis in the cell, leading to the evolution of drug resistance. Thus, inhibiting the SOS regulator RecA has been proposed as a means to delay the rise of resistance.

      In this paper, the authors deleted the RecA protein from E. coli and exposed this ∆recA strain to selective levels of the beta-lactam antibiotic, ampicillin. After an 8 h treatment, they washed the antibiotic away and allowed the surviving cells to recover in regular media. They then measured the minimum inhibitory concentration (MIC) of ampicillin against these treated strains. They note that after just 8 h of treatment with ampicillin, the ∆recA strain had developed higher MICs towards ampicillin, while by contrast, wild-type cells exhibited unchanged MICs. This MIC increase was also observed in subsequent generations of bacteria, suggesting that the phenotype is driven by a genetic change.

      The authors then used whole genome sequencing (WGS) to identify mutations that accounted for the resistance phenotype. Within resistant populations, they discovered key mutations in the promoter region of the beta-lactamase gene, ampC; in the penicillin-binding protein PBP3, which is the target of ampicillin; and in the AcrB subunit of the AcrAB-TolC efflux machinery. Importantly, mutations in the efflux machinery can impact resistance towards other antibiotics, not just beta-lactams. To test this, they repeated the MIC experiments with other classes of antibiotics, including kanamycin, chloramphenicol, and rifampicin. Interestingly, they observed that the ∆recA strains pre-treated with ampicillin showed higher MICs towards all other antibiotics tested. This suggests that the mutations conferring resistance to ampicillin are also increasing resistance to other antibiotics.

      The authors then performed an impressive series of genetic, microscopy, and transcriptomic experiments to show that this increase in resistance is not driven by the SOS response, but by independent DNA repair and stress response pathways. Specifically, they show that deletion of recA reduces the bacterium's ability to process reactive oxygen species (ROS) and repair its DNA. These factors drive accumulation of mutations that can confer resistance towards different classes of antibiotics. The conclusions are reasonably well-supported by the data, but some aspects of the data and the model need to be clarified and extended.

      Strengths:

      A major strength of the paper is the detailed bacterial genetics and transcriptomics that the authors performed to elucidate the molecular pathways responsible for this increased resistance. They systematically deleted or inactivated genes involved in the SOS response in E. coli. They then subjected these mutants to the same MIC assays as described previously. Surprisingly, none of the other SOS gene deletions resulted in an increase in drug resistance, suggesting that the SOS response is not involved in this phenotype. This led the authors to focus on the localization of DNA PolI, which also participates in DNA damage repair. Using microscopy, they discovered that in the RecA deletion background, PolI co-localizes with the bacterial chromosome at much lower rates than in wild-type. This led the authors to conclude that deletion of RecA hinders PolI and DNA repair. Although the authors do not provide a mechanism, this observation is nonetheless valuable for the field and can stimulate further investigations in the future.

      In order to understand how RecA deletion affects cellular physiology, the authors performed RNA-seq on ampicillin-treated strains. Crucially, they discovered that in the RecA deletion strain, genes associated with antioxidative activity (cysJ, cysI, cysH, sodA, sufD) and base excision repair (mutH, mutY, mutM), which repairs oxidized forms of guanine, were all downregulated. The authors conclude that down-regulation of these genes might result in elevated levels of reactive oxygen species in the cells, which, in turn, might drive the rise of resistance. Experimentally, they further demonstrated that treating the ∆recA strain with the antioxidant GSH prevents the rise of MICs. These observations will be useful for more detailed mechanistic follow-ups in the future.

      Weaknesses:

      Throughout the paper, the authors use language suggesting that ampicillin treatment of the ∆recA strain induces higher levels of mutagenesis inside the cells, leading to the rapid rise of resistance mutations. However, as the authors note, the mutants enriched by ampicillin selection can play a role in efflux and can thus change a bacterium's sensitivity to a wide range of antibiotics, in what is known as cross-resistance. The current data are not clear on whether the elevated "mutagenesis" is driven by ampicillin selection or by a bona fide increase in mutation rate.

      Furthermore, on a technical level, the authors employed WGS to identify resistance mutations in the ampicillin-treated wild-type and ∆recA strains. However, the WGS methodology described in the paper is inconsistent. Notably, wild-type WGS samples were picked from non-selective plates, while ΔrecA WGS isolates were picked from selective plates with 50 μg/mL ampicillin. Such an approach biases the frequency and identity of the mutations seen in the WGS and cannot be used to support the idea that ampicillin treatment induces higher levels of mutagenesis.

      Finally, it is important to establish what the basal mutation rates of both the WT and ∆recA strains are. Currently, only the ampicillin-treated populations were reported. It is possible that the ∆recA strain has inherently higher mutagenesis than WT, with a larger subpopulation of resistant clones. Thus, ampicillin treatment might not in fact induce higher mutagenesis in ∆recA.

      Comments on revisions:

      Thank you for responding to the concerns raised previously. The manuscript overall has improved.

      We sincerely thank the reviewer for raising this important point. We acknowledge that in our initial submission, the mutation analysis was based on a limited number of replicates (n=6), which may not have been sufficient to robustly distinguish between mutation induction and selection. In response to this concern, we have substantially expanded our experimental dataset. Specifically, we redesigned the mutation rate validation experiment by increasing the number of biological replicates in each condition to 96 independent parallel cultures. This enabled us to systematically assess mutation frequency distributions under four conditions (WT, WT+ampicillin, ΔrecA, ΔrecA+ampicillin), using both maximum likelihood estimation (MLE) and distribution-based fluctuation analysis (new Figure 1F, 1G, and Figure S5).

      These expanded datasets revealed that:

      (1) The estimated mutation rate was significantly elevated in ΔrecA+ampicillin compared to ΔrecA alone (Fig. 1G);

      (2) However, the distribution of mutation frequencies in ΔrecA+ampicillin was highly skewed, with evident jackpot cultures (Fig. 1F); and

      (3) The observed pattern deviated significantly from Poisson expectations, which is inconsistent with uniform mutagenesis and instead supports clonal selection from an early-arising mutational pool (Fig. S5).
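      The contrast between Poisson-like and jackpot-dominated distributions can be sketched with a simple dispersion check. This is an illustration only, not the MLE or fluctuation analysis the authors report: the colony counts are invented, and the variance-to-mean ratio and the zero-class (p0) estimate are assumed here as convenient summary statistics.

```python
from math import log
from statistics import mean, variance

def index_of_dispersion(counts):
    """Variance-to-mean ratio; close to 1 for Poisson-distributed counts."""
    return variance(counts) / mean(counts)

def p0_mutations_per_culture(counts):
    """Zero-class estimate: m = -ln(fraction of cultures with no mutants)."""
    p0 = sum(1 for c in counts if c == 0) / len(counts)
    if p0 == 0:
        raise ValueError("p0 method needs at least one culture without mutants")
    return -log(p0)

# Invented resistant-colony counts from parallel cultures (illustration only).
poisson_like = [0, 1, 2, 1, 0, 1, 3, 1, 2, 1, 0, 2]
with_jackpot = [0, 1, 0, 2, 0, 1, 0, 0, 150, 1, 0, 3]

for label, counts in [("Poisson-like", poisson_like), ("with jackpot", with_jackpot)]:
    print(f"{label}: dispersion = {index_of_dispersion(counts):.1f}, "
          f"m (p0 method) = {p0_mutations_per_culture(counts):.2f}")
```

      A single jackpot culture inflates the variance far beyond the mean, which is the signature of mutants that arose early and expanded clonally, whereas mutations induced uniformly after plating would keep the ratio near 1.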

      Importantly, these new results do not contradict our original conclusions but rather extend and refine them. The previous evidence for ROS-mediated mutagenesis remains valid and is supported by our GSH experiments, transcriptomic analysis of oxidative stress genes, and DNA repair pathway repression. However, the additional data now indicate that ROS-induced variants are not uniformly induced after antibiotic exposure but are instead generated stochastically under the stress-prone ΔrecA background and then selectively enriched upon ampicillin treatment.

      Taken together, we now propose a two-step model of resistance evolution in ΔrecA cells (new Figure 5):

      Step i: RecA deficiency creates a hypermutable state through impaired repair and elevated ROS, increasing the probability of resistance-conferring mutations.

      Step ii: β-lactam exposure acts as a selective bottleneck, enriching early-arising mutants that confer resistance.

      We have revised both the Results and Discussion sections to clearly articulate this complementary relationship between mutational supply and selection, and we believe this integrated model better explains the observed phenotypes and mechanistic outcomes.

      Reviewer #2 (Public review):

      This study aims to demonstrate that E. coli can acquire rapid antibiotic resistance mutations in the absence of a DNA damage response. The authors employed a modified Adaptive Laboratory Evolution (ALE) workflow to investigate this, initiating the process by diluting an overnight culture 50-fold into an ampicillin selection medium. They present evidence that a recA- strain develops ampicillin resistance mutations more rapidly than the wild-type, as indicated by the Minimum Inhibitory Concentration (MIC) and mutation frequency. Whole-genome sequencing of recA- colonies resistant to ampicillin showed predominant inactivation of genes involved in the multi-drug efflux pump system, contrasting with wild-type mutations that seem to activate the chromosomal ampC cryptic promoter. Further analysis of mutants, including a lexA3 mutant incapable of inducing the SOS response, led the authors to conclude that the rapid evolution of antibiotic resistance occurs via an SOS-independent mechanism in the absence of recA. RNA sequencing suggests that antioxidative response genes drive the rapid evolution of antibiotic resistance in the recA- strain. They assert that rapid evolution is facilitated by compromised DNA repair, transcriptional repression of antioxidative stress genes, and excessive ROS accumulation.

      Strengths:

      The experiments are well-executed and the data appear reliable. It is evident that the inactivation of recA promotes faster evolutionary responses, although the exact mechanisms driving this acceleration remain elusive and deserve further investigation.

      Weaknesses:

      Some conclusions are overstated. For instance, the conclusion regarding the LexA3 allele, indicating that rapid evolution occurs in an SOS-independent manner (line 217), contradicts the introductory statement that attributes evolution to compromised DNA repair.

      We thank the reviewer for this insightful observation, which highlights a central conceptual advance of our study. Our data indeed indicate that resistance evolution in ΔrecA occurs independently of canonical SOS induction (as shown by the lack of resistance in lexA3, dpiBA, and translesion polymerase mutants), yet is clearly associated with impaired DNA repair capacity (e.g., downregulation of polA, mutH, mutY).

      This apparent “contradiction” reflects the dual role of RecA: it functions both as the master activator of the SOS response and as a key factor in SOS-independent repair processes. Thus, the rapid resistance evolution in ΔrecA is not due to loss of SOS, but rather due to the broader suppression of DNA repair pathways that RecA coordinates, which elevates mutational load under stress (This point is discussed in further detail in our response to Reviewer 1).

      The claim made in the discussion of Figure 3 that the hindrance of DNA repair in recA- is crucial for rapid evolution is at best suggestive, not demonstrative. Additionally, the interpretation of the PolI data implies its role, yet it remains speculative.

      We appreciate this comment and would like to respectfully clarify that our conclusion regarding the role of DNA repair impairment is supported by several independent lines of mechanistic evidence.

      First, our RNA-seq analysis revealed transcriptional suppression of multiple DNA repair genes in ΔrecA cells following ampicillin treatment, including polA (DNA Pol I) and the base excision repair genes mutH, mutY, and mutM (Fig. 4K). This indicates that multiple repair pathways, including those responsible for correcting oxidative DNA lesions, are downregulated under these conditions.

      Second, we observed a significant reduction in DNA Pol I protein expression as well as reduced colocalization with chromosomal DNA in ΔrecA cells, suggesting impaired engagement of repair machinery (Fig. 3C-E). These phenotypes are not limited to transcriptional signatures but extend to functional protein localization.

      Third, and most importantly, resistance evolution was fully suppressed in ΔrecA cells upon co-treatment with glutathione (GSH), which reduces ROS levels. As GSH did not affect ampicillin killing (Fig. 4J), these findings suggest that mutagenesis, and thus the emergence of resistance, requires both ROS accumulation and the absence of efficient repair.

      Therefore, we believe these data go beyond correlation and demonstrate a mechanistic role for DNA repair impairment in driving stress-associated resistance evolution in ΔrecA. We have revised the Discussion to emphasize the strength of this evidence while avoiding overstatement.

      In the Figure 2A table, mutations in the amp promoters are shown as leading to amino acid changes.

      We thank the reviewer for spotting this inconsistency. Indeed, the ampC promoter mutations we identified reside in non-coding regulatory regions and do not result in amino acid substitutions. We have corrected the annotation in Fig. 2A and clarified in the main text that these mutations likely affect gene expression through transcriptional regulation, rather than protein sequence alteration.

      The authors' assertion that ampicillin significantly influences persistence pathways in the wild-type strain, affecting quorum sensing, flagellar assembly, biofilm formation, and bacterial chemotaxis, lacks empirical validation.

      We thank the reviewer for pointing this out. In the original version, we acknowledged transcriptional enrichment of genes related to quorum sensing, flagellar assembly, and chemotaxis in the wild-type strain upon ampicillin treatment. However, as we did not directly assess persistence phenotypes (e.g., biofilm formation or persister levels), we agree that such functional inferences were not fully supported. We have revised the relevant statements to focus solely on transcriptomic changes and have removed language suggesting direct effects on persistence pathways.

      Figure 1G suggests that recA cells treated with ampicillin exhibit a strong mutator phenotype; however, it remains unclear if this can be linked to the mutations identified in Figure 2's sequencing analysis.

      We appreciate the reviewer’s comment. This point is discussed in further detail in our response to Reviewer 1.

      Reviewer #3 (Public review):

      In the present work, Zhang et al. investigate the involvement of the bacterial DNA damage repair SOS response in the evolution of beta-lactam drug resistance in Escherichia coli. Using a combination of microbiological, bacterial genetics, laboratory evolution, next-generation sequencing, and live-cell imaging approaches, the authors propose that short-term (transient) drug resistance evolution can take place in RecA-deficient cells in an SOS response-independent manner. They propose that the evolvability of drug resistance is instead driven by the oxidative stress imposed by accumulation of reactive oxygen species and by compromised DNA repair. Overall, this is a nice study that addresses a growing and fundamental global health challenge (antimicrobial resistance).

      Strengths:

      The authors introduce new concepts to antimicrobial resistance evolution mechanisms. They show that short-term exposure to beta-lactams can induce durably fixed antimicrobial resistance mutations. They propose this is due to compromised DNA repair and oxidative stress. Antibiotic resistance evolution under transient stress is poorly studied, so the authors' work is a nice mechanistic contribution to this field.

      Weaknesses:

      The authors do not show any direct evidence of altered mutation rate or accumulated DNA damage in their model.

      We appreciate the reviewer’s comment. This point is discussed in further detail in our response to Reviewer 1.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I would like to suggest two minor changes to the text.

      (1) Re. WGS data.

      The authors write in their response: "We appreciate your concern regarding potential inconsistencies in the WGS methodology. However, we would like to clarify that the primary aim of the WGS experiment was to identify the types of mutations present in the wild type and ΔrecA strains after treatment of ampicillin, rather than to quantify or compare mutation frequencies. This purpose was explicitly stated in the manuscript."

      I think the source of my confusion stemmed from this part in the text:

      "In bacteria, resistance to most antibiotics requires the accumulation of drug resistance associated DNA mutations developed over time to provide high levels of resistance (29). To verify whether drug resistance associated DNA mutations have led to the rapid development of antibiotic resistance in recA mutant strain, we..."

      I would change the phrase "verify whether drug resistance associated DNA mutations have led to the rapid development of antibiotic resistance in recA mutant strain" to "identify the types of mutations present in the wild type and ΔrecA strains after treatment of ampicillin." This would explicitly state what the sequencing was for (i.e., identifying mutations). The current phrase can give the impression that WGS was used to validate rapid or high mutagenesis.

      Thanks for this suggestion. We have revised this description to “In bacteria, resistance to most antibiotics requires the accumulation of drug resistance associated DNA mutations that can arise stochastically and, under stress conditions, become enriched through selection over time to confer high levels of resistance (33). Having observed a non-random and right-skewed distribution of mutation frequencies in ΔrecA isolates following ampicillin exposure, we next sought to determine whether specific resistance-conferring mutations were enriched in ΔrecA isolates following antibiotic exposure.”

      (2) Re. whether the mutations are "induced" or "pre-existing."

      The authors write:

      "We appreciate your detailed feedback on the language used to describe our data. We understand the concern regarding the use of the term "induced" in relation to beta-lactam exposure. To clarify, we employed not only beta-lactam antibiotics but also other antibiotics, such as ciprofloxacin and chloramphenicol, in our experiments (data not shown). However, we observed that beta-lactam antibiotics specifically induced the emergence of resistance or altered the MIC in our bacterial populations. If resistance had pre-existed before antibiotic exposure, we would expect other antibiotics to exhibit a similar selective effect, particularly given the potential for cross-resistance to multiple antibiotics."

      I think it is important to discuss the negative data for the other antibiotics (along with the other points made in your Reviewer response) in the main text.

      This point is discussed in further detail in our response to Reviewer 1 (Public Review).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public reviews):

      (1) A cartoon paradigm of the HFD treatment window would be a helpful addition to Figure 1. Relatedly, the authors might consider qualifying MHFD as 'lactational MHFD.' Readers might miss the fact that the exposure window starts at birth.

      This is a good suggestion. The MHFD-L model has been used previously (e.g. Vogt et al. 2014). We have added a cartoon of the MHFD-L model and the PLX treatments to Figure 4, which we feel helps the readers, and we thank the reviewer for the suggestion.

      (2) More details on the modeling pipeline are needed either in Figure 1 or text. Of the ~50 microglia that were counted (based on Figure 1J), were all 50 quantified for the morphological assessments? Were equal numbers used for the control and MHFD groups? Were the 3D models adjusted manually for accuracy? How much background was detected by IMARIS that was discarded? Was the user blind to the treatment group while using the pipeline? Were the microglia clustered or equally spread across the PVN?

      In response to this suggestion, we have expanded the description of the image analysis routine in the methods. The analysis focused on detailed changes in microglial morphology as opposed to overall changes in microglia throughout the PVH as a whole. Accordingly, we applied anatomically matched ROIs to the PVH for the measurements. As described in the methods, the Imaris Filaments tool was used to visualize microglia fully contained within a tissue section and a mask derived from the 3D model for these cells was used to isolate them for further analysis, thereby separating these cells from interstitial labeling corresponding to parts of cell processes or other labeling not associated with selected cells. There was no formal “background subtraction.” This was an error in the previous version of the manuscript and we have revised the methods to reflect the process actually used. The images were segmented (to enhance signal to noise for 3D rendering), and then a Gaussian filter was applied to improve edge detection, which facilitates the morphological measurements.

      (3) Suggest toning back some of the language. For example: "...consistent with enhanced activity and surveillance of their immediate microenvironment" (Line 195) could be "...perhaps consistent with...". Likewise, "profound" (Lines 194, 377) might be an overstatement.

      Revisions have been made to both the Introduction and Discussion to modulate our representation of this controversial issue.

      (4) Representative images for AgRP+ cells (quantified in Figure 2J) are missing. Why not a co-label of Iba1+/AgRP+ as per Figure 1, 3? Also, what was quantified in Figure 2J - soma? Total immunoreactivity?

      Because the density of AgRP labeling does not change in the ARH, we omitted the red channel image (AgRP labeling) to highlight the similarity of the microglial morphology. To address the reviewer’s concerns, we have reconstituted the revised figure with both the green (microglial) and red (AgRP) channels depicted.

      Figure 2J displays the numbers of AgRP neurons counted in selected ROIs through the ARH. The Methods section has been revised to include the visualization procedure used for the cell counts.

      (5) For the PLX experiment:

      a) "...we depleted microglia during the lactation period" (Line 234). This statement suggests microglia decreased from the first injection at P4 and throughout lactation, which is inaccurate. PLX5622 effects take time, upwards of a week. Thus, if PLX5622 injections started at P4, it could be P11 before the decrease in microglia numbers is stable. Moreover, by the time microglia are entirely knocked down, the pups might be supplementing some chow for milk, making it unclear how much PLX5622 they were receiving from the dam, which could also impact the rate at which microglia repopulation commences in the fetal brain. Quantifying microglia across the P4-P21 treatment window would be helpful, especially at P16, since the PVN AgRP microglia phenotypes were demonstrated and roughly when pups might start eating some chow. b) I am surprised that ~70% of the microglia are present at P21. Does this number reflect that microglia are returning as the pups no longer receive PLX5622 from milk from the dam? Does it reflect the poor elimination of microglia in the first place?

      This is an important point, and we have revised the first sentence in section 2.3 to clarify the PLX treatment logic and added a cartoon to Fig. 4 to show the treatment timeline. The PLX5622 was administered daily to the pups, not to the dams. We also agree with the interpretation that PLX5622 depleted numbers of microglia, as supported by the microglial cell counts, rather than effecting a complete elimination, and have made revisions to clarify this distinction. Although mice were weighed at weaning, cellular measurements were only made in mice perfused at P55.

      (6) Was microglia morphology examined for all microglia across the PVN? It is possible that a focus on PVNmpd microglia would reveal a stronger phenotype? In Figure 4H, J, AgRP+ terminals are counted in PVN subregions - PVNmpd and PVNpml, with PVNmpd showing a decrease of ~300 AgRP+ terminals in MHFD/Veh (rescued in MHFD/PLX5622). In Figure 1K, AgRP+ terminals across what appears to be the entire PVN decrease by ~300, suggesting that PVNmpd is driving this phenotype. If true, then do microglia within the PVNmpd display this morphology phenotype?

      We have revised the description of the analysis procedures to clarify these points. All measurements were made in user-defined, matched regions of interest according to morphological features of the PVH. No measurements included the entire PVH, and we revised the Methods section to improve clarity.

      (7) What chow did the pups receive as they started to consume solid food? Is this only a MHFD challenge, or could the pups be consuming HFD chow that fell into the cage?

      The pups were weaned onto the same normal chow diet that the dams received prior to MHFD-L treatment. The cages were inspected daily and minimal HFD spillage was observed, although we cannot rule out with certainty any contribution of the pups directly consuming the HFD. We have edited Methods section 5.2 for clarity.

      (8) Figure 5: Does internalized AgRP+ co-localize with CD68+ lysosomes? How was 'internalized' determined?

      This important point has been clarified by revisions to the Methods section.

      (9) Different sample sizes are used across experiments (e.g., Figure 4 NCD n=5, MHFD n=4). Does this impact statistical significance?

      Sample size does affect the power of ANOVA, with larger samples reducing the chance of errors. ANOVA is generally robust to moderate departures from the assumptions of equal sample sizes and equal variance, such as those we encountered in the PLX5622 experiment. Here we used t-tests to detect differences in a single variable between two groups and two-way ANOVA to compare treatment by diet and treatment changes in the PLX5622 studies. Additional detail has been added to the Methods section to clarify this point.
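
      As a minimal sketch of how a two-group comparison with unequal sample sizes (e.g., n=5 vs. n=4) can be computed, the Python snippet below calculates Welch's t statistic and its approximate degrees of freedom. The choice of Welch's variant, the function name welch_t, and the numeric values are illustrative assumptions; they are not the analysis or the data from this study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    se2 = va / na + vb / nb            # squared standard error of the mean difference
    t = (mean(a) - mean(b)) / sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical measurements (NOT data from the study), with unequal group sizes
group_ncd = [10.0, 12.0, 11.0, 13.0, 14.0]   # n = 5
group_mhfd = [8.0, 9.0, 7.0, 10.0]           # n = 4

t_stat, dof = welch_t(group_ncd, group_mhfd)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

      The unequal group sizes enter only through the per-group standard errors and the degrees of freedom, so the smaller group simply contributes more uncertainty to the comparison.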

      Reviewer #2 (Public reviews):

      (1) Under chow-fed conditions, there is a decrease in the number of microglia in the PVH and ARH between P16 and P30, accompanied by an increase in complexity/volume. With the exception of PVH microglia at P16, this maturation process is not affected by MHFD. This "transient" increase in microglial complexity could also reflect premature maturation of the circuit.

      This is an interesting possibility that requires future investigation (see response to Recommended Suggestions, above).

      (2) The key experiment in this paper, the ablation of microglia, was presumably designed to prevent microglial expansion/activation in the PVH of MHFD pups. However, it also likely accelerates and exaggerates the decrease in cell number during normal development regardless of maternal diet. Efforts to interpret these findings are further complicated because microglial and AgRP neuronal phenotypes were not assessed at earlier time points when the circuit is most sensitive to maternal influences.

      We agree that evaluations of microglia and hypothalamic circuits at many more time points would indeed be informative (see comments above).

      (3) Microglial loss was induced broadly in the forebrain. Enhanced AgRP outgrowth to the PVH could be caused by actions elsewhere, such as direct effects on AgRP neurons in the ARH or secondary effects of changes in growth rates.

      A local effect of microglia in the ARH that affects growth of AgRP axons remains a distinct possibility that deserves a targeted examination (see response to Recommended Suggestions, above).

      (4) Prior publications from the authors and other groups support the idea that the density of AgRP projections to the PVH is primarily driven by factors regulating outgrowth and not pruning. The failure to observe increased engulfment of AgRP fibers by PVH microglia is therefore not surprising. The possibility that synaptic connectivity is modulated by microglia was not explored.

      Synaptic pruning and regulation of axon targeting are not mutually exclusive processes and microglia may participate in both. Here we evaluated innervation of the PVH, which is sensitive to MHFD-L exposure, and engulfment of AgRP terminals by microglia, which does appear to be altered by MHFD-L. Given previous observations of terminal engulfment by microglia in other brain regions in response to environmental changes (e.g. prolonged stress), it is not unreasonable to expect this outcome in the offspring of MHFD-L dams. In future work it will be important to profile multiple cell types in the PVH for microglia-dependent and MHFD-L-sensitive changes in targeting of AgRP axons. Equally important is a full characterization of postsynaptic changes in PVH neurons.

      Reviewer #3 (Public reviews):

      There was no attempt to interrogate microglia in different parts of the hypothalamus functionally. Morphology alone does not reflect a potential for significant signaling alterations that may occur within and between these and other cell types.

      The authors should discuss the limitations of their approach and findings and propose future directions to address them.

      We agree that evaluations of microglia and hypothalamic circuits at many more time points that include analyses of multiple regions would indeed be informative. We have added statements to the manuscript that address the limitations of our experimental approach and suggest future studies that will extend understanding of underlying mechanisms beyond those investigated here.

      Recommendations for the authors:

      Reviewing Editors Comments:

      (1) The Abstract is 405 words and should be shortened to less than 200 words.  

      The abstract has been edited to 200 words.

      (2) The authors might consider raising the question in the Introduction of whether reduced AgRP innervation of the PVN in MHFD-treated mice is due to decreased axonal growth, enhanced microglial-mediated pruning, or a combination of both. The potential effects on axonal growth should be given more consideration.

      This is an important point that we agree deserves additional consideration in the manuscript. Our past work has focused on leptin’s ability to influence axonal targeting of PVH neurons by AgRP and PPG neurons through a cell-autonomous mechanism, and our conclusion is that leptin primarily induces axon growth. Because our design in this study focused not on changes in axon growth over time but on regional changes in microglia and their interactions with AgRP terminals, we did not want to divert attention from our logic in the Introduction by highlighting multiple mechanisms. However, we have added a brief mention in the Introduction and have expanded consideration of axonal growth effects in the Discussion. Distinguishing between microglia’s role in synaptic density or axon targeting in this pathway is an important goal of future work.

      (3) Line 37, a high-fat diet should be defined here as HFD and used consistently thereafter. Note that "high-fat-diet exposure" requires two hyphens.

      The suggested revisions have been made throughout the manuscript.

      (4) Line 38 and elsewhere, MHFD does not adequately describe the treatment being limited to the lactation period, perhaps MLHFD would be better or just LHFD (because the pups can't lactate).

      The suggested revisions have been made throughout the manuscript, and we have used MHFD-L to describe maternal consumption of a high-fat diet that is restricted to the lactation period.

      (5) Line 110, leptin-deficient mice (add hyphen).

      (6) Line 183, NCD should be defined.

      The suggested revisions have been made throughout the manuscript.

      (7) Lines 237- 238, it is not clear what is widespread in the rostral forebrain. Is it the loss of microglia? What is the dividing point between the rostral and caudal forebrain? Were microglia depleted in the caudal forebrain too?

      We have revised this section of the manuscript to focus the description on the hypothalamus alone and specify that the reduction in microglial density is not restricted to the PVH.  

      (8) Line 245, microglial-mediated effects (add hyphen).

      (9) Line 247, vehicle-treated mice (add hyphen).

      The suggested revisions have been made throughout the manuscript.

      (10) Line 457, when referring to genes, the approved gene name should be used in italics, AgRP should be Agrp (italics).

      The suggested revision has been made throughout the manuscript.

      (11) Line 459, the name of the Syn-Tom mice in the Key Resource table, Methods, and Text should be consistent. It would be best to use the formal name of the Ai34 line of mice on the JAX website.

      The suggested revisions have been made throughout the manuscript.

      (12) Figure 1G, H, and I: "um" should use the Greek micro symbol (µm); Fig. 1J and K: replace # with "Number". The same suggestions apply to all the other figures.

      Both the manuscript and figures have been revised in accordance with this recommendation.

      (13) Figures 4 G, H, I and J. and Figures 5 M and O. The font size is too small to see well.

      Fonts have been changed in the figures to improve visibility.

      Reviewer #1 (Recommendations for the authors):

      (1) Figures are out of order in the text. For example, Figure 1A is followed next by the results for Figure 1J instead of Figure 1B.

      We regret that the organization of figure panels makes for awkward matching for the reader as they proceed through the text. We designed the figures to facilitate comparisons between cellular responses and differences in labeling. After evaluating a reorganization, we decided to maintain the original panel configurations, but have revised the text to more closely follow the presentation of cellular features in the figures.

      (2) Figure 1B.: All images lack scale bars.

      (3) Line 433 - 'underlie' is spelled wrong.

      (4) Rosin et al should be 2019 and not 2018.

      These corrections have been implemented in the revised text and figures.

      (5) The statement that "the effects of MHFD on microglial morphology in the PVH of offspring display both temporal and regional specificity, which correspond to a decrease in the density of AgRP inputs to the PVH" (Line 196) needs clarification, as the phrase "regional specificity" has not been substantiated in this section even though it is discussed later.

      We agree with this comment and have revised section 2.1 to more closely match the data presented to this point in the manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) The claim of "spatial specificity" in the effects of MHFD on microglia is based on an increase in the complexity/volume of microglia at P16 in the PVH that was not seen in the ARH or BNST. The transient nature of the effect raises several questions: Does the effect on the PVH represent premature maturation?

      This is an interesting suggestion. However, given how little is known about microglial maturation in the hypothalamus it is difficult to address. It is indeed possible that microglia mature at different rates in each AgRP target, and that MHFD-L exposure alters the rate of maturation in some regions but not others. This will require a great deal more analysis of both microglia and ARH projections to understand fully (see below).

      (2) To support their central claim that microglia in the PVH "sculpt the density of AgRP inputs to the PVH" the authors report effects on Iba1+ cells in the PVH of chow-fed dams at P55, body weight at P21, and AgRP projections in the PVH at an unspecified age. It is hard to understand what is happening across "normal" development in chow-fed dams since the number of Iba1+ cells decreases from ~50 to ~25 between P16 and P30 (Figure 1), but then increases to >60 cells at P55 (Figure 4). Given the large fluctuations in microglial population across time, analyzing the same parameters (i.e. microglial number/morphology in the ARH and PVH, AgRP neuronal number in the ARH, and fiber density in the PVH, and body weight) across time points before, during and after the critical period in chow and MHFD conditions would be very helpful.

      The time points we evaluated were chosen to be during and after the previously determined critical period for development of AgRP projections to the PVH, which were then compared with adults (which were all P55) to assess longevity of the effects. We have incorporated revisions to improve the clarity of when measurements were assessed, and treatments implemented. Defining the cellular dynamics of microglia across time remains a major challenge for the field and will certainly be informed by future studies with additional time points, as well as by in vivo imaging studies focused on regions identified here. Although such studies are beyond the scope of the present work, their completion would advance our current understanding of how microglia respond to nutritional changes during development of feeding circuits.

      (3) As microglia are also ablated in the ARH, direct effects on AgRP neurons or indirect effects via changes in growth rates could also contribute to increased AgRP fiber density in the PVH. In support of the first possibility, postnatal microglial depletion increases the number of AgRP neurons (Sun, et al. 2023).

      We agree with the suggestion, also raised by the Reviewing Editor, which has been addressed briefly in the Introduction, and in more detail by revisions to the Discussion section.

      (4) The failure to assess alpha-MSH fibers in the same animals was a missed opportunity. They are also affected by MHFD but likely involve a distinct mechanism (Vogt, et al 2014).

      Given the paired interest in POMC neurons and AgRP neurons, we understand the reviewer’s comment. We chose to focus solely on AgRP neurons because we do not currently have a way to genetically target axonal labeling exclusively to POMC neurons due to the shared precursor origin of POMC neurons and a percentage of NPY neurons in the ARH, as shown by Lori Zeltser’s laboratory. Moreover, the elegant work by Vogt et al. focused on responses of POMC neurons in the MHFD-L model. However, it certainly remains possible that microglia in the PVH interact with terminals derived from POMC neurons, as well as with terminals derived from other afferent populations of neurons.

      (5) All statistical analyses involved unpaired t-tests. Two-way ANOVAs should be used to assess the effects of age and HFD and interactions between these factors.

      We used t-tests to detect differences in a single variable between two groups and two-way ANOVA to compare treatment by diet and treatment changes in the PLX5622 studies.  Additional detail has been added to the Methods section and information added to the figure legend for Fig. 4 to clarify this point.

      Reviewer #3 (Recommendations for the authors):

      I suggest exploring a deeper characterization of the microglia in various parts of the hypothalamus in different conditions. This could include cytokine assessment or spatial transcriptomics.

      We agree that a great deal more work is needed to improve our understanding of how microglia impact hypothalamic development more broadly and to identify underlying molecular mechanisms. We are hopeful that the data presented here will motivate additional study of microglial dynamics in multiple hypothalamic regions, as well as detailed studies of cellular signaling events for factors derived from MHFD-L dams that impact neural development in the hypothalamus.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors used a leucine/pantothenate auxotrophic strain of Mtb to screen a library of FDA-approved compounds for their antimycobacterial activity and found significant antibacterial activity of the inhibitor semapimod. In addition to alterations in pathways, including amino acid and lipid metabolism and transcriptional machinery, the authors demonstrate that semapimod treatment targets leucine uptake in Mtb. The work presents an interesting connection between nutrient uptake and cell wall composition in mycobacteria.

      Strengths:

      The link between the leucine uptake pathway and PDIM is interesting but has not been characterized mechanistically. The authors discuss that PDIM presents a barrier to the uptake of nutrients and show binding of the drug with PpsB. However, it is unclear why only the leucine uptake pathway was affected.

      We observe interference with L-leucine, but not pantothenate, uptake in the mc2 6206 strain upon semapimod treatment. At present, we do not know whether PDIM presents a barrier exclusively to the uptake of L-leucine. Further studies may shed light on the underlying mechanism(s) by which L-leucine uptake is modulated by this small molecule.

      We still do not know what PpsB actually does for amino acid uptake - is it a transporter?

      By BLI-Octet we do not find any interaction between L-leucine and PpsB. Therefore, we doubt that PpsB is a transporter of L-leucine.

      Does semapimod binding affect its activity?

      Our study suggests that semapimod treatment alters PDIM architecture which becomes restrictive to L-leucine. However, at present the exact mechanism is not clear. Further studies are required to thoroughly examine the effect of semapimod on Mtb PpsB activity and alterations in PDIM by mass spectrometry.

      Does the auxotrophic Mtb have lower PDIM levels compared to wild-type Mtb?

      According to the published report by Mulholland et al. and the vancomycin susceptibility phenotype in our study, both strains appear to have comparable PDIM levels.

      The authors show an interesting result where they observed antibacterial activity of semapimod against H37Rv only in vivo and not in vitro. What do the authors think is the basis of this observation? It is possible semapimod has an immunomodulatory effect on the host since leucine is an essential amino acid in mice. The authors could check pro-inflammatory cytokine levels in infected mouse lungs with and without drug treatment.

      Semapimod inhibits production of proinflammatory cytokines such as TNF-α, IL-1β, and IL-6, which would indeed help the pathogen establish chronic infection. However, a significant reduction in bacterial loads in lungs and spleen upon semapimod treatment despite inhibition of proinflammatory cytokines clearly indicates bacterial dependence on host-derived exogenous leucine during intracellular growth.

      The authors show that the semapimod-resistant auxotroph lacks PDIM. The conclusions would be further strengthened by including validations using PDIM mutants, including del-ppsB Mtb and other genes of the PDIM locus, whether in vivo this mutant would be more susceptible (or resistant) to semapimod treatment.

      PDIM is a virulence factor, and plays an important role in the intracellular survival of the TB pathogen. Mtb strains lacking PDIM are expected to show attenuated growth during infection, even without semapimod treatment. In such a case, it might be difficult to draw any conclusions about the effect of semapimod against PDIM(-) strains in vivo.

      Prolonged subculturing can introduce mutations in PDIM, which can be overcome by supplementing with propionate (Mulholland et al, Nat Microbiol, 2024). Did the authors also supplement their cultures with propionate? It would be interesting to see what mutations would result in Semr strains with propionate supplementation along with prolonged semapimod treatment.

      Because extensive subculturing may result in loss of PDIM, we avoided prolonged subculturing of bacteria. As presented in Fig. 6b, the WT bacteria retain PDIM. While performing the initial screening of drugs, we did not anticipate such a phenotype, and hence bacteria were cultured in regular 7H9-OADS medium without propionate supplementation.

      A comprehensive future study would help examine the effect of propionate on the generation of semapimod-resistant mutants in Mtb mc2 6206.

      Weaknesses:

      I have summarized the limitations above in my comments. Overall, it would be helpful to provide more mechanistic details to study the connection between leucine uptake and PDIM.

      Reviewer #2 (Public review):

      Summary

      This important study uncovers a novel mechanism for L-leucine uptake by M. tuberculosis and shows that targeting this pathway with 'Semapimod' interferes with bacterial metabolism and virulence. These results identify the leucine uptake pathway as a potential target to design new anti-tubercular therapy.

      Strengths

      The authors took numerous approaches to prove that L-leucine uptake of M. tuberculosis is an important physiological phenomenon and may be effectively targeted by 'Semapimod'. This study utilizes a series of experiments using a broad set of tools to justify how the leucine uptake pathway of M. tuberculosis may be targeted to design new anti-tubercular therapy.

      Weaknesses

      The study does not explain how L-leucine is taken up by M. tuberculosis, leaving the mechanism unclear. Even though 'Semapimod' binds to the PpsB protein, the relevant connection between changes in PDIM and amino acid transport remains incomplete.

      While leucine uptake involves specific transporters in other bacteria, no such transport system is known in Mtb. By screening small molecule inhibitors, we came across a molecule, semapimod, which selectively kills the leucine auxotroph (mc2 6206), but not the WT Mtb. To understand the underlying mechanism of differential susceptibility of the WT and auxotrophic strains to this molecule, we evaluated the effect of restoration of leuCD and panCD expression on susceptibility of the auxotrophic strain to semapimod. Interestingly, our results demonstrated that upon endogenous expression of leuCD genes, the mc2 6206 strain becomes resistant to killing by semapimod. In contrast, no effect of panCD expression was observed on semapimod susceptibility of mc2 6206. These findings were further substantiated by gene expression analysis of semapimod-treated mc2 6206, which exhibits differential regulation of a set of genes that are altered upon leucine depletion in Mtb as well as in other bacteria. Overall, these results provide the first evidence of perturbation of L-leucine uptake by semapimod treatment of the leucine auxotroph.

      To gain further mechanistic insights into the effect of semapimod on leucine uptake in Mtb, we generated a semapimod-resistant strain, which exhibits point mutations in four genes including ppsB. Interestingly, overexpression of wild-type ppsB, but not of the other genes, restored susceptibility of the resistant bacteria to semapimod. Our observations that semapimod interacts with PpsB, and that the semapimod-resistant strain accumulates a mutation in PpsB resulting in loss of PDIM, together support the involvement of cell-wall PDIM in regulation of L-leucine transport in Mtb.

      As mentioned above, we anticipate that semapimod treatment brings about certain modifications in PDIM that make it more restrictive to L-leucine. A comprehensive future study will be helpful to examine the effect of semapimod on Mtb physiology.

      Also, the fact that the drug does not function on WT bacteria makes it a weak candidate to consider its usefulness for a therapeutic option.

      We agree that semapimod is not an appropriate drug candidate against TB owing to its inhibitory effect on the production of proinflammatory cytokines such as TNF-α, IL-1β, and IL-6 that help the pathogen establish chronic infection. However, the significant reduction in bacterial loads in lungs and spleen upon semapimod treatment, despite inhibition of proinflammatory cytokines, clearly indicates bacterial dependence on host-derived exogenous leucine during intracellular growth. Therefore, targeting L-leucine uptake can be a novel therapeutic strategy against TB.

      Reviewer #3 (Public review):

      Agarwal et al identified the small molecule semapimod from a chemical screen of repurposed drugs with specific antimycobacterial activity against a leucine-dependent strain of M. tuberculosis. To better understand the mechanism of action of this repurposed anti-inflammatory drug, the authors used RNA-seq to reveal a leucine-deficient transcriptomic signature from semapimod challenge. The authors then measured a decreased intracellular concentration of leucine after semapimod challenge, suggesting that semapimod disrupts leucine uptake as the primary mechanism of action. Unexpectedly, however, resistant mutants raised against semapimod had a mutation in the polyketide synthase gene ppsB that resulted in loss of PDIM synthesis. The authors believe growth inhibition is a consequence of decreased accumulation of leucine as a result of an impaired cell wall and a disrupted, unknown leucine transporter. This study highlights the importance of branched-chain amino acids for M. tuberculosis survival, and the chemical genetic interactions between semapimod and ppsB indicate that ppsB is a conditionally essential gene in a medium depleted of leucine.

      The conclusions regarding the leucine and PDIM phenotypes are moderately supported by experimental data. The authors do not provide experimental evidence to support a specific link between leucine uptake and impaired PDIM production. Additional work is needed to support these claims and strengthen this mechanism of action.

      As mentioned above, the overall results from this study provide the first evidence of perturbation of L-leucine uptake by semapimod treatment of the leucine auxotroph. Our observations that semapimod interacts with PpsB, and that the semapimod-resistant strain accumulates a mutation in PpsB resulting in loss of PDIM, together support the involvement of cell-wall PDIM in regulation of L-leucine transport in Mtb.

      As mentioned above, it appears that semapimod treatment brings about certain modifications in PDIM, which then becomes restrictive to L-leucine. Future studies are required to gain detailed mechanistic insights into the effect of semapimod on Mtb physiology.

      Since leucine uptake and PDIM synthesis are important concepts of the manuscript, experiments would benefit from exploring other BCAAs to know if the phenotypes observed are specific to leucine, and adding additional strains to the 2D TLC experiments to provide confidence in the absence of the PDIM band.

      We thank the peer reviewer for this suggestion. We would be happy to analyse the effect of semapimod on the levels of other amino acids, including BCAAs, by mass spectrometry.

      The intriguing observation that wild-type H37Rv is resistant to semapimod but the leucine-auxotroph is sensitive should be further explored. If the authors are correct and semapimod does inhibit leucine uptake through a specific transporter or disrupted cell wall (PDIM synthesis), testing semapimod activity against the leucine-auxotroph in various concentrations of BCAAs could highlight the importance of intracellular leucine. H37Rv is still able to synthesize endogenous leucine and is able to circumvent the effect of semapimod.

      We thank the peer reviewer for this suggestion. We would explore the possibility of analysing the effect of increasing concentrations of BCAAs on mc2 6206 susceptibility to semapimod.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript finds a negative relationship between tuberculin skin test-induced type I interferon activity and chest X-ray tuberculosis severity in humans. This evidence is between incomplete and solid. It needs a bioinformatics/transcriptomics reviewer to make a more insightful judgement. The manuscript demonstrates a convincing role for Stat2 in controlling Mycobacterium marinum infection in zebrafish embryos, but incomplete data are presented linking reduced leukocyte recruitment to the infection susceptibility phenotype.

      Strengths:

      (1) An interesting analysis of TST response correlated with chest X-ray pathology.

      (2) Novel data on a protective role for Stat2 in a natural host-mycobacterial species infection pairing.

      We appreciate the reviewer’s positive comments.

      Weaknesses:

      (1) The transcriptional modules are very large sets of genes that do not present a clear picture of what is actually being measured relative to other biological pathways.

      The transcriptional module analysis is a major strength of our approach. These gene signatures are derived from independent experiments, most of which have been previously published/validated [1,2]. To clarify, they represent co-regulated gene sets downstream of signalling pathways. A larger number of genes in these modules increases their combinatorial specificity for a given biological pathway. In the human data, they serve as orthogonal validation for the bioinformatic analysis showing enrichment of the type I IFN pathway among TST transcriptome genes that are negatively correlated with radiographic disease severity in pulmonary TB (see Figure 2). Importantly, our modules confirm the relationship with type I IFN signalling (see Figure 2E) by discriminating it from type II IFN signalling, which is not statistically significantly correlated with radiographic TB severity (see Figure S6C-E).

      (2) The link between infection-Stat2-leukocyte recruitment and containment of infection is plausible, but lacks a specific link to the first part of the manuscript.

      For clarification, the first part of the study seeks to identify immune response pathways that relate to severity of human disease, leading to the identification of type I IFN signalling. Since the human data are limited to an observational analysis in which we cannot test causality, the second part of our study uses a genetically tractable experimental model to test the hypothesis that type I IFN signalling is host-protective and to explore possible mechanisms for a beneficial effect. This leads to the observation that type I IFN responses contribute to early myeloid cell recruitment to the site of infection, which has previously been shown to be crucial for containment of mycobacterial infection in zebrafish larvae. We will further evaluate the introduction and results sections to ensure a clear link between the human and zebrafish work.

      Major concerns

      (1) Line 158: The two transcriptional modules should be placed in the context of other DEG patterns. The macrophage type I interferon module, in particular, is quite large (361 genes). Can this be made more granular in terms of type I IFN ligands and STAT2-dependent genes?

      We respectfully disagree with this comment. For clarification, the 360 gene module reflects the zebrafish larval response to IFNphi1 protein [3]. Type I IFNs are known to induce hundreds of interferon stimulated genes [4]. As explained above, the size of the modules increases specificity for a given signalling pathway. In this case, we are most interested in discriminating type I and type II IFN signalling pathways that represent very different upstream biological processes. The discrimination we achieve with our modular approach is a major advance over previous reports of gene signatures in TB that do not discriminate between the two pathways. In this study, we did not discriminate between signalling downstream of type I IFN ligands and STAT2, consistent with existing literature showing that type I IFN signalling is STAT2 dependent [5,6].

      (2) The ifnphi1 injection into mxa:mCherry stat2 crispants is a nice experiment to demonstrate loss of type I IFN responsiveness. Further data is required to demonstrate if important mycobacterial control pathways (IFNy, TNF, il6?, etc) are intact in stat2 crispants before being able to conclude that these phenotypes are specific to type I IFN.

      Thank you for the positive comment. We acknowledge this point and will attempt to evaluate whether pro-inflammatory cytokine responses are intact in stat2 CRISPants by qPCR or bulk RNAseq. However, these experiments may prove inconclusive because of the limited sensitivity of this approach.

      Reviewer #2 (Public review):

      Summary:

      This study shows that type I interferon (IFN-I) signaling helps protect against mycobacterial infection. Using human gene expression data and a zebrafish model, the authors find that reduced IFN-I activity is linked to more severe disease. They also show that zebrafish lacking the IFN-I signaling gene stat2 are more vulnerable to infection due to poor macrophage migration. These results suggest a protective role for IFN-I in mycobacterial disease, challenging previous findings from other animal models.

      Strengths:

      Strengths of the manuscript include the use of human clinical samples to support relevance to disease, along with a genetically tractable zebrafish model that enables mechanistic insight.

      We welcome the reviewer’s positive summary of our study.

      Weaknesses:

      (1) The manuscript presents intriguing human data showing an inverse correlation between IFN-I gene signatures and TB disease, but the findings remain correlative and may be cohort-specific. Given that the skin is not a primary site of TB and is relatively immunotolerant, the biological relevance of downregulated IFN-I-related genes in this tissue to systemic or pulmonary TB is unclear.

      We agree with the reviewer that the observational human data are correlative. That is precisely why we extend the study to undertake mechanistic studies in a genetically tractable animal model, using M. marinum infection of zebrafish larvae. In the introduction, we already provide a detailed rationale for the strengths of the TST model to study human immune responses to a standardised mycobacterial challenge. This approach mitigates the confounding effects of heterogeneity in bacterial burden and of sampling different stages of the natural history of infection in conventional observational human studies. Therefore, the application of the TST is a major strength of this study. We do not understand the context in which the reviewer suggests the skin is immunotolerant. In the present study and previous work, we provide molecular-level analysis of the TST as a robust cell-mediated immune response that reflects molecular perturbation in granuloma from the site of pulmonary TB disease [1].

      (2) The reliance on stat2 CRISPants in zebrafish offers a limited view of IFN-I signaling. Including additional crispant lines targeting other key regulators (e.g., ifnar1, tyk2, irf3, irf7) would strengthen the interpretation and clarify whether the observed effects reflect broader IFN-I pathway disruption.

      We respectfully disagree with this comment. Our objective was to test the role of type I IFN signalling in M. marinum infection of zebrafish. We show that stat2 deletion effectively disrupts type I IFN signalling (Figure S8). Therefore, we do not see a compelling rationale to evaluate other molecules in the signalling pathway.

      (3) The conclusion that IFN-I is protective contrasts with established findings from murine and non-human primate models, where IFN-I is often detrimental. While the authors highlight species differences, the lack of functional human data and reliance on M. marinum in zebrafish limit the translational relevance. A more balanced discussion addressing these discrepancies would improve the manuscript.

      We acknowledge that our findings contrast with the prevailing view in published literature to date. We will further review the discussion to see how we can elaborate on the potential strengths and weaknesses of different experimental approaches, which may underpin these discrepancies.

      (4) Quantification of bacterial burden using fluorescence intensity alone may not accurately reflect bacterial viability. Complementary methods, such as qPCR for bacterial DNA, would provide a more robust assessment of antimicrobial activity.

      We and others have previously validated the use of the quantitative measures of fluorescence, used here as a measure of bacterial load [7,8]. Importantly, our measurements do not rely purely on the total fluorescence signal, but also include measures of dissemination of infection, for which we see consistent findings. It is also widely recognised that DNA measurements do not necessarily correlate well with bacterial viability. Therefore, we respectfully disagree that a PCR-based approach will add substantial value to our existing analysis.

      (5) Finally, the authors should clarify whether impaired macrophage recruitment in stat2 crispants results from defects in chemotaxis, differentiation, or survival, and address discrepancies between their human blood findings and prior studies.

      We acknowledge that these are important questions. Our data show that stat2 disruption does not impact total macrophage numbers at baseline (Figure 4A,B) and therefore do not support any effect of Stat2 signalling on steady state macrophage survival or differentiation. The downregulation of macrophage mpeg1 expression in M. marinum infection precludes long-term follow-up of these cells in the context of infection [9]. Therefore, we cannot currently test the hypothesis that Stat2 signalling may influence death of macrophages recruited to the site of infection or make them more susceptible to the cytopathic effects of direct mycobacterial infection. We will attempt to confirm using short-term time-lapse imaging that cellular migration to the site of hindbrain M. marinum infection is reduced in stat2 deficient zebrafish. On the strength of what is possible to test and the established role of type I IFNs in induction of several chemokines [10,11], the most likely effect is that Stat2 signalling increases recruitment through chemokine production. We are exploring the possibility of testing changes to the chemokine profile in stat2 CRISPants by qPCR or bulk RNAseq, but these experiments may prove inconclusive because of the limitations of sensitivity in this approach.

      We recognize that our finding of no relationship between peripheral blood type I IFN activity and severity of human TB contrasts with that of previous studies. As stated in the discussion, the most likely explanation for this is our use of transcriptional modules which reflect exclusive type I IFN responses. The signatures used in other studies include both type I and type II IFN inducible genes and therefore also reflect IFN gamma driven responses.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors presented an interesting study providing an insight into the role of Type-I interferon responses in tuberculosis (TB) pathogenesis by combining transcriptome analysis of PBMCs and TST from tuberculosis patients. The zebrafish model was used to identify the changes in the innate immune cell population of macrophages and neutrophils. The findings suggested that Type-I interferon signatures inversely correlated with disease severity in the TST transcriptome data. The authors validated the observations by CRISPR-mediated disruption of stat2 (a critical transcription factor for type I interferon signaling) in zebrafish larvae, showing increased susceptibility to M. marinum infection. Traditionally, type-I interferon responses have been viewed as detrimental in mycobacterial infections, with studies suggesting enhanced susceptibility in certain mouse models. The study tried to identify and further characterize the understanding of the role of type-I interferons in TB.

      Strengths:

      Traditionally, type-I interferon responses have been viewed as detrimental in mycobacterial infections, with studies suggesting enhanced susceptibility in certain mouse models. The study tried to further understand the role of type-I interferons in TB pathogenesis.

      We thank the reviewer for their summary.

      Weaknesses:

      Though the study showed an inverse correlation of Type-I interferon with radiological features of TB, the molecular mechanism is largely unexplored in the study, which is making it difficult to understand the basis of the results shown in the manuscript by the authors.

      We respectfully disagree with this comment. The observations in the human data lead to the hypothesis that type I IFN responses may be host-protective, which we then test specifically in the zebrafish model, and explore candidate mechanisms, focussing on myeloid cell recruitment to the site of infection.

      References

      (1) Bell, L.C.K., Pollara, G., Pascoe, M., Tomlinson, G.S., Lehloenya, R.J., Roe, J., Meldau, R., Miller, R.F., Ramsay, A., Chain, B.M., et al. (2016). In Vivo Molecular Dissection of the Effects of HIV-1 in Active Tuberculosis. PLoS Pathog. 12, e1005469. https://doi.org/10.1371/journal.ppat.1005469.

      (2) Pollara, G., Turner, C.T., Rosenheim, J., Chandran, A., Bell, L.C.K., Khan, A., Patel, A., Peralta, L.F., Folino, A., Akarca, A., et al. (2021). Exaggerated IL-17A activity in human in vivo recall responses discriminates active tuberculosis from latent infection and cured disease. Sci. Transl. Med. 13, eabg7673. https://doi.org/10.1126/scitranslmed.abg7673.

      (3) Levraud, J.-P., Jouneau, L., Briolat, V., Laghi, V., and Boudinot, P. (2019). IFN-Stimulated Genes in Zebrafish and Humans Define an Ancient Arsenal of Antiviral Immunity. J. Immunol. 203, 3361–3373. https://doi.org/10.4049/jimmunol.1900804.

      (4) Schoggins, J.W. (2019). Interferon-Stimulated Genes: What Do They All Do? Annu. Rev. Virol. 6, 567–584. https://doi.org/10.1146/annurev-virology-092818-015756.

      (5) Blaszczyk, K., Nowicka, H., Kostyrko, K., Antonczyk, A., Wesoly, J., and Bluyssen, H.A.R. (2016). The unique role of STAT2 in constitutive and IFN-induced transcription and antiviral responses. Cytokine Growth Factor Rev. 29, 71–81. https://doi.org/10.1016/j.cytogfr.2016.02.010.

      (6) Begitt, A., Droescher, M., Meyer, T., Schmid, C.D., Baker, M., Antunes, F., Knobeloch, K.-P., Owen, M.R., Naumann, R., Decker, T., et al. (2014). STAT1-cooperative DNA binding distinguishes type 1 from type 2 interferon signaling. Nat. Immunol. 15, 168–176. https://doi.org/10.1038/ni.2794.

      (7) Stirling, D.R., Suleyman, O., Gil, E., Elks, P.M., Torraca, V., Noursadeghi, M., and Tomlinson, G.S. (2020). Analysis tools to quantify dissemination of pathology in zebrafish larvae. Sci. Rep. 10, 3149. https://doi.org/10.1038/s41598-020-59932-1.

      (8) Takaki, K., Davis, J.M., Winglee, K., and Ramakrishnan, L. (2013). Evaluation of the pathogenesis and treatment of Mycobacterium marinum infection in zebrafish. Nat. Protoc. 8, 1114–1124. https://doi.org/10.1038/nprot.2013.068.

      (9) Benard, E.L., Racz, P.I., Rougeot, J., Nezhinsky, A.E., Verbeek, F.J., Spaink, H.P., and Meijer, A.H. (2015). Macrophage-expressed perforins mpeg1 and mpeg1.2 have an anti-bacterial function in zebrafish. J. Innate Immun. 7, 136–152. https://doi.org/10.1159/000366103.

      (10) Lehmann, M.H., Torres-Domínguez, L.E., Price, P.J.R., Brandmüller, C., Kirschning, C.J., and Sutter, G. (2016). CCL2 expression is mediated by type I IFN receptor and recruits NK and T cells to the lung during MVA infection. J. Leukoc. Biol. 99, 1057–1064. https://doi.org/10.1189/jlb.4MA0815-376RR.

      (11) Buttmann, M., Merzyn, C., and Rieckmann, P. (2004). Interferon-beta induces transient systemic IP-10/CXCL10 chemokine release in patients with multiple sclerosis. J. Neuroimmunol. 156, 195–203. https://doi.org/10.1016/j.jneuroim.2004.07.016.

    1. Author response:

      Reviewer #1:

      Lipid transfer proteins (LTPs) play a crucial role in the intramembrane lipid exchange within cells. However, the molecular mechanisms that govern this activity remain largely unclear. Specifically, the way in which LTPs surmount the energy barrier to extract a single lipid molecule from a lipid bilayer is not yet fully understood. This manuscript investigates the influence of membrane properties on the binding of Ups1 to the membrane and the transfer of phosphatidic acid (PA) by the LTP. The findings reveal that Ups1 shows a preference for binding to membranes with positive curvature. Moreover, coarse-grained molecular dynamics simulations indicate that positive curvature decreases the energy barrier associated with PA extraction from the membrane. Additionally, lipid transfer assays conducted with purified proteins and liposomes in vitro demonstrate that the size of the donor membrane significantly impacts lipid transfer efficiency by Ups1-Mdm35 complexes, with smaller liposomes (characterized by high positive curvature) promoting rapid lipid transfer.

      This study offers significant new insights into the reaction cycle of phosphatidic acid (PA) transfer by Ups1 in mitochondria. Notably, the authors present compelling evidence that, alongside negatively charged phospholipids, positive membrane curvature enhances lipid transfer - an effect that is particularly relevant at the mitochondrial outer membrane. The experiments are technically robust, and my primary feedback pertains to the interpretation of specific results.

      (1) The authors conclude from the lipid transfer assays (Figure 5) that lipid extraction is the rate-limiting step in the transfer cycle. While this conclusion seems plausible, it should be noted that the authors employed high concentrations of Ups1-Mdm35 along with less negatively charged phospholipids in these reactions. This combination may lead to binding becoming the rate-limiting factor. The authors should take this point into consideration. In this type of assay, it is challenging to clearly distinguish between binding, lipid extraction, and membrane dissociation as separate processes.

      We thank the reviewer for the constructive and positive evaluation of our manuscript. We agree that, while our data support the interpretation that the rate-limiting step occurs at the donor membrane, it is difficult to dissect in our assay which of the individual steps at the donor membrane - such as binding of Ups1, lipid extraction into the binding pocket, or dissociation of Ups1 - is rate-limiting. Nevertheless, although we cannot exclude contributions from membrane binding or dissociation, several observations suggest that lipid extraction is a rate-limiting step under our experimental conditions.

      The acceptor membrane has a similar lipid composition to the donor membrane (if anything, the donor membrane is slightly richer in binding-promoting lipids). If binding were rate-limiting, similar constraints would be expected at the acceptor membrane during lipid insertion. However, this is not observed.

      Regarding dissociation, if this step were rate-limiting, one would expect similar constraints to be evident at the acceptor vesicles as well. Nevertheless, membrane dissociation might be mechanistically coupled to lipid extraction and thus difficult to evaluate as an independent step.

      Based on our data and the considerations described above, we suggest that lipid extraction is the dominant rate-limiting step at the donor membrane under our conditions. However, we agree that a clear separation of these individual steps is not possible with the current experimental design. We will revise the corresponding passage to clarify that the rate-limiting step occurs at the donor membrane and, based on our observations, likely involves lipid extraction. Future studies aimed at dissecting these steps will be important for elucidating the mechanism and regulation of Ups1-mediated lipid transfer both in vitro and in vivo.

      (2) The authors should discuss that variations in the size of liposomes will also affect the distance between them at a constant concentration, which may affect the rate of lipid transfer. Therefore, the authors should determine the average size and size distribution of liposomes after sonication (by DLS or nanoparticle analyzer, etc.)

      We agree that variations in liposome size will influence the average distance between vesicles at a given lipid concentration, which may in turn affect the rate of lipid transfer. As suggested, we will include DLS measurements to characterize the size distribution of our different liposome preparations.

      Our setup was designed to keep the total membrane surface area comparable across conditions. This approach ensures a comparable overall binding capacity for Ups1 and enables the comparison of membrane binding and lipid extraction from different membranes. However, we agree that vesicle spacing, which is affected by liposome size at constant lipid concentration, could potentially influence certain steps in the transfer process, such as the time required for Ups1 to travel between donor and acceptor membranes. Whether this intermembrane travel time contributes to rate limitation is indeed an interesting question, and we will address this point through further discussion in the revised manuscript.

      Investigating such effects in our current experimental system would require altering the vesicle concentration, which would in turn change the total membrane surface area and introduce additional variables. Nevertheless, exploring the influence of vesicle spacing and intermembrane distance on lipid transfer represents a promising direction for future studies aimed at dissecting the rate-limiting steps of the transfer cycle.
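      The dependence of vesicle spacing on liposome size discussed above can be illustrated with a rough geometric estimate. The following is a minimal sketch, not part of the study's analysis. It assumes ideal unilamellar spherical vesicles, neglects bilayer thickness, and uses an assumed area per lipid of 0.7 nm² and an illustrative lipid concentration of 100 µM; these values, and all function names, are hypothetical placeholders.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def lipids_per_vesicle(radius_nm, area_per_lipid_nm2=0.7):
    """Approximate lipid count of one unilamellar vesicle.

    Both leaflets are given the same area (bilayer thickness is ignored),
    and area_per_lipid_nm2 is an assumed, illustrative value.
    """
    return 2.0 * 4.0 * math.pi * radius_nm ** 2 / area_per_lipid_nm2


def vesicles_per_litre(lipid_molar, radius_nm, area_per_lipid_nm2=0.7):
    """Number of vesicles per litre at a fixed total lipid concentration."""
    total_lipids = lipid_molar * AVOGADRO
    return total_lipids / lipids_per_vesicle(radius_nm, area_per_lipid_nm2)


def mean_spacing_um(lipid_molar, radius_nm, area_per_lipid_nm2=0.7):
    """Characteristic centre-to-centre spacing n**(-1/3), in micrometres."""
    vesicles_per_m3 = vesicles_per_litre(
        lipid_molar, radius_nm, area_per_lipid_nm2
    ) * 1000.0  # 1 m^3 = 1000 L
    return vesicles_per_m3 ** (-1.0 / 3.0) * 1e6  # metres -> micrometres


if __name__ == "__main__":
    lipid_molar = 1e-4  # hypothetical 100 µM total lipid
    for radius_nm in (25.0, 50.0, 100.0):
        n = vesicles_per_litre(lipid_molar, radius_nm)
        d = mean_spacing_um(lipid_molar, radius_nm)
        print(f"r = {radius_nm:5.1f} nm: {n:.3e} vesicles/L, spacing ~ {d:.2f} µm")
```

      Under these assumptions, vesicle number scales as 1/r² at fixed lipid concentration, so the characteristic spacing grows only as r^(2/3): a fourfold increase in radius increases the spacing by a factor of about 2.5.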

      (3) The authors use NBD-PA in the lipid transfer assays. Does the size of the donor liposomes affect the transfer of NBD-PA and DOPA similarly? Since NBD-labeled lipids are somewhat unstable within lipid bilayers (as shown by spontaneous desorption in Figure 5B), monitoring the transfer of unlabeled PA in at least one setting would strengthen the conclusion of the swap experiments.

      Ups1-mediated transfer of PA has been demonstrated both by mass spectrometry analysis of donor and acceptor vesicles (Connerth et al., 2012) and by NBD-fluorescence-based lipid transfer assays (Lu et al., 2020; Miliara et al., 2015; Miliara et al., 2019; Miliara et al., 2023; Potting et al., 2013; Watanabe et al., 2015). The fluorescence-based approach has been the most widely applied across multiple studies and has enabled detailed analysis of various aspects of lipid transfer by Ups1. It has been used to investigate mutants of key structural elements, such as the lipid-binding pocket and the α2–loop region. It has also been used to analyze fusion constructs between Ups1 and Mdm35, the influence of Mdm35 variants, and competition with excess Mdm35. Additionally, by comparing the transfer of NBD-labeled PA and NBD-labeled PS, this assay has provided insights into the determinants of the lipid specificity of Ups1. Hence, our experiments are based on the standard assay used to analyse lipid transfer in the field and thus can be correlated with the majority of published data.

      Nevertheless, we agree that it is important to keep in mind that NBD labeling may alter the biophysical properties of lipids and, consequently, affect their transfer efficiency. Moreover, NBD-labeled lipids are not suitable for comparing the transfer efficiency of different PA species, as the label itself may mask differences in acyl chain composition. Therefore, it will be valuable to establish complementary methods that do not rely on NBD-labeled PA. We aim to develop these non-standard methods for possible inclusion in the present study, but even if not fully implemented at this stage, they will certainly form an important part of future investigations.

      (4) The present study suggests that membrane domains with positive curvature at the outer membrane may serve as starting points for lipid transport by Ups1-Mdm35. Is anything known about the mechanisms that form such structures? This should be discussed in the text.

      The origin of positively curved membrane domains is indeed highly relevant in the context of our findings, and while not the primary focus of this work, we will place more emphasis on discussing how such curvature may arise. Mechanisms include the action of curvature-generating proteins, asymmetric lipid composition and curvature induced at membrane contact sites. We have so far included examples of proteins in the outer mitochondrial membrane that are expected to influence curvature in their vicinity, and we will expand on this aspect and other contributing factors more thoroughly in the revised text.

      Reviewer #2:

      Summary:

      Lipid transfer between membranes is essential for lipid biosynthesis across different organelle membranes. Ups1-Mdm35 is one of the best-characterized lipid transfer proteins, responsible for transferring phosphatidic acid (PA) between the mitochondrial outer membrane (OM) and inner membrane (IM), a process critical for cardiolipin (CL) synthesis in the IM. Upon dissociation from Mdm35, Ups1 binds to the intermembrane space (IMS) surface of the OM, extracts a PA molecule, re-associates with Mdm35, and moves through the aqueous IMS to deliver PA to the IM. Here, the authors analyzed the early steps of this PA transfer - membrane binding and PA extraction - using a combination of in vitro biochemical assays with lipid liposomes and purified Ups1-Mdm35 to measure liposome binding, lipid transfer between liposomes, and lipid extraction from liposomes. The authors found that membrane curvature, a previously overlooked property of the membrane, significantly affects PA extraction but not PA insertion into liposomes. These findings were further supported by MD simulations.

      Strengths:

      The experiments are well-designed, and the data are logically interpreted. The present study provides an important basis for understanding the mechanism of lipid transfer between membranes.  

      Weaknesses:

      The physiological relevance of membrane curvature in lipid extraction and transfer still remains open.

      We thank the reviewer for the constructive feedback on our work. We agree that the physiological relevance of membrane curvature in lipid extraction and transfer remains an open question. Our data show that Ups1 binding to native-like OM membranes under physiological pH conditions is curvature-dependent, supporting the idea that this mechanism may optimize lipid transfer in vivo. While the intricate biophysical basis of this behaviour can only be dissected in vitro, these findings offer valuable insight into how curvature may functionally regulate Ups1 activity in the cellular context. To directly test this, it will be important in future studies to identify Ups1 mutants that lack curvature sensitivity and assess their performance in vivo, which will help clarify the physiological importance of this mechanism.

      Reviewer #3:

      The manuscript by Sadeqi et al. studies the interactions between the mitochondrial protein Ups1 and reconstituted membranes. The authors apply synthetic liposomal vesicles to investigate the role of pH, curvature, and charge on the binding of Ups1 to membranes and its ability to extract PA from them. The manuscript is well written and structured. With minor exceptions, the authors provide all relevant information (see minor points below) and reference the appropriate literature in their introduction. The underlying question of how the energy barrier for lipid extraction from membranes is overcome by Ups1 is interesting, and the data presented by the authors could offer a valuable new perspective on this process. It is also certainly a challenging in vitro reconstitution experiment, as the authors aim to disentangle individual membrane properties (e.g., curvature, charge, and packing density) to study protein adsorption and lipid transfer. I have one major suggestion and a few minor ones that the authors might want to consider to improve their manuscript and data interpretation:

      Major Comments:

      The experiments are performed with reconstituted vesicles, which are incubated with recombinant protein variants and quantitatively assessed in flotation and pelleting assays. According to the Materials and Methods section, the lipid concentration in these assays is kept constant at 5 µM. However, the authors change the size of the vesicles to tune their curvature. Using the same lipid concentration but varying vesicle sizes results in different total vesicle concentrations. Moreover, larger vesicles (produced by freeze-thawing and extrusion) tend to form a higher proportion of multilamellar vesicles, thus also altering the total membrane area available for binding. Could these differences in the experimental system account for the variation in binding? To address this, the authors would need to perform the experiments either under saturation (excess protein) conditions or find an experimental approach to normalize for these differences.

      We thank the reviewer for the constructive and positive comments. We agree that, since the total number of lipids was kept constant, the number of vesicles varied with vesicle size in our experiments. However, the setup was specifically designed to maintain a comparable total membrane surface area across conditions, ensuring a comparable number of available binding sites for Ups1. Because the surface area of a single vesicle scales with the square of its radius, keeping the vesicle number constant would have markedly reduced the binding surface available on the smaller vesicles. Our approach was therefore aimed at preserving comparable binding capacity while varying membrane curvature.
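
      The scaling argument in this response can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the manuscript: it assumes ideal, unilamellar, spherical vesicles, an area per lipid of 0.7 nm², a bilayer thickness of 4 nm, an arbitrary lipid count, and two illustrative radii (25 nm and 100 nm). The function names are hypothetical.

```python
import math

# Assumed order-of-magnitude values; they are NOT taken from the manuscript.
AREA_PER_LIPID_NM2 = 0.7
BILAYER_THICKNESS_NM = 4.0


def lipids_per_vesicle(radius_nm):
    """Lipids in one ideal unilamellar vesicle, counting both leaflets."""
    outer_area = 4.0 * math.pi * radius_nm ** 2
    inner_radius = max(radius_nm - BILAYER_THICKNESS_NM, 0.0)
    inner_area = 4.0 * math.pi * inner_radius ** 2
    return (outer_area + inner_area) / AREA_PER_LIPID_NM2


def vesicle_stats(n_lipids, radius_nm):
    """Return (number of vesicles, total outer-leaflet area in nm^2)."""
    n_vesicles = n_lipids / lipids_per_vesicle(radius_nm)
    total_outer_area = n_vesicles * 4.0 * math.pi * radius_nm ** 2
    return n_vesicles, total_outer_area


N_LIPIDS = 1e12  # arbitrary illustrative lipid count, held constant
for radius in (25.0, 100.0):  # illustrative radii, not the experimental sizes
    n_ves, area = vesicle_stats(N_LIPIDS, radius)
    print(f"r = {radius:5.1f} nm: {n_ves:.3e} vesicles, outer area {area:.3e} nm^2")
```

      Under these assumptions, the number of vesicles differs many-fold between the two radii at a fixed lipid count, whereas the total outer-leaflet area stays comparable. Multilamellarity is deliberately ignored here and would lower the accessible area of the larger vesicles.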

      With respect to multilamellarity, we thank the reviewer for raising this important point. As described above, we aimed to maintain a constant total membrane surface area across all conditions to ensure an equal number of potential binding sites. We agree that multilamellarity in large liposomes could restrict accessibility to part of the membrane surface. However, our experiments show that even when the total membrane surface area of the small liposomes was reduced to one quarter of the standard amount, binding to the small liposomes remained stronger than binding to the larger liposomes at the higher concentration. This strongly indicates that restricted accessibility cannot account for the curvature-specific effect observed. Nonetheless, we will further address this aspect experimentally and in the discussion of the revised manuscript.

      References

      Connerth, M., Tatsuta, T., Haag, M., Klecker, T., Westermann, B., & Langer, T. (2012). Intramitochondrial transport of phosphatidic acid in yeast by a lipid transfer protein. Science, 338(6108), 815-818. https://doi.org/10.1126/science.1225625 

      Lu, J., Chan, C., Yu, L., Fan, J., Sun, F., & Zhai, Y. (2020). Molecular mechanism of mitochondrial phosphatidate transfer by Ups1. Commun Biol, 3(1), 468. https://doi.org/10.1038/s42003-020-01121-x 

      Miliara, X., Garnett, J. A., Tatsuta, T., Abid Ali, F., Baldie, H., Perez-Dorado, I., Simpson, P., Yague, E., Langer, T., & Matthews, S. (2015). Structural insight into the TRIAP1/PRELI-like domain family of mitochondrial phospholipid transfer complexes. EMBO Rep, 16(7), 824-835. https://doi.org/10.15252/embr.201540229

      Miliara, X., Tatsuta, T., Berry, J. L., Rouse, S. L., Solak, K., Chorev, D. S., Wu, D., Robinson, C. V., Matthews, S., & Langer, T. (2019). Structural determinants of lipid specificity within Ups/PRELI lipid transfer proteins. Nat Commun, 10(1), 1130. https://doi.org/10.1038/s41467-019-09089-x

      Miliara, X., Tatsuta, T., Eiyama, A., Langer, T., Rouse, S. L., & Matthews, S. (2023). An intermolecular hydrogen bonded network in the PRELID-TRIAP protein family plays a role in lipid sensing. Biochim Biophys Acta Proteins Proteom, 1871(1), 140867. https://doi.org/10.1016/j.bbapap.2022.140867

      Potting, C., Tatsuta, T., Konig, T., Haag, M., Wai, T., Aaltonen, M. J., & Langer, T. (2013). TRIAP1/PRELI complexes prevent apoptosis by mediating intramitochondrial transport of phosphatidic acid. Cell Metab, 18(2), 287-295. https://doi.org/10.1016/j.cmet.2013.07.008

      Watanabe, Y., Tamura, Y., Kawano, S., & Endo, T. (2015). Structural and mechanistic insights into phospholipid transfer by Ups1-Mdm35 in mitochondria. Nat Commun, 6, 7922. https://doi.org/10.1038/ncomms8922

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) 8 molar urea not only denatures proteins but also denatures DNA. Obviously, this does not affect the ChIP, since antibodies often recognize small linear epitopes and the proteins are crosslinked. However, under high urea conditions the BUR elements should be rendered single-stranded, and one wonders whether this has any effect on the procedure. The authors should alert the reader of these circumstances.

      Thank you for raising this important question about the effects of 8M urea. We have added a brief paragraph explaining this point in the revised manuscript. Despite common misconceptions, 8M urea by itself does not actively convert double-stranded DNA to single-stranded DNA. For this conversion to occur, a heat denaturation step is required. Once DNA is heat-denatured to become single-stranded, urea can maintain this configuration. This is why the addition of 8M urea to acrylamide gel electrophoresis is a standard method for analyzing single-stranded oligonucleotides, but the DNA must first be denatured by heat (Summer et al., J. Vis. Exp. (32), e1485, DOI: 10.3791/1485). This is clearly described in published work comparing the status of DNA with and without heat treatment in an 8M urea-containing buffer (Hegedus et al., Nucl. Acids Res. 2009, doi:10.1093/nar/gkp539).

      We have additional evidence supporting this conclusion in the context of our urea ultracentrifugation experiment. Both crosslinked and un-crosslinked genomic DNA purified by 8M urea centrifugation can be digested with restriction enzymes, which indicates that the DNA remains double-stranded. For instance, we previously published SATB1 ChIP-3C results using Sau3A-digested DNA after urea purification. In the current paper, we used HindIII to digest urea-purified DNA for urea4C-seq. The BUR reference map can also be generated by digesting urea-purified DNA with restriction enzymes and then isolating and sequencing SATB1-bound restriction fragments in vitro. If genomic DNA were denatured by 8M urea ultracentrifugation, we would not have been able to digest it with restriction enzymes to obtain these results.

      We have now added a sentence noting that SATB1 is a double-stranded DNA-binding protein that does not bind to single-stranded DNA, as we have previously shown (Dickinson et al., 1992, Ref 32).

      (2) An important conclusion is that urea-ChIP reveals direct DNA binding events, whereas standard ChIP shows indirect binding (which is stripped off by urea). I do not see any evidence for direct binding. At low resolution, predicted BUR elements are enriched in domains where SATB-1 is mapped by urea-ChIP. A statement like 'In a zoomed-in view, covering a 430 kb region, SATB1 sites identified from urea ChIP-seq precisely coincided with BUR peaks' is certainly not correct: most BUR peaks do not show significant SATB-1 binding. The randomly chosen regions shown in Figure 4 – Supplement 1 show how poor the overlap of SATB-1 and BURs is; indeed, they show that SATB-1 binds DNA mostly at non-BUR sites. I see Figure 2D, but such cumulative plots can be highly biased by very few cases. I suggest showing these data in heat maps instead.

      We believe there may be some confusion regarding the interpretation of our figures. Looking at Track 3 (BUR reference map, RED peaks) and urea SATB1 Tracks 4 and 5 (replicas from two independent experiments) in Fig. 2B, the SATB1 peaks detected by urea ChIP-seq do indeed coincide with BUR peaks. In the revised manuscript, we have provided a further ‘zoomed-in’ view to better illustrate this point and also provided the underlying BUR sequence from one of these SATB1-bound regions (Figure 2—supplement figure 1).

      It is true that many more BURs exist than SATB1-bound BURs, especially in gene-poor regions where BURs are clustered. However, from the perspective of SATB1-bound peaks, the majority of these coincide with BURs, as shown by both the deepTools analyses and the new heatmaps added at the reviewer's suggestion (Figure 2E, and Figure 7—supplement figure 3).

      The results from our genome-wide quantitative analyses using deepTools to compare peaks from urea SATB1 ChIP-seq data and the BUR reference map shown in Supplementary Tables 1 and 2 are consistent with the heatmap analyses.

      We must apologize for an error in the scaling of the y-axis in Figure 4-supplement figure 1 that likely contributed to some confusion. We have corrected our mistake in the revised manuscript. While we were preparing the figure, the y-axis of the BUR reference track was mislabeled when the tracks were placed in the figure and the axes were relabeled for legibility. In the figure, the peaks were erroneously labeled on a scale of 0.1-1 read counts/million reads, but the data shown are actually scaled at 0.1 to 2 read counts per million reads. Unfortunately, we did not notice this error and, using the figure as a guide for scaling, provided urea SATB1 ChIP-seq peaks at a scale of 0.1-1 read counts/million reads to match the mislabeled BUR reference track. This had the effect of reducing the signal/noise in the SATB1 ChIP-seq data (Figure 1). We have now standardized the y-axis at 0.1-2 for all tracks to allow a fair comparison. This will more clearly show that there are indeed more BUR peaks than SATB1-bound sites, consistent with our quantitative analysis.

      We hope that these clarifications, as well as the added heatmaps and binding site example, allay the concerns about the specificity and overlap of SATB1 binding on BURs.

      (3) In Figure 6C 'peaks' are compared. However, looking at Figure 4 - Supplement 1 again it is clear that peak calling can yield a misleading impression. Figure 6D suggests that there are more BURs than SATB-1 peaks but this is not true from looking at the browser.

      We thank the reviewer for this observation. As noted in our response to point 2 above, the inconsistent y-axis scaling in Figure 4-supplement figure 1 created a misleading impression, which we have corrected in the revised manuscript. When properly displayed with consistent y-axis scaling, the browser view aligns with our quantitative data showing that there are indeed many more BURs than SATB1-bound sites. As mentioned under 2 above, we have performed genome-wide quantitative analysis by deepTools (Supplementary Tables 1 and 2) to confirm the results shown by bar graphs in Fig. 6C, 6D and Fig. 2D. 

      In Figure 6C, the bars show the percentage of SATB1-bound peaks in each cell type (denominator) that overlap with confirmed BUR sites in the BUR reference map (numerator). In Figure 6D, we show the percentage of total BUR sites in the BUR reference map (denominator) that are bound by SATB1 from urea ChIP-seq (numerator). To avoid any confusion, we have added brief subtitles to Figures 6C and 6D in the revised manuscript.
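
      The two percentages described above differ only in which peak set serves as the denominator. The following minimal sketch makes the distinction concrete. It is not the authors' deepTools workflow: the interval coordinates are invented toy data, and the function names are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def percent_overlapping(query, reference):
    """Percentage of query intervals that overlap at least one reference interval."""
    if not query:
        return 0.0
    hits = sum(any(overlaps(q, r) for r in reference) for q in query)
    return 100.0 * hits / len(query)


# Invented toy intervals, for illustration only.
satb1_peaks = [("chr1", 100, 400), ("chr1", 5000, 5300)]
bur_sites = [
    ("chr1", 150, 450),
    ("chr1", 2000, 2300),
    ("chr1", 5100, 5400),
    ("chr1", 9000, 9300),
]

# Figure 6C-style quantity: the denominator is the set of SATB1-bound peaks.
pct_satb1_on_bur = percent_overlapping(satb1_peaks, bur_sites)
# Figure 6D-style quantity: the denominator is the set of BUR sites.
pct_bur_bound = percent_overlapping(bur_sites, satb1_peaks)

print(pct_satb1_on_bur, pct_bur_bound)
```

      In this toy case every SATB1 peak lies on a BUR, yet only half of the BURs are bound, so the two percentages can diverge strongly even though they are computed from the same pair of peak sets.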

      (4) An important conclusion is that urea-ChIP reveals direct DNA binding events, whereas standard ChIP shows indirect binding (which is stripped off by urea). I do not yet see any evidence for direct binding. It cannot be excluded that the binding is RNA-mediated. The authors mention in passing that urea-ChIP material still contains (specific!) RNA. Given that this is a new procedure, the authors should document the RNA content of urea-ChIP and RNase-treat their samples prior to ChIP to monitor an RNA contribution.

      Thank you for raising this important point. The direct binding of SATB1 to BURs is well-established in our previous work. Indeed, this was the main motivation to explore the reason for the lack of evidence for genome-wide SATB1 binding to BURs in the DNA-binding profile by standard ChIP-seq. This has been a major point of confusion for us for many years.

      SATB1 was originally identified through a search for mammalian proteins that could recognize BURs specifically and not just any A+T-rich sequence. The Satb1 gene was originally cloned from an expression cDNA library, and the encoded SATB1 protein bound the BUR probe but not a mutated AT-rich BUR (control) probe. Subsequent experiments confirmed that SATB1 specifically binds to many BURs without requiring additional factors. Furthermore, SATB1 recognizes BURs by binding in the minor groove of double-stranded DNA, presumably recognizing the altered phosphate backbone structure of BUR DNA, rather than accessing nucleotide bases (Dickinson et al, 1992).

      We do agree with the reviewer, however, that there is a possibility that RNA can redirect SATB1 to different subsets of BURs and/or to interact indirectly with different regulatory regions depending on cell type or developmental stage. Although urea ultracentrifugation clearly separates most RNA (found in the middle region of the tube) from genomic DNA (pelleted at the bottom) (de Belle et al., 1998), upon crosslinking cells, a small quantity of RNA is found co-pelleted with DNA (our recent unpublished results). This RNA, tightly associated with crosslinked chromatin, may have some impact on SATB1 function.

      Based on our preliminary data, we are currently planning to study the impact of RNA using RNase A as well as by targeting specific RNAs employing an anti-sense approach. We believe that thoroughly addressing the impact of RNA warrants a full paper, including the potential roles of specific non-coding RNAs in SATB1 function, and thus is beyond the scope of the current paper. However, we have now added discussion of this important point in the manuscript.

      (5) An important aspect of the model is that SATB1 tethers active genes to inactive LADs. However, in the 4C experiment the BUR elements used to anchor the looping are both in the accessible, active chromatin domain. If the authors want to maintain their statement, they must show a 4C result that connects the 2 distinct domains and transverses A/B domain boundaries. Currently, the data only show a looping within accessible chromatin.

      We thank Reviewer 1 for bringing up the important point that our model could potentially be interpreted as “SATB1 tethers active genes to inactive LADs.” Since we describe that BURs are enriched in LADs and that SATB1 binds a subset of BURs, readers may assume that we aim to demonstrate, through urea 4C-seq, that SATB1 tethers active genes to transcriptionally-inactive LADs (via BURs). However, this is not our intention in the model (Figure 8). In the experiment we designed for our present study, we selected BUR-1 and BUR-2 as viewpoints from a non-LAD gene-rich region (inter-LAD). The fact that these BURs are bound by SATB1 indicates that they are part of the “hard-to-access” SATB1-rich subnuclear structure, which resists extraction, in contrast to accessible chromatin. Thus, we illustrate in the model that BURs anchored to the SATB1-rich nuclear substructure make contact with accessible chromatin over long distances in a SATB1-dependent manner. Therefore, we do not intend to conclude that SATB1 mediates interactions between LADs and inter-LADs (accessible chromatin) from our current study: this would be a topic for future research. In the original model in the submitted manuscript, we used the terms “inaccessible” and “accessible.” In the revised version, we clarified this in the model by changing “inaccessible” to “SATB1-rich subnuclear structure” and carefully revised the text in the Figure 8 legend to clarify the model.

      At this time, we do not know exactly how LADs and SATB1 nuclear architecture are related spatially and functionally. While LADs are mapped as genomic domains in proximity to Lamin B1 by LaminB1-DamID, BURs are mapped at ~300-500 bp resolution by urea ChIP-seq. To gain further insight into this important question, a large body of DNA-FISH and immunoDNA-FISH experiments will be required, comparing different cell types to see whether and how specific BURs move between LADs and SATB1 nuclear architecture. Such experiments may benefit from testing the Gabrg1 and Gabra2 loci, where many BURs are anchored to SATB1 in neurons but not in thymocytes, for instance. This point is included in the Discussion of the revised manuscript.

      Regarding the reviewer's second point about showing more extended domains for 4C interactions, we would like to highlight that Figure 5—supplement figure 3 in our submitted manuscript addresses this concern. This figure shows that BUR-interactions extend to multiple gene-rich regions across intervening gene-poor regions. Interestingly, BUR-1 and BUR-2 interactions skip a transcriptionally silent gene-rich region containing olfactory receptor genes but interact with subsequent gene-rich regions containing active genes. These data demonstrate that BUR-interactions do indeed traverse A- and B-compartment boundaries. In the revised manuscript (in Figure 5—supplement figure 3), we have added a Lamin B1-DamID (thymocyte) track. Comparison with LADs shows that BUR-1 interactions occur mostly in non-LAD regions. Some minor overlap with LADs was detected in high resolution views (not shown). Future experiments testing BUR viewpoints that reside within LADs are required to assess whether SATB1 mediates interactions between B and A compartments.

      (6) The description of the urea-co-immunoprecipitation experiment (Figure 3C) could be improved to make it unequivocally clear that co-binding to chromatin is tested, not protein-protein interaction (which is destroyed by urea).

      Thank you for this helpful suggestion. We have revised the text in the manuscript by stating “Distinct from protein-protein co-immunoprecipitation (co-IP) using whole cell or nuclear extracts, we examined the direct co-binding status on chromatin in vivo of SATB1 and CTCF or cohesin by urea ChIP-Western”.

      Reviewer #2:

      (1) Since SATB1 has been described to interact with beta-catenin, I wonder if the authors have looked at TCF4/TCF7l2 binding patterns and their potential overlap with SATB1 binding patterns. This might appear a trivial request. However, uncontrolled WNT signalling is a major feature of cancer undergoing metastasis - a process that the authors have earlier associated with unscheduled SATB1 expression in triple-negative breast cancer.

      We thank the reviewer for highlighting this important point about the potential relationship between SATB1 and TCF4/TCF7l2 binding patterns. Based on published observations with other factors (Rad21, CTCF, BRG1, RUNX) that show substantial overlap with SATB1 in standard ChIP-seq peaks (Kakugawa et al., Cell Rep 19, 1176-1188 (2017), DOI: 10.1016/j.celrep.2017.04.038; Poterlowicz et al., PLoS Genet, 2017, DOI: 10.1371/journal.pgen.1006966), we would anticipate that TCF4 might also show significant overlap with SATB1. An important question is whether the DNA binding profile of TCF4 depends on SATB1.

      We have not yet generated ChIP-seq data for TCF4 in the presence and absence of SATB1, but we agree that such experiments could provide important insights into cancer progression as well as brain function. This represents an interesting direction for future work. We have added this point in our discussion based on your kind suggestion.

      (2) The CTCF sizes indicated in the western blot analyses of Figures 3C and Figure 3 - supplement figure 2 do not display the normal size, which is around 130 kDa. Either the issue is erroneous marking or a so-called salt effect to slow the migration in the gel. Alternatively, it reflects a slower migrating form of CTCF generated by for example PARylation (by PARP1) that is known to approach 180 kDa. It would be useful if the authors could clarify this minor issue.

      We appreciate the reviewer pointing out this discrepancy. As the reviewer correctly noted, CTCF can appear at a higher molecular weight due to post-translational modifications such as PARylation and O-GlcNAcylation, which alter its migration during electrophoresis.

      Upon re-examination of our raw data for Figure 3—supplement figure 2A, we discovered that the marker lane for the CTCF panel was broken, and the 150kDa band was erroneously assigned. This led to the 150kDa marker being placed below the CTCF migration position, which is clearly an error. We thank the reviewer for bringing this to our attention.

      We have checked our other data and consistently observe CTCF migrating below the 150kDa band, similar to the pattern shown on the Abcam website for the antibody we used (ab128873) (Figure 2). For Figure 3-supplement figure 2, we will use a marker lane from a parallel gel with identical composition and run time to correctly indicate the molecular weight. We have also corrected the marker position in Figure 3C.

      Reviewing Editor (Recommendations for the authors):

      (1) The introduction states that urea ChIP-seq is "unbiased", which is difficult to unambiguously determine and therefore might be an overstatement. Maybe the authors could consider rephrasing.

      We agree with the reviewer's assessment and have rephrased our description of the urea ChIP-seq method to avoid using the term "unbiased."

      (2) The authors propose that in standard ChIP, most SATB1 is in the insoluble fraction. This seems easy to test and demonstrating it may help to further clarify the differences between the protocols.

      We appreciate this suggestion and would like to clarify our description. What we stated in the manuscript was:

      "We envision that SATB1 bound to inaccessible nuclear regions may be lost in the insoluble fraction."

      This refers specifically to a subpopulation of SATB1 that is bound to the high-salt extraction-resistant nuclear substructure, not to the total SATB1 protein. We also noted elsewhere in the manuscript that:

      "SATB1 proteins are found in high salt-resistant fraction as well as salt-extracted fraction (40). Thus, it is possible that soluble SATB1 may associate with open chromatin."

      Our unpublished results show that SATB1 proteins exist in at least two distinct forms based on protein mobility: SATB1 with high mobility and another with very low or no mobility. While we have identified the SATB1 domain responsible for each of these distinct mobility patterns, we have not yet identified biochemical differences that would allow us to distinguish them conclusively. Therefore, an experiment to test the distribution of SATB1 in soluble versus insoluble fractions would show SATB1 in both fractions but would not necessarily provide information about the functional significance of these different populations. We believe this is an important area for future research and are working to develop tools to specifically distinguish and characterize SATB1 in the soluble versus insoluble fractions.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work studies representations in a network with one recurrent layer and one output layer that needs to path-integrate so that its position can be accurately decoded from its output. To formalise this problem, the authors define a cost function consisting of the decoding error and a regularisation term. They specify a decoding procedure that at a given time averages the output unit center locations, weighted by the activity of the unit at that time. The network is initialised without position information, and only receives a velocity signal (and a context signal to index the environment) at each timestep, so to achieve low decoding error it needs to infer its position and keep it updated with respect to its velocity by path integration.
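
      The decoding procedure summarised above amounts to an activity-weighted average of unit centers. The sketch below illustrates that idea only; it is not the authors' implementation. It assumes 2D centers supplied as fixed inputs and non-negative activities normalised by their sum, and the function name is hypothetical.

```python
def decode_position(activities, centers):
    """Estimate a 2D position as the activity-weighted mean of unit centers.

    activities: non-negative output-unit activities at a single timestep.
    centers: one (x, y) center location per output unit.
    """
    if len(activities) != len(centers):
        raise ValueError("need exactly one center per unit")
    total = sum(activities)
    if total <= 0:
        raise ValueError("at least one unit must be active")
    x = sum(a * c[0] for a, c in zip(activities, centers)) / total
    y = sum(a * c[1] for a, c in zip(activities, centers)) / total
    return x, y


# Two hypothetical units; the second is three times as active as the first,
# so the estimate lies three quarters of the way towards its center.
print(decode_position([1.0, 3.0], [(0.0, 0.0), (1.0, 1.0)]))
```

      Because the estimate depends only on where the active units are centered, a decoder of this form favours spatially localised output activity, which is the concern the reviewer raises under Weaknesses.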

      The authors take the trained network and let it explore a series of environments with different geometries while collecting unit activities to probe learned representations. They find localised responses in the output units (resembling place fields) and border responses in the recurrent units. Across environments, the output units show global remapping and the recurrent units show rate remapping. Stretching the environment generally produces stretched responses in output and recurrent units. Ratemaps remain stable within environments and stabilise after noise injection. Low-dimensional projections of the recurrent population activity form environment-specific clusters that reflect the environment's geometry, which suggests independent rather than generalised representations. Finally, the authors discover that the centers of the output unit ratemaps cluster together on a triangular lattice (like the receptive fields of a single grid cell), and find significant clustering of place cell centers in empirical data as well.

      The model setup and simulations are clearly described, and are an interesting exploration of the consequences of a particular set of training requirements - here: path integration and decodability. But it is not obvious to what extent the modelling choices are a realistic reflection of how the brain solves navigation. Therefore it is not clear whether the results generalize beyond the specifics of the setup here.

      Strengths:

      The authors introduce a very minimal set of model requirements, assumptions, and constraints. In that sense, the model can function as a useful 'baseline', that shows how spatial representations and remapping properties can emerge from the requirement of path integration and decodability alone. Moreover, the authors use the same formalism to relate their setup to existing spatial navigation models, which is informative.

      The global remapping that the authors show is convincing and well-supported by their analyses. The geometric manipulations and the resulting stretching of place responses, without additional training, are interesting. They seem to suggest that the recurrent network may scale the velocity input by the environment dimensions so that the exact same path integrator-output mappings remain valid (but maybe there are other mechanisms too that achieve the same).

      The clustering of place cell peaks on a triangular lattice is intriguing, given there is no grid cell input. It could have something to do with the fact that a triangular lattice provides optimal coverage of 2d space? The included comparison with empirical data is valuable, although the authors only show significant clustering - there is no analysis of its grid-like regularity.

      First of all, we would like to thank the reviewer for their comprehensive feedback, and their insightful comments. Importantly, as you point out, our goal with this model was to build a minimal model of place cell representations, where representations were encouraged to be place-like, but free to vary in tuning and firing locations. By doing so, we could explore what upstream representations facilitate place-like representations, and even remapping (as it turned out) with minimal assumptions. However, we agree that our task does not capture some of the nuances of real-world navigation, such as sensory observations, which could be useful extensions in future work. Then again, the simplicity of our setup makes it easier to interpret the model, and makes it all the more surprising that it learns many behaviors exhibited by real world place cells.

      As to the distribution of phases - we also agree that a hexagonal arrangement likely reflects some optimal configuration for decoding of location.

      And we agree that the symmetry within the experimental data is important; we have revised analyses on experimental phase distributions, and included an analysis of ensemble grid score, to quantify any hexagonal symmetries within the data.

      Weaknesses:

      The navigation problem that needs to be solved by the model is a bit of an odd one. Without any initial position information, the network needs to figure out where it is, and then path-integrate with respect to a velocity signal. As the authors remark in Methods 4.2, without additional input, the only way to infer location is from border interactions. It is like navigating in absolute darkness. Therefore, it seems likely that the salient wall representations found in the recurrent units are just a consequence of the specific navigation task here; it is unclear if the same would apply in natural navigation. In natural navigation, there are many more sensory cues that help inferring location, most importantly vision, but also smell and whiskers/touch (which provides a more direct wall interaction; here, wall interactions are indirect by constraining velocity vectors). There is a similar but weaker concern about whether the (place cell like) localised firing fields of the output units are a direct consequence of the decoding procedure that only considers activity center locations.

      Thank you for raising this point; we absolutely agree that the navigation task is somewhat niche. However, this was a conscious decision, to minimize any possible confounding from alternate input sources, such as observations. In part, this experimental design was inspired by the suggestion that grid cells support navigation/path integration in open-field environments with minimal sensory input (as they could conceivably do so with no external input). This also pertains to your other point, that boundary interactions are necessary for navigation. In our model, using boundaries is one solution, but there is another way around this problem, which is conceivably better: to path integrate in an egocentric frame, starting from your initial position. Since the locations of place fields are inferred only after a trajectory has been traversed, the network is free to create a new or shifted representation every time, independently of the arena. In this case, one might have expected generalized solutions, such as grid cells, to emerge. That this is not the case seems to suggest that grid cells may somehow not be optimal for pure path integration, or at the very least, are hard to learn (but may still play a part, as alluded to by place field locations). We have tried to make these points more evident in the revised manuscript.

      As for the point that the decoding may lead to place-like representations, this is a fair point. Indeed, we did choose this form of decoding, inspired by the localized firing of place cells, in the hope that it would encourage minimally constrained, place-like solutions. However, compared to other works (Sorscher and Xu) that hand-tune the functional form of their place cells, our approach (although biased towards centralized tuning curves) leaves the network free to determine features such as the position of the place cell centers, their tuning width, whether or not the activity is center-surround, and how the cells tune to different environments/rooms. This allows us to study several features of the place cell system, such as remapping and field formation. We have revised the model description to make this clearer.

      The conclusion that 'contexts are attractive' (heading of section 2) is not well-supported. The authors show 'attractor-like behaviour' within a single context, but there could be alternative explanations for the recovery of stable ratemaps after noise injection. For example, the noise injection could scramble the network's currently inferred position, so that it would need to re-infer its position from boundary interactions along the trajectory. In that case the stabilisation would be driven by the input, not just internal attractor dynamics. Moreover, the authors show that different contexts occupy different regions in the space of low-dimensional projections of recurrent activity, but not that these regions are attractive.

      We agree that boundary interactions could facilitate the convergence of representations after noise injection. We did try to moderate this claim by the wording “attractor-like”, but we agree that boundaries could confound this result. We have therefore performed a modified noise injection experiment, in which we let the network run for an extended period of time before noise injection and without a velocity signal (see Appendix Velocity Ablation in the revised text). Notably, representations converge to their pre-scrambled state after noise injection, even without a velocity signal. However, place-like representations do not converge for all noise levels in this case, possibly indicating that boundary interactions also serve an error-correcting function. Thank you for pointing this out.

      As for the attractiveness of contexts, we agree that more analyses were required to demonstrate this. We have therefore conducted a supplementary analysis where we run the trained network with a mismatch in context/geometry, and demonstrate that the context signal fixes the representation, up to geometric distortions.

      The authors report empirical data that shows clustering of place cell centers like they find for their output units. They report that 'there appears to be a tendency for the clusters to arrange in hexagonal fashion, similar to our computational findings'. They only quantify the clustering, but not the arrangement. Moreover, in Figure 7e they only plot data from a single animal, then plot all other animals in the supplementary. Does the analysis of Fig 7f include all animals, or just the one for which the data is plotted in 7e? If so, why that animal? As Appendix C mentions that the ratemap for the plotted animal 'has a hexagonal resemblance' whereas other have 'no clear pattern in their center arrangements', it feels like cherrypicking to only analyse one animal without further justification.

      Thank you for pointing this out; we agree that this is not sufficiently explained and explored in the current version. We have therefore conducted a grid score analysis of the experimental place center distributions, to uncover possible hexagonal symmetries. We chose this particular animal in part because it featured the largest number of included cells and the most striking phase distribution; the distributions for all other animals were included in the supplementary material. Originally, this was only intended as a preliminary analysis, suggesting non-uniformity in experimental place field distributions, but we realize that these may all provide interesting insight into the distributional properties of place cells.

      We have explained these choices in the revised text, and expanded analyses on all animals to showcase these results more clearly.

      Reviewer #2 (Public Review):

      Summary:

      The authors proposed a neural network model to explore the spatial representations of the hippocampal CA1 and entorhinal cortex (EC) and the remapping of these representations when multiple environments are learned. The model consists of a recurrent network and output units (a decoder) mimicking the EC and CA1, respectively. The major results of this study are: the EC network generates cells with their receptive fields tuned to a border of the arena; decoder develops neuron clusters arranged in a hexagonal lattice. Thus, the model accounts for entorhinal border cells and CA1 place cells. The authors also suggested the remapping of place cells occurs between different environments through state transitions corresponding to unstable dynamical modes in the recurrent network.

      Strengths:

      The authors found a spatial arrangement of receptive fields similar to their model's prediction in experimental data recorded from CA1. Thus, the model proposes a plausible mechanism to generate hippocampal spatial representations without relying on grid cells. This result is consistent with the observation that grid cells are unnecessary to generate CA1 place cells.

      The suggestion about the remapping mechanism shows an interesting theoretical possibility.

      We thank the reviewer for their kind feedback.

      Weaknesses:

      The explicit mechanisms of generating border cells and place cells and those underlying remapping were not clarified at a satisfactory level.

      The model cannot generate entorhinal grid cells. Therefore, how the proposed model is integrated into the entire picture of the hippocampal mechanism of memory processing remains elusive.

      We appreciate this point, and hope to clarify: From a purely architectural perspective, place-like representations are generated by linear combinations of recurrent unit representations, which, after training, appear border-like. During remapping, the network is simply evaluated/run in different geometries/contexts, which, it turns out, causes the network to exhibit different representations, likely as solutions to optimally encoding position in the different environments. We have attempted to revise the text to make some of these interpretations more clear. We have also conducted a supplementary analysis to demonstrate how representations are determined by the context signal directly, which helps to explain how recurrent and output units form their representations.

      We also agree that our model does not capture the full complexity of the hippocampal formation. However, we would argue that its simplicity (focusing on a single cell type and a pure path integration task) makes it a useful baseline for studying the role of place cells during spatial navigation. The fact that our model captures a range of place cell behaviors (field formation, remapping and geometric deformation) without grid cells also points to several interesting possibilities, such as that grid cells may not be strictly necessary for place cell formation and remapping, or that border cells may account for many of the peculiar behaviors of place cells. However, we wholeheartedly agree that including e.g. sensory information and memory storage/retrieval tasks would prove a very interesting extension of our model to more naturalistic tasks and settings. In fact, our framework could easily accommodate this, e.g. by decoding contexts/observations/memories from the network state, alongside location.

      Reviewer #3 (Public Review):

      Summary:

      The authors used recurrent neural network modelling of spatial navigation tasks to investigate border and place cell behaviour during remapping phenomena.

      Strengths:

      The neural network training seemed for the most part (see comments later) well-performed, and the analyses used to make the points were thorough.

      The paper and ideas were well explained.

      Figure 4 contained some interesting and strong evidence for map-like generalisation as environmental geometry was warped.

      Figure 7 was striking, and potentially very interesting.

      It was impressive that the RNN path-integration error stayed low for so long (Fig A1), given that normally networks that only work with dead-reckoning have errors that compound. I would have loved to know how the network was doing this, given that borders did not provide sensory input to the network. I could not think of many other plausible explanations... It would be even more impressive if it was preserved when the network was slightly noisy.

      Thank you for your insightful comments! Regarding the low path integration error, there is a slight statistical signal from the boundaries, as trajectories tend to turn away from arena boundaries. However, we agree that studying path integration performance in the face of noise would make for a very interesting future development.

      Weaknesses:

      I felt that the stated neuroscience interpretations were not well supported by the presented evidence, for a few reasons I'll now detail.

      First, I was unconvinced by the interpretation of the reported recurrent cells as border cells. An equally likely hypothesis seemed to be that they were position cells that are linearly encoding the x and y position, which when your environment only contains external linear boundaries, look the same. As in figure 4, in environments with internal boundaries the cells do not encode them, they encode (x,y) position. Further, if I'm not misunderstanding, there is, throughout, a confusing case of broken symmetry. The cells appear to code not for any random linear direction, but for either the x or y axis (i.e. there are x cells and y cells). These look like border cells in environments in which the boundaries are external only, and align with the axes (like square and rectangular ones), but the same also appears to be true in the rotationally symmetric circular environment, which strikes me as very odd. I can't think of a good reason why the cells in circular environments should care about the particular choice of (x,y) axes... unless the choice of position encoding scheme is leaking influence throughout. A good test of these would be differently oriented (45 degree rotated square) or more geometrically complicated (two diamonds connected) environments in which the difference between a pure (x,y) code and a border code are more obvious.

      Thank you for pointing this out. This is an excellent point, and we agree that it could be addressed more rigorously. Note that there is no position encoding in our model; the initial state of the network is a vector of zeros, and the network must infer its location from boundary interactions and context information alone. So there is no way for positional information to leak through to the recurrent layer directly. However, one possible reason for the observed symmetry breaking is the fact that the velocity input signal is aligned with the cardinal directions. To investigate this, we trained a new model, wherein input velocities are rotated 45 degrees relative to the horizontal, as you suggest. The results, shown and discussed in appendix E (Learned recurrent representations align with environment boundaries), do indicate that representations are tuned to environment boundaries, and not the cardinal directions, which hopefully improves upon this point.
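
      For concreteness, rotating the velocity input can be sketched as below. This is only one way such a control could be set up: the function name, array layout (one 2D velocity per row), and counter-clockwise convention are assumptions made for illustration, not details taken from the manuscript.

```python
import numpy as np

def rotate_velocities(velocities, angle_deg=45.0):
    """Rotate 2D velocity vectors counter-clockwise by angle_deg.

    velocities: array of shape (..., 2), one (vx, vy) pair per row.
    (Hypothetical helper; names and conventions are illustrative.)
    """
    theta = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    # Right-multiplying by rot.T applies rot to every row vector.
    return np.asarray(velocities, dtype=float) @ rot.T

# A step along the horizontal becomes a step along the diagonal.
rotated = rotate_velocities([[1.0, 0.0], [0.0, 1.0]])
```

      The rotation preserves speed, so the path integration task itself is unchanged; only the alignment between the input axes and the arena walls differs.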

      Next, the decoding mechanism used seems to have forced the representation to learn place cells (no other cell type is going to be usefully decodable?). That is, in itself, not a problem. It just changes the interpretation of the results. To be a normative interpretation for place cells you need to show some evidence that this decoding mechanism is relevant for the brain, since this seems to be where they are coming from in this model. Instead, this is a model with place cells built into it, which can then be used for studying things like remapping, which is a reasonable stance.

      This is a great point, and we agree. We do write that we perform this encoding to encourage minimally constrained place-like representations (to study their properties), but we have revised to make this more evident.

      However, the remapping results were also puzzling. The authors present convincing evidence that the recurrent units effectively form 6 different maps of the 6 different environments (e.g. the sparsity of the code, or fig 6a), with the place cells remapping between environments. Yet, as the authors point out, in neural data the finding is that some cells generalise their co-firing patterns across environments (e.g. grid cells, border cells), while place cells remap, making it unclear what correspondence to make between the authors network and the brain. There are existing normative models that capture both entorhinal's consistent and hippocampus' less consistent neural remapping behaviour (Whittington et al. and probably others), what have we then learnt from this exercise?

      Thanks for raising this point! We agree that this finding is surprising, but we hold that it actually shows something quite important: that border-type units are sufficient to create place-like representations and to reproduce several of the behaviors associated with place cells and remapping (including global remapping and field stretching). In other words, a single cell type known to exist upstream of place cells is sufficient to explain a surprising range of phenomena, demonstrating that other cell types are not strictly necessary. However, we agree that understanding why the boundary-type units sometimes rate remap, and whether that can be true for some border-type cells in the brain (either directly, or through gating mechanisms), would be important future developments. Related to this point, we also expanded upon the influence of the context signal for representation selection (appendix F).

      Concerning the relationship to other models, we would argue that the simplicity of our model is one of its core strengths, making it possible to disentangle what different cell types are doing. While other models, including TEM, are highly important for understanding how different cell types and brain regions interact to solve complex problems, we believe there is a need for minimal, understandable models that allow us to investigate what each cell type is doing, and this is where we believe our work is important. As an example, our model highlights the sufficiency of boundary-type cells as generators of place cells, and its lack of e.g. grid cells also suggests that grid cells may not be strictly necessary for e.g. open-field/sensory-deprived navigation, as is often claimed.

      One striking result was figure 7, the hexagonal arrangement of place cell centres. I had one question that I couldn't find the answer to in the paper, which would change my interpretation. Are place cell centres within a single cluster of points in figure 7a, for example, from one cell across the 100 trajectories, or from many? If each cluster belongs to a different place cell then the interpretation seems like some kind of optimal packing/coding of 2D space by a set of place cells, an interesting prediction. If multiple place cells fall within a single cluster then that's a very puzzling suggestion about the grouping of place cells into these discrete clusters. From figure 7c I guess that the former is the likely interpretation, from the fact that clusters appear to maintain the same colour, and are unlikely to be co-remapping place cells, but I would like to know for sure!

      This is a good point, and you are correct: one cluster tends to correspond to one unit. To make this clearer, we have revised Fig. 7 so that each decoded center is shaded by unit identity. And yes, this is seemingly in line with some form of optimal packing/encoding of space!

      I felt that the neural data analysis was unconvincing. Most notably, the statistical effect was found in only one of seven animals. Random noise is likely to pass statistical tests 1 in 20 times (at 0.05 p value), this seems like it could have been something similar? Further, the data was compared to a null model in which place cell fields were randomly distributed. The authors claim place cell fields have two properties that the random model doesn't (1) clustering to edges (as experimentally reported) and (2) much more provocatively, a hexagonal lattice arrangement. The test seems to conflate the two; I think that nearby ball radii could be overrepresented, as in figure 7f, due to either effect. I would have liked to see a computation of the statistic for a null model in which place cells were random but with a bias towards the boundaries of the environment that matches the observed changing density, to distinguish these two hypotheses.

      Thanks for raising this point. We agree that we were not clear enough in our original manuscript. We originally included additional analyses for one animal only, to showcase one preliminary case of non-uniform phases. To address this, we have performed the same analyses for all animals, and included a longer discussion of these results (included in the supplementary material). We have also moderated the discussion on Ripley’s H to encompass only non-uniformity, and added a grid score analysis to showcase possible rotational symmetries in the data. We hope this gets our findings across more clearly.
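
      To put the reviewer's 1-in-20 remark in numbers, the following minimal sketch computes the chance that at least one of seven tests passes at the 0.05 level when no effect is present. It assumes the per-animal tests are independent, which is an illustrative assumption and not a property established in the manuscript; the function name is hypothetical.

```python
def prob_at_least_one_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n_tests independent null tests is significant."""
    # Each test stays non-significant with probability (1 - alpha).
    return 1.0 - (1.0 - alpha) ** n_tests

p_any = prob_at_least_one_false_positive(7)
print(f"{p_any:.3f}")  # 0.302
```

      Under that assumption, a single significant animal out of seven would arise by chance roughly 30% of the time, which is consistent with extending the analyses to all animals.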

      Some smaller weaknesses:

      - Had the models trained to convergence? From the loss plot it seemed like not, and when including regularisors recent work (grokking phenomena, e.g. Nanda et al. 2023) has shown the importance of letting the regularisor minimise completely to see the resulting effect. Else you are interpreting representations that are likely still being learnt, a dangerous business.

      Longer training time did not seem to affect representations. However, due to the long trajectories and statefulness involved, training was time-intensive and could become unstable for very long training runs. We therefore stopped training at the indicated time.

      - Since RNNs are nonlinear it seems that eigenvalues larger than 1 doesn't necessarily mean unstable?

      This is a good point; stability is not guaranteed. We have updated the text to reflect this.
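
      A toy example may help make the reviewer's point concrete. The sketch below compares a linear recurrence with a bias-free ReLU recurrence that share the same weight matrix, whose eigenvalues have magnitude 2. The matrix, initial state, and update rule are illustrative assumptions and are unrelated to the trained network.

```python
import numpy as np

# Toy recurrent weights with both eigenvalues equal to -2 (magnitude > 1).
W = np.array([[-2.0, 0.0],
              [0.0, -2.0]])

def relu(x):
    return np.maximum(x, 0.0)

def run(step, h0, n_steps=10):
    """Iterate h <- step(h) for n_steps and return the final state."""
    h = np.array(h0, dtype=float)
    for _ in range(n_steps):
        h = step(h)
    return h

h0 = [1.0, 0.5]
h_linear = run(lambda h: W @ h, h0)      # grows by a factor of 2 per step: [1024, 512]
h_relu = run(lambda h: relu(W @ h), h0)  # clipped to [0, 0] after the first step
```

      In the ReLU case every state reaches the origin within two steps, so the dynamics are stable even though the linear system with the same weights diverges.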

      - Why do you not include a bias in the networks? ReLU networks without bias are not universal function approximators, so it is a real change in architecture that doesn't seem to have any positives?

      We found that bias tended to have a detrimental effect on training, possibly related to the identity initialization used (see e.g. Le et al. 2015), and found that training improved when biases were fixed to zero.
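
      For context on the architectural trade-off the reviewer raises: a ReLU network with all biases fixed to zero is positively homogeneous, meaning f(a·x) = a·f(x) for any a > 0, and it always maps the zero input to zero. The sketch below checks this on a small random feedforward network; the layer sizes and weights are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 2))   # hidden layer weights, no bias
W2 = rng.normal(size=(1, 16))   # output layer weights, no bias

def relu_net(x):
    """Two-layer ReLU network with all biases fixed to zero."""
    return W2 @ np.maximum(W1 @ x, 0.0)

x = np.array([0.3, -1.2])
scale = 5.0

# Scaling the input by a positive factor scales the output by the same factor.
homogeneous = np.allclose(relu_net(scale * x), scale * relu_net(x))
# The zero input is always mapped to zero, so non-zero constants are unreachable.
zero_at_origin = np.allclose(relu_net(np.zeros(2)), 0.0)
```

      Such a network cannot represent, for example, a non-zero constant function, which is the restriction traded against the training stability described above.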

      - The claim that this work provided a mathematical formalism of the intuitive idea of a cognitive map seems strange, given that upwards of 10 of the works this paper cite also mathematically formalise a cognitive map into a similar integration loss for a neural network.

      We agree that other works also provide ways of formalizing these concepts. However, our goal in doing so was to elucidate common features across these seemingly disparate models. We also found that the concept of a learned and target map made it easier to come up with novel models, such as one wherein place cells are constructed to match a grid cell label.

      Aim Achieved? Impact/Utility/Context of Work

      Given the listed weaknesses, I think this was a thorough exploration of how this network with these losses is able to path-integrate its position and remap. This is useful, it is good to know how another neural network with slightly different constraints learns to perform these behaviours. That said, I do not think the link to neuroscience was convincing, and as such, it has not achieved its stated aim of explaining these phenomena in biology. The mechanism for remapping in the entorhinal module seemed fundamentally different to the brain's, instead using completely disjoint maps; the recurrent cell types described seemed to match no described cell type (no bad thing in itself, but it does limit the permissible neuroscience claims) either in tuning or remapping properties, with a potentially worrying link between an arbitrary encoding choice and the responses; and the striking place cell prediction was unconvincingly matched by neural data. Further, this is a busy field in which many remapping results have been shown before by similar models, limiting the impact of this work. For example, George et al. and Whittington et al. show remapping of place cells across environments; Whittington et al. study remapping of entorhinal codes; and Rajkumar Vasudeva et al. 2022 show similar place cell stretching results under environmental shifts. As such, this paper's contribution is muddied significantly.

      Thank you for this perspective; we agree that all of these are important works that arrive at complementary findings. We hold that the importance of our paper lies in its minimal nature, and its focus on place cells, via a purpose-built decoding that enables place-like representations. In doing so, we can point to possibly underexplored relationships between cell types, in particular place cells and border cells, while challenging the necessity of other cell types for open-field navigation (i.e. grid cells). In addition, our work points to a novel connection between grid cells, place cells and even border cells, by way of the hexagonal arrangement of place unit centers. However, we agree that expanding our model to include more biologically plausible architectures and constraints would make for a very interesting extension in the future.

      Thank you again for your time, as well as your insightful comments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Even after reading Methods 5.3, I found it hard to understand how the ratemap population vectors that produce Fig 3e and Fig 5 are calculated. It's unclear to me how there can be a ratemap at a single timestep, because calculating a ratemap involves averaging the activity in each location, which would take a whole trajectory and not a single timestep. But I think I've understood from Methods 5.1 that instead the ratemap is calculated by running multiple 'simultaneous' trajectories, so that there are many visited locations at each timestep. That's a bit confusing because as far as I know it's not a common way to calculate ratemaps in rodent experiments (probably because it would be hard to repeat the same task 500 times, while the representations remain the same), so it might be worth explaining more in Methods 5.3.

      We understand the confusion, and have attempted to make this more clear in the revised manuscript. We did indeed create ratemaps over many trajectories for time-dependent plots, for the reasons you mentioned. We also agree that this would be difficult to do experimentally, but found it an interesting way to observe convergence of representations in our simulated scenario.
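
      The idea of a ratemap at a single timestep can be sketched as follows: at a fixed timestep, each of many simultaneous trajectories contributes one (position, activity) sample, and activity is averaged within spatial bins. The function name, array shapes, bin count, and unit-square arena are assumptions made for illustration and may differ from the procedure in the Methods.

```python
import numpy as np

def ratemaps_at_timestep(positions, activity, bins=16,
                         extent=((0.0, 1.0), (0.0, 1.0))):
    """Average unit activity per spatial bin across trajectories at ONE timestep.

    positions: (n_trajectories, 2) locations at the chosen timestep.
    activity:  (n_trajectories, n_units) unit activations at that timestep.
    Returns (n_units, bins, bins); bins no trajectory visited are NaN.
    """
    positions = np.asarray(positions, dtype=float)
    activity = np.asarray(activity, dtype=float)
    x, y = positions[:, 0], positions[:, 1]
    # Number of trajectories occupying each bin at this timestep.
    counts, _, _ = np.histogram2d(x, y, bins=bins, range=extent)
    visited = counts > 0
    maps = np.full((activity.shape[1], bins, bins), np.nan)
    for u in range(activity.shape[1]):
        # Summed activity of unit u per bin, then divided by occupancy.
        sums, _, _ = np.histogram2d(x, y, bins=bins, range=extent,
                                    weights=activity[:, u])
        maps[u][visited] = sums[visited] / counts[visited]
    return maps

# Three trajectories, one unit, a 2x2 grid: two samples share the first bin.
demo = ratemaps_at_timestep([[0.1, 0.1], [0.1, 0.1], [0.9, 0.9]],
                            [[1.0], [3.0], [5.0]], bins=2)
```

      Repeating this for each timestep gives a sequence of ratemaps whose change over time can be tracked, which is hard to do in animal experiments because it requires many repetitions with unchanged representations.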

      Fig 3b-d shows multiple analyses to support output unit global remapping, but no analysis to support the claim that recurrent units remap by rate changes. The examples in Fig 3ai look pretty convincing, but it would be useful to also have a more quantitative result.

      We agree; originally, we only showed via ratemaps that units turn off/become silent. We have therefore added an explicit analysis, showcasing rate remapping in recurrent units (see appendix G; Recurrent units rate remap).

      Reviewer #2 (Recommendations For The Authors):

      Some parts of the current manuscript are hard to follow. Particularly, the model description is not transparent enough. See below for the details.

      Major comments:

      (1) Mathematical models should be explained more explicitly and carefully. I had to guess or desperately search for the definitions of parameters. For instance, define the loss function L in eq.(1). Though I can assume L represents the least square error (in A.8), I could not find the definition in Model & Objective. N should also be defined explicitly in equation (3). Is this the number of output cells?

      Thank you for pointing this out; we have revised the text to make it clearer.

      (2) In Fig. 1d, how were the velocity and context inputs given to individual neurons in the network? The information may be described in the Methods, but I could not identify it.

      This was described in the methods section (Neural Network Architecture and Training), but we realize that we used confusing notation, when comparing with Fig. 1d. We have therefore changed the notation, and it should hopefully be clearer now. Thanks for pointing out this discrepancy.

      (3) I took a while to understand equations (3) and (4) (for instance, t is not defined here). The manuscript would be easier to read if equations (5) and (6) are explained in the main text but not on page 18 (indeed, these equations are just copies of equations 3 and 4). Otherwise, the authors may replace equations (3) and (4) with verbal explanations similar to figure legend for Fig. 1b.

      (4) Is there any experimental evidence for uniformly strong EC-to-CA1 projections assumed in the non-trainable decoder? This point should be briefly mentioned.

      Thank you for raising this point. The decoding from EC (the RNN) to CA1 (the output layer) consists of a trainable weight matrix, and may thus be non-uniform in magnitude. The non-trainable decoding acts on the resulting “CA1” representation only. We hope that improvements to the model description also make this more evident.

      (5) The explanation of Fig. 3 in the main text is difficult to follow because subpanels are explained in separate paragraphs, some of which are very short, as short as just a few lines.

      This presentation style makes it difficult to follow the logical relationships between the subpanels. This writing style is obeyed throughout the manuscript but is not popular in neuroscience.

      Thanks for pointing this out, we have revised to accommodate this.

      (6) Why do field centers cluster near boundaries? No underlying mechanisms are discussed in the manuscript.

      This is a good point; we have added a note on this; it likely reflects the border tuning of upstream units.

      (7) In Fig. 4, the authors presented how cognitive maps may vary when the shape and size of open arenas are modified. The results would be more interesting if the authors explained the remapping mechanism. For instance, on page 8, the authors mentioned that output units exhibit global remapping between contexts, whereas recurrent units mainly rate remapping.

      Why do such representational differences emerge?

      We agree! Thanks for raising this point. We have therefore expanded upon this discussion in section 2.4.

      (8) In the first paragraph of page 10, the authors stated ".. some output units display distinct field doubling (see both Fig. 4c), bottom right, and Fig. 4d), middle row)". I could not understand how Fig. 4d, middle row supports the argument. Similarly, they stated "..some output units reflect their main boundary input (with greater activity near one boundary)." I can neither understand what the authors mean to say nor which figures support the statement. Please clarify.

      This is a good point, there was an identifier missing; we have updated to refer to the correct “magnification”. Thanks!

      (9) The underlying mechanism of generating the hexagonal representation of output cells remains unclear. The decoder network uses a non-trainable decoding scheme based on localized firing patterns of output units. To what extent does the hexagonal representation depend on the particular decoding scheme? Similarly, how does the emergence of the hexagonal representation rely on the border representation in the upstream recurrent network? Showing several snapshots of the two place representations during learning may answer these questions.

      This is an interesting point, and we have added some discussion on this matter. In particular, we speculate whether it’s an optimal configuration for position reconstruction, which is demanded by the task and thus highly likely dependent on the decoding scheme. We have not reached a conclusive method to determine the explicit dependence of the hexagonal arrangement on the choice of decoding scheme. Still, it seems this would require comparison with other schemes. In our framework, this would require changing the fundamental operation of the model, which we leave as inspiration for future work. We have also added additional discussion concerning the relationship between place units, border units, and remapping in our model. As for exploring different training snapshots, the model is randomly initialized, which suggests that earlier training steps should tend to reveal unorganized/uninformative phase arrangements, as phases are learned as a way of optimizing position reconstruction. However, we do call for more analysis of experimental data to determine whether this is true in animals, which would strongly support this observation. We also hope that our work inspires other models studying the formation and remapping of place cells, which could serve as a starting point for answering this question in the future.

      (10) Figure 7 requires a title including the word "hexagonal" to make it easier to find the results demonstrating the hexagonal representations. In addition, please clarify which networks, p or g, gave the results shown here.

      We agree, and have added it!

      Minor comments:

      (11) In many paragraphs, conclusions appear near their ends. Stating the conclusion at the beginning of each paragraph whenever possible will improve the readability.

      We have made several rewrites to the manuscript, and hope this improves readability.

      (12) Figure A4 is important as it shows evidence of the CA1 spatial representation predicted by the model. However, I could not find where the figure is cited in the manuscript. The authors can consider showing this figure in the main text.

      We agree, and we have added more references to the experimental data analyses in the main text, as well as expanded this analysis.

      (13) The main text cites figures in the following format: "... rate mapping of Fig. 3a), i), boundary ...." The parentheses make reading difficult.

      We have removed the overly stringent use of double parentheses, thanks for letting us know.

      (14) It would be nice if the authors briefly explained the concept of Ripley's H function on page 14.

      Yes, we have added a brief descriptor.
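
      As a brief illustration of the descriptor, Ripley's K function counts the pairs of points lying within a distance r of each other, normalized by the point density; H(r) = sqrt(K(r)/π) − r is approximately zero for a spatially uniform (Poisson) pattern and positive when points cluster at scale r. The sketch below is a minimal version without edge correction, which is an assumption made for brevity; the function and argument names are hypothetical, and it is not necessarily the estimator used in the manuscript.

```python
import numpy as np

def ripley_h(points, radii, area):
    """Naive Ripley's H for a 2D point pattern (no edge correction).

    points: (n, 2) coordinates; radii: distances r to evaluate;
    area: area of the observation window.
    """
    points = np.asarray(points, dtype=float)
    radii = np.asarray(radii, dtype=float)
    n = len(points)
    # All pairwise Euclidean distances, excluding each point with itself.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    pair_dists = dists[~np.eye(n, dtype=bool)]
    # Number of ordered pairs closer than each radius.
    counts = (pair_dists[None, :] <= radii[:, None]).sum(axis=1)
    k = area * counts / (n * (n - 1))   # Ripley's K estimate
    return np.sqrt(k / np.pi) - radii   # H(r) = L(r) - r
```

      Because edge effects are ignored, values near the arena boundary are biased downward; a corrected estimator would be preferable for quantitative comparisons.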

    1. Author response:

      The following is the authors’ response to the previous reviews

      Review 1:

      Weaknesses:

      The weaknesses of the study also stem from the methodological approach, particularly the use of whole-brain Calcium imaging as a measure of brain activity. While epilepsy and seizures involve network interactions, they typically do not originate across the entire brain simultaneously. Seizures often begin in specific regions or even within specific populations of neurons within those regions. Therefore, a whole-brain approach, especially with Calcium imaging with inherent limitations, may not fully capture the localized nature of seizure initiation and propagation, potentially limiting the understanding of Galanin's role in epilepsy.

      We agree with the reviewers that the whole-brain imaging approach is both a strength and a weakness. This manuscript and our previously published paper (Hotz et al., 2022) indeed show that the seizures have an initiation point and spread throughout the brain, interestingly affecting the telencephalon last. Localized seizure initiation was not within the scope of this manuscript; moreover, here too we would have to rely on imaging techniques. Using cell-type-specific drivers for specific neuronal subpopulations is an interesting approach, but outside the scope of this study. An interesting approach would also include a more detailed analysis of glia in the context of epilepsy.

      Furthermore, Galanin's effects may vary across different brain areas, likely influenced by the predominant receptor types expressed in those regions. Additionally, the use of PTZ as a "stressor" is questionable since PTZ induces seizures rather than conventional stress. Referring to seizures induced by PTZ as "stress" might be a misinterpretation intended to fit the proposed model of stress regulation by receptors other than Galanin receptor 1 (GalR1).

      We also agree that a more regional approach, once more reliable information is available on the expression domains of the different galanin receptors and on their respective roles, is an important future research direction.

      The description of the EAAT2 mutants is missing crucial details. EAAT2 plays a significant role in the uptake of glutamate from the synaptic cleft, thereby regulating excitatory neurotransmission and preventing excitotoxicity. Authors suggest that in EAAT2 knockout (KO) mice galanin expression is upregulated 15-fold compared to wild-type (WT) mice, which could be interpreted as galanin playing a role in the hypoactivity observed in these animals.

      However, the study does not explore the misregulation of other genes that could be contributing to the observed phenotype. For instance, if AMPA receptors are significantly downregulated, or if there are alterations in other genes critical for brain activity, these changes could be more important than the upregulation of galanin. The lack of wider gene expression analysis leaves open the possibility that the observed hypoactivity could be due to factors other than, or in addition to, galanin upregulation.

      We are preparing a manuscript describing a more detailed gene expression study of this model and of a chemically induced seizure model. Surprisingly, we did not observe strong effects on glutamate receptor-related genes. This does not preclude additional factors, e.g. other neuropeptides, from playing a role; indeed, we deem it likely that they do.

      Moreover, the observation that in double KO mice for both EAAT2 and galanin there was little difference in seizure susceptibility compared to EAAT2 KO mice alone further supports the idea that galanin upregulation might not be the reason for the observed phenotype. This indicates that other regulatory mechanisms or gene expressions might be playing a more pivotal role in the manifestation of hypoactivity in EAAT2 mutants.

      Yes, we agree that galanin is likely not the only player. This warrants further investigations.

      These methodological shortcomings and conceptual inconsistencies undermine the perceived strengths of the study, and hinder understanding of Galanin's role in epilepsy and stress regulation.

      Review 2:

      Previous concerns about sex or developmental biological variables were addressed, as their model's seizure phenotype emerges rapidly and long prior to the establishment of zebrafish sexual maturity. However, in the course of re-review, some additional concerns (below) were detected that, if addressed, could further improve the manuscript. These concerns relate to how seizures were defined from the measurement of fluorescent calcium imaging data. Overall, this study is important and convincing, and carries clear value for understanding the multifaceted functions that neuronal galanin can perform under homeostatic and disease conditions.

      We are pleased that we could dispel the initial concerns.

      Additional Concerns:

      - The authors have validated their ability to measure behavioral seizures quantitatively in their 2022 Glia paper but the information provided on defining behavioral seizures was limited. The definition of behavioral seizure activity is not expanded upon in this paper, but could provide detail about how the behavioral seizures relate to a seizure detected via calcium imaging.

      In this paper we indeed do not address behavioral seizures but focus completely on neuronal seizures as defined in the Materials and Methods section (“seizures were defined as calcium fluctuations reaching at least 100% of ΔF/F0 in the whole brain.”). Epileptic seizures in zebrafish, whether evoked pharmacologically or resulting from genetic mutations, evoke stereotyped locomotor behavior, as described in multiple publications (e.g. Baraban et al., 2005; Berghmans et al., 2007; Baxendale et al., 2012, and references therein).
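
      As an illustration of the working definition quoted above, the following is a minimal sketch in Python of an amplitude-based detector. It makes several assumptions that are not stated in the response: the baseline F0 defaults to the median of the trace, the input is a single whole-brain mean fluorescence trace, and the function names are hypothetical. It does not attempt to test for synchrony across regions.

```python
from statistics import median


def delta_f_over_f0(trace, f0=None):
    """Return dF/F0 (as a fraction, so 1.0 == 100%) for a fluorescence trace.

    Assumption: if no baseline is supplied, F0 is taken as the median of
    the trace. The source does not say how the baseline was estimated.
    """
    if f0 is None:
        f0 = median(trace)
    if f0 <= 0:
        raise ValueError("baseline fluorescence must be positive")
    return [(f - f0) / f0 for f in trace]


def detect_seizures(trace, threshold=1.0, f0=None):
    """Return (start, end) frame indices where dF/F0 >= threshold.

    threshold=1.0 corresponds to the quoted 100% dF/F0 criterion.
    `end` is exclusive, so the event spans frames start .. end - 1.
    """
    dff = delta_f_over_f0(trace, f0)
    events, start = [], None
    for i, value in enumerate(dff):
        if value >= threshold and start is None:
            start = i                      # event begins
        elif value < threshold and start is not None:
            events.append((start, i))      # event ends
            start = None
    if start is not None:                  # event still open at end of trace
        events.append((start, len(dff)))
    return events
```

      Calling `detect_seizures(trace, f0=100)` on a trace with a known baseline of 100 returns the frame ranges in which fluorescence at least doubles; sub-threshold fluctuations, such as spontaneous activity, are ignored.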

      - Related to the previous point, for the calcium imaging, the difference between an increase in fluorescence that the authors think reflects increased neuronal activity and the fluorescence that corresponds to seizures is not very clear. This detail is necessary because exactly when the term "seizure" describes a degree of increased activity can be difficult to distinguish objectively.

      In our material and methods section, we describe our working definition of a seizure. Seizures are easily distinguished from increased activity by being synchronized.

      - The supplementary movies that were added were very useful, but raised some questions. For example, what brain regions were pulsating? What areas seemed to constantly exhibit strong fluorescence and was this an artifact? It seemed that sometimes there was background fluorescence in the body. Perhaps an anatomical diagram could be provided for the readers. In addition, there were some movies with much greater fluorescence changes - are these the seizures? These are some reasons for our request for clarified definitions of the term "seizure".

      The “pulsating” (or “flickering”) brain activity is spontaneous neuronal activity. Some areas may appear to be more active, probably owing to a denser packing of neurons and intrinsically higher spontaneous neuronal activity. However, since we only use normalized data, this does not affect our measurements.

      - While it is not critical to change, I will again note the possible confusion that the use of the word "sedative" in this context may cause. However, I do understand this is a stylistic choice.

      - Supplementary Figure 1B: the N values along the x-axis appear to have been duplicated and the duplications are offset and overlapping with one another by mistake.

      Thank you for pointing this out. We have corrected the figure accordingly.

      Review 3:

      (1) Although the relationship between galanin and brain activity during interictal or seizure-free periods was clear, the revised manuscript still lacks mechanistic insight in the role of galanin during seizure-like activity induced by PTZ.

      We agree that the mechanistic role of galanin still needs to be defined. The role is more complex than we expected, mainly due to its negative feedback properties. A complete mechanistic understanding will require a number of additional studies and is unfortunately outside the scope of this manuscript.

      (2) The revised manuscript continues to heavily rely on calcium imaging of different mutant lines. Confirmation of knockouts has been provided with immunostaining in a new supplementary figure. Additional methods could strengthen the data, translational relevance, and interpretation (e.g., acute pharmacology using galanin agonists or antagonists, brain or cell recordings, biochemistry, etc).

      Cell recordings and biochemistry are challenging in the small larval zebrafish brain. We deem the genetic manipulations that we describe to be more informative than pharmacological experiments due to specificity issues.

    1. Author response

      eLife Assessment

      The authors investigated KLF Transcription Factor 16 (KLF16) as an inhibitor of osteogenic differentiation, which plays a critical role in bone development, metabolism and repair. The results of the study are valuable as they could help to facilitate future research on the regulation of osteogenesis in vitro and in vivo. However, the evidence overall is incomplete, as validation by knockout mouse models would help to strengthen the conclusions.

      We appreciate the editors’ evaluation and recognition of the importance of our research. The primary goal and value of our study is to provide robust bioinformatics analyses of 20 independent iPSC lines, which can lead to the identification of novel genes involved in osteogenic differentiation. The identification of KLF16 serves to illustrate this goal. A thorough analysis of the function of any single gene both in vitro and in vivo is beyond the initial scope of this study. To validate KLF16’s inhibitory role in osteogenic differentiation, we provided evidence showing overexpression of Klf16 suppressed osteogenic differentiation in vitro, and Klf16<sup>+/-</sup> mice exhibited enhanced bone mineral content and density in vivo.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Ru and colleagues investigated regulatory gene interactions during osteogenic differentiation. By profiling transcriptomic changes during mesenchymal stem cell differentiation, they identified KLF16 as a key transcription factor that inhibits osteogenic differentiation and mineralization. It was found that overexpression of KLF16 suppressed osteogenesis in vitro, while Klf16<sup>+/-</sup> mice exhibited enhanced bone density, underscoring its regulatory role in bone formation.

      Strengths:

      (1) Bioinformatics is strong and comprehensive.

      (2) Identification of KLF16 in osteoblast differentiation is exciting and innovative.

      We appreciate the reviewer’s comments on our bioinformatic analyses of MSC osteogenic differentiation and the identification of KLF16 as a new osteogenesis regulator. The differentiation of iPSC-derived MSCs to OBs serves as a valuable model for investigating gene expression and regulatory networks in osteogenic differentiation. This study provides insights into the complex and dynamic regulation of the transcriptomic landscape in osteogenic differentiation and supplies a foundational resource for additional investigation into normal bone formation and the mechanisms underlying pathological conditions.

      Weaknesses:

      (1) The mechanism of KLF16 function is not studied.

      (2) Studies of KLF16 in bone development, from both in vitro and in vivo perspectives, are descriptive.

      Our study aims to apply rigorous bioinformatic analyses of 20 iPSC lines to identify novel genes involved in osteogenic differentiation. With this strategy, we successfully identified KLF16 as a regulator of osteogenic differentiation. We validated this with both in vitro and in vivo models even though we had limited availability of Klf16 knockout mice when the study was conducted. We demonstrated that overexpression of Klf16 suppressed osteogenesis in vitro, while Klf16<sup>+/-</sup> mice exhibited increased bone mineral density, trabecular number, and cortical bone area, highlighting its role in bone formation. With these mice now available, further investigation into the mechanism of KLF16's function is possible.

      (3) Findings in bioinformatics analysis are mostly redundant with previous studies in the field, and can be simplified.

      We compared our bulk RNA-seq data with our previously published single-cell RNA-seq (scRNA-seq) data generated from iPSC-induced cells during osteogenic differentiation (Housman et al., 2022). The purpose was to corroborate the expression patterns of the genes we focused on during osteogenic differentiation. We found similar differential expression patterns in a pseudobulk analysis of the scRNA-seq data, even though there are significant differences between these two studies, including cell culture conditions, sequencing approaches (bulk vs. single cell), goals of the studies (key TF drivers of osteoblast differentiation vs. mapping differentiation stages and inter-species gene programs in human and chimp), and findings (identification of TFs vs. identification of interspecific regulatory differences).

      Importantly, we performed network analyses to identify key transcription factors, which were not redundant with previous studies. We constructed a transcription factor regulatory network analysis during human osteogenic differentiation, and identified a network organized into five interactive modules. The most exciting finding was the identification of KLF16 as one of the strongest regulators in Module 5 (Figure 3), which previously was not demonstrated to be involved in bone formation. We also demonstrated known TF genes regulating osteogenic differentiation in these modules, and performed gene ontology (GO) and reactome pathway (RP) analyses for regulatory functions and pathways specific to each module. To clarify that our findings do not overlap with previous studies, we will revise the manuscript focusing on Module 5 and simplify the description of the bioinformatics analysis as the reviewer suggested.

      Reviewer #2 (Public review):

      In their manuscript with the title "Integrated transcriptomic analysis of human induced pluripotent stem cell (iPSC)-derived osteogenic differentiation reveals a regulatory role of KLF16", Ru et al. have analyzed the gene expression changes during the osteogenic differentiation of iPSC-derived mesenchymal stem/stromal cells into preosteoblasts and osteoblasts. As part of the computational analyses, they have investigated the transcription factor regulatory network mediating this differentiation process, which has also led to the identification of the transcription factor KLF16. Overexpression experiments in vitro and the analysis of heterozygous KLF16 knockout mice in vivo indicate that KLF16 is an inhibitor of osteogenic differentiation.

      The integrated analysis of iPSC bulk transcriptomic data is a major strength of the study, and it is also great that the authors provide deeper functional characterization of the transcription factor KLF16, one of the newly identified candidate regulators of osteogenic differentiation.

      We appreciate the reviewer’s summary and comments on the strength of our bioinformatic analyses of iPSC/MSC osteogenic differentiation and the deep functional characterization of KLF16, as well as the novelty of our findings.

      However, characterization of KLF16 expression in the mouse and validation of the knockout model are currently lacking. Alternative explanations for the mutant phenotype should be considered to improve the strength of the conclusions.

      If all issues can be addressed, the study would provide an important resource for the field that would facilitate future research on the regulation of osteogenesis in vitro and in vivo, with potential implications for preclinical and clinical research as well as bioengineering.

      We appreciate the reviewer’s valuable suggestions. Klf16 is highly expressed in mandibular, maxillary and tail mesenchyme at embryonic Day 12 (D'Souza et al., 2002), indicating its role in early bone development. We will further characterize the expression of Klf16 in mice, especially in the developing bones.

      We identified Klf16 as a potential regulator of osteogenic differentiation, and then validated this with both in vitro and in vivo models. Overexpression of Klf16 suppressed osteogenesis in vitro, and Klf16<sup>+/-</sup> mice showed increased bone mineral content and density, indicating its regulatory role in bone formation. We agree with the reviewer that the bone phenotypes of Klf16 knockout mice potentially can be affected by other factors in addition to osteogenic differentiation. As both bone formation and resorption are critical for bone development, we evaluated osteoclastogenesis in the Klf16<sup>+/-</sup> mice by analyzing the expression of osteoclast marker CALCR and regulator RANKL in the femurs of the Klf16<sup>+/-</sup> mice. Neither CALCR nor RANKL decreased in the bone of Klf16<sup>+/-</sup> mice, indicating that osteoclastogenesis is not decreased; therefore, increased bone mineral content and density in the mutant mice is more likely attributed to enhanced bone formation rather than reduced resorption by osteoclasts. Additionally, we will discuss other alternative explanations for the bone phenotypes of Klf16 knockout mice as suggested by the reviewer.

      References

      D'Souza, U. M., Lammers, C.-H., Hwang, C. K., Yajima, S. and Mouradian, M. M. (2002). Developmental expression of the zinc finger transcription factor DRRF (dopamine receptor regulating factor). Mechanisms of Development 110, 197-201.

      Housman, G., Briscoe, E. and Gilad, Y. (2022). Evolutionary insights into primate skeletal gene regulation using a comparative cell culture model. PLOS Genetics 18, e1010073.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank all the reviewers for their time and valuable feedback, which helped us improve our manuscript. Based on the comments, we have made several critical changes to the revised manuscript.

      (1) We have changed our threshold for detecting freezing epochs from 1 cm/s to 0 cm/s in this revised manuscript. This change allows us to capture periods when animals are completely still on the treadmill, better matching the "true freezing" behavior seen in freely moving set-ups. We have added a new supplementary video (Supplementary Video 2) that better demonstrates the freezing response we observe. All results and figures in the revised manuscript reflect this updated threshold (Figures 2-6, Supplementary Figures 1-6, Tables 1-6). Our main findings remain robust, demonstrating that freezing serves as a reliable conditioned response in our paradigms, comparable to freely moving animals. Specifically, freezing behavior increased reliably in the fear-conditioned environment following CFC across all paradigms. We have also added data from a no-shock control group (Supplementary Figure 2) which, when compared to the conditioned group, shows that freezing responses in the conditioned group result from fear conditioning rather than from immobility unrelated to fear. We do observe other avoidance behaviors unique to our treadmill-based task, such as hesitation, backward movement, and slow crawls. These conditioned behaviors are captured through a separate metric: the time taken to complete a lap.

      (2) As suggested by the reviewers, we have separately analyzed fear discrimination and extinction dynamics across recall days (Supplementary Figures 2, 5 and 6, Table 1-6). To assess fear discrimination, we use within-group comparisons to evaluate how well animals differentiate between the two VRs across days. For extinction, we use within-VR comparisons to examine freezing dynamics over time. Freezing across recall days is compared to baseline freezing (pre-conditioning) using a Linear Mixed Effects model (Tables 1-6), with recall days as fixed effects and mouse as a random effect, using baseline freezing as the reference.

      (3) We have expanded the behavioral dataset in Paradigm 1 to investigate the effect of shock amplitude on the conditioned fear response (Supplementary Figure 2 C-E). Consistent with findings in freely moving animals, our data show that increasing shock intensity from 0.6 mA to 1.0 mA leads to stronger freezing. For the revised manuscript, we specifically increased the sample size in the 0.6 mA group (n = 8) in Paradigm 1, as this intensity is used in Paradigm 3. These additional data demonstrate that combining a lower shock amplitude with shorter inter-shock intervals and retaining the tail-coat during recall can enhance freezing, suggesting that these parameters help compensate for lower shock intensity.

      (4) We have increased the sample size of the imaging dataset (now n = 8, Figures 7-8).
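
      The threshold change described in point (1) can be illustrated with a minimal Python sketch of freezing-epoch detection from treadmill speed. The function names, the use of absolute speed (so that backward movement is not scored as freezing), and the `min_duration` parameter are assumptions made for illustration; the response does not specify a minimum epoch duration or how the speed signal was processed.

```python
def freezing_epochs(speeds, dt, threshold=0.0, min_duration=1.0):
    """Return (start_s, end_s) epochs during which |speed| <= threshold.

    speeds       : treadmill speed samples in cm/s
    dt           : sampling interval in seconds
    threshold    : 0.0 mirrors the revised criterion (complete stillness);
                   1.0 would mirror the original criterion
    min_duration : hypothetical minimum epoch length in seconds
    """
    epochs, start = [], None
    for i, speed in enumerate(speeds):
        still = abs(speed) <= threshold
        if still and start is None:
            start = i                                  # epoch begins
        elif not still and start is not None:
            if (i - start) * dt >= min_duration:       # keep long epochs only
                epochs.append((start * dt, i * dt))
            start = None
    if start is not None and (len(speeds) - start) * dt >= min_duration:
        epochs.append((start * dt, len(speeds) * dt))  # epoch open at end
    return epochs


def percent_freezing(speeds, dt, **kwargs):
    """Percentage of session time spent in freezing epochs."""
    total = len(speeds) * dt
    frozen = sum(end - start
                 for start, end in freezing_epochs(speeds, dt, **kwargs))
    return 100.0 * frozen / total if total else 0.0
```

      With a 0 cm/s threshold, a slow crawl at 0.5 cm/s is not scored as freezing, whereas it would be under a 1 cm/s threshold. This is why slow crawls and hesitation need a separate metric such as lap completion time.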

      Finally, we acknowledge that many aspects of this paradigm still require optimization. The head-fixed CFC paradigm is in its early stages compared to the decades of research dedicated to understanding fear learning parameters in freely moving CFC paradigms. While there are numerous parameters that could be tested (both those identified through our own discussions and those raised by the reviewers), it is not feasible for a single lab to conduct a full evaluation of all the possible factors that could influence CFC in the head-fixed prep. A key limitation is that our approach requires robust navigation behavior in the VR without rewards, which requires weeks of training per mouse. It also necessitates larger sample sizes at the outset, as not all animals will meet the behavioral criteria required for CFC. Another important consideration is scalability. Unlike freely moving CFC paradigms, which allow parallel testing of many animals with minimal pre-training, the VR-CFC setup requires several weeks of behavior training and involves a more complex integration of hardware and software to accurately track behavior in virtual space. The number of VR rigs that can be operated simultaneously in a single lab is often limited, making high-throughput testing more challenging. These factors mean that the testing of a single parameter in a group of animals requires approximately 3–4 months to complete. Despite these constraints, we are committed to continuing to refine this paradigm over time. With this manuscript, our main aim was to provide a detailed framework, initial parameters, and evidence for conditioned behavior in the head-fixed preparation. By doing so, we hope to facilitate the adoption of this paradigm by researchers interested in studying the neural correlates of learning and memory using multiphoton imaging and stimulation techniques. This approach enables investigations that are not possible in freely moving animals, while the presence of freezing as a conditioned response allows for direct comparisons to the extensive body of work done in freely moving paradigms. Moving forward, we anticipate that optimizing this paradigm and identifying the key parameters that drive learning will be a collaborative, community-led effort.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors set out to develop a contextual fear learning (CFC) paradigm in head-fixed mice that would produce freezing as the conditioned response. Typically, lick suppression is the conditioned response in such designs, but this (1) introduces a potential confounding influence of reward learning on neural assessments of aversion learning and (2) does not easily allow comparison of head-fixed studies with extensive previous work in freely moving animals, which use freezing as the primary conditioned response.

      The first part of this study is a report on the development and outcomes of 3 variations of the CFC paradigm in a virtual reality environment. The fundamental design is strong, with head-fixed mice required to run down a linear virtual track to obtain a water reward. Once trained, the water reward is no longer necessary and mice will navigate virtual reality environments. There are rigorous performance criteria to ensure that mice that make it to the experimental stage show very low levels of inactivity prior to fear conditioning. These criteria do result in only 40% of the mice making it to the experimental stage, but high rates of activity in the VR environment are crucial for detecting learning-related freezing. It is possible that further adjustments to the procedure could improve attrition rates.

      We acknowledge that further adjustments to the procedure could improve attrition rates, and we will continue to work on improving the paradigm.

      Paradigm versions 1 and 2 vary the familiarity of the control context while paradigm versions 2 and 3 vary the inter-shock interval. Paradigm version 1 is the most promising, showing the greatest increase in conditioned freezing (~40%) and good discrimination between contexts (delta ~15-20%). Paradigm version 2 showed no clear evidence of learning - average freezing at recall day 1 was not different than pre-shock freezing. First-lap freezing showed a difference, but this single-lap effect is not useful for many of the neural circuit questions that this paradigm is meant to address. Also, the claim that mice extinguished first-lap freezing after 1 day is weak. Extinction is determined here by the loss of context discrimination, but this was not strong to begin with. First-lap freezing does not appear to be different between Recall Day 1 and 2, but this analysis was not done.

      This is an important point. Following reviewer suggestions, we have replotted our figures for all paradigms to show within-VR freezing (see Supplementary Figures 2, 5 and 6) as the appropriate method for quantifying fear extinction across days. Using an LME model (Tables 1-6), we quantify freezing during recall days against baseline freezing levels measured before fear conditioning within each VR. In Paradigm 2, while some fear discrimination persists across days, extinction does occur rapidly. After the first lap in the CFC VR, we observed no significant differences in freezing compared to the baseline. These results are shown in the revised Supplementary Figure 5, and the revised text is in lines 393-399.
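
      For readers less familiar with this kind of model, the LME described here (recall days as fixed effects, mouse as a random effect, baseline freezing as the reference) can be written out as follows. This is a sketch under stated assumptions: it uses a random intercept per mouse only, since the response does not say whether random slopes were included, and the symbols are introduced here for illustration.

```latex
% y_{ij}: freezing (%) of mouse i in session j within one VR
% D:      number of recall days; baseline (pre-conditioning) is the reference level
\[
  y_{ij} \;=\; \beta_0
        \;+\; \sum_{d=1}^{D} \beta_d \,\mathbb{1}\!\left[\mathrm{day}_{ij} = d\right]
        \;+\; u_i \;+\; \varepsilon_{ij},
  \qquad
  u_i \sim \mathcal{N}\!\left(0, \sigma_u^{2}\right),\quad
  \varepsilon_{ij} \sim \mathcal{N}\!\left(0, \sigma^{2}\right)
\]
```

      In this form, \(\beta_0\) is the mean baseline freezing, each \(\beta_d\) is the change in freezing on recall day \(d\) relative to baseline, and \(u_i\) absorbs stable differences between mice. Extinction within a VR then corresponds to \(\beta_d\) no longer differing from zero on later recall days.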

      Paradigm version 3 has some promise, but the magnitude of the context discrimination is modest (~10% difference in freezing). Thus, further optimization of the VR CFC will be needed to achieve robust learning and extinction. This could include factors not thoroughly tested in this study, including context pre-exposure timing and duration and shock intensity and frequency.

      We acknowledge that many aspects of this paradigm still need optimization, as virtual reality CFC is in its early stages, and we have not explored all of the parameter space. We describe the reasoning for this above. However, for this revised version of the paper we have added new behavioral data (Supplementary Figure 2 C-E) showing that increasing shock intensity from 0.6 mA to 1 mA enhances freezing, both in the first lap and on average. There are of course many other parameters that are likely important, like the ones pointed out here by the reviewer, but exploring the entire parameter space will take many years and will likely require many labs. The purpose of this paper is to show that VR-CFC fundamentally works and is a starting point on which the field can build. We have now pointed out in the introduction (lines 54-58) and discussion (lines 730-737, 810-814) that there remains significant scope for improving this paradigm and optimizing parameters in the future.

      The second part of the study is a validation of the head-fixed CFC VR protocol through the demonstration that fear conditioning leads to the remapping of dorsal CA1 place fields, similar to that observed in freely moving subjects. The results support this aim and largely replicate previous findings in freely moving subjects. One difference from previous work of note is that VR CFC led to the remapping of the control environment, not just the conditioning context. The authors present several possible explanations for this lack of specificity to the shock context, further underscoring the need for further refinement of the CFC protocol before it can be widely applied. While this experiment examined place cell remapping after fear conditioning, it did not attempt to link neural activity to the learned association or freezing behavior.

      This is an interesting observation. We think that the remapping observed in the control context likely occurred due to the absence of reward in a previously rewarded environment. Our prior work has demonstrated that removal of reward causes increased remapping (Krishnan et al., 2022; Krishnan and Sheffield, 2023). In other words, the continued presence of reward within an environment stabilizes CA1 place fields. The Moita et al. (2004) paper, which showed remapping only in the fear-conditioned context and not in the control context, provided rats with food pellets throughout the experimental session in both the control and conditioned contexts, likely to increase the exploration necessary for identifying place cells. The presence of reward in the Moita et al. experiment could explain the minimal remapping observed in their control context compared to our control context, which lacked reward. Another possibility lies in the different intervals between place cell recordings in our study and that of Moita et al. While Moita et al. separated their recordings by just one hour, our recordings were separated by a full day, with a sleep period in between. The absence of sleep and the shorter time interval between conditioning and retrieval sessions in their study could explain the minimal remapping observed by Moita et al. compared to our findings. We have now addressed this discrepancy explicitly in lines 596-606.

      Although we agree with the reviewer that it would be informative to perform analysis of how neural activity correlates with freezing responses, we think this warrants its own stand-alone manuscript as the neural dynamics and methods to appropriately analyze them are complicated. We are in the midst of analyzing this data further and will present these findings in a separate publication.

      In summary, this is an important study that sets the initial parameters and neuronal validation needed to establish a head-fixed CFC paradigm that produces freezing behaviors. In the discussion, the authors note the limitations of this study, suggest the next steps in refinement, and point to several future directions using this protocol to significantly advance our understanding of the neural circuits of threat-related learning and behavior.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Krishnan et al devised three paradigms to perform contextual fear conditioning in head-fixed mice. Each of the paradigms relied on head-fixed mice running on a treadmill through virtual reality arenas. The authors tested the validity of three versions of the paradigms by using various parameters. As described below, I think there are several issues with the way the paradigms are designed and how the data are interpreted. Moreover, as Paradigm 3 was published previously in a study by the same group, it is unclear to me what this manuscript offers beyond the validations of parameters used for the previous publication. Below, I list my concerns point-by-point, which I believe need to be addressed to strengthen the manuscript.

      Major comments

      (1) In the analysis using the LME model (Tables 1 and 2), I am left wondering why the mice had increased freezing across recall days as well as increased generalization (increased freezing to the familiar context, where shock was never delivered). Would the authors expect freezing to decrease across recall days, since repeated exposure to the shock context should drive some extinction? This is complicated by the analysis showing that freezing was increased only on retrieval day 1 when analyzing data from the first lap only. Since reward (e.g., motivation to run) is removed during the conditioning and retrieval tests, I wonder if what the authors are observing is related to decreased motivation to perform the task (mice will just sit, immobile, not necessarily freezing per se). I think that these aspects need to be teased out.

      This is an important point and we agree teasing out a lack of motivation versus fearful freezing would be useful. To address the possibility that reduced motivation to run without reward could contribute to the observed freezing behavior, we have now included a no-shock control group in the revised manuscript (n = 7; Supplementary Figure 2A-B, H–I). These control mice experienced the same protocol, including the wearing of a tail coat, but did not receive any shocks. We observed no increases in freezing across days in these controls, confirming that the increased freezing in the Familiar context of our experimental group stems from fear conditioning rather than the removal of reward from a previously rewarded context. If reduced motivation from reward removal were the primary driver, similar freezing patterns would have emerged in the no-shock controls. We have added lines 248-261 in the revised manuscript, discussing this point, and we thank the reviewer for motivating us to do this experiment and analysis.

      That said, the precise mechanisms underlying the fear generalization observed in the non-conditioned context, particularly its emergence during later recall days, remain unclear. Studies in freely moving animals have shown that fear memories initially specific to the conditioned context can become generalized with repeated exposures, which may be occurring here (Biedenkapp & Rudy, 2007; Wiltgen & Silva, 2007). Alternatively, it is possible that the combination of fear conditioning and the removal of expected reward contributes to a delayed generalization effect. This may reflect a limitation of our approach, which relies on reward to motivate initial training. As noted by another reviewer, we have now addressed this potential drawback of reward-based training in the discussion (see lines 809-817). Clearly, unique factors specific to the head-fixed VR paradigm may contribute to this phenomenon. Understanding the mechanisms underlying fear generalization in the head-fixed VR CFC paradigm will be a valuable direction for future research.

      (2) Related to point 1, the authors actually point out that these changes could be due to the loss of the water reward. So, in line 304, is it appropriate to call this freezing? I think it will be very important for the authors to exactly define and delineate what they consider as freezing in this task, versus mice just simply sitting around, immobile, and taking a break from performing the task when they realize there is no reward at the end.

      As noted in point 1 above, we have added a no-shock control group (n = 7; Supplementary Figure 2A-B, H–I) to determine whether the observed freezing was driven by fear conditioning or by reduced motivation to run in the absence of reward. The absence of increased freezing in these controls supports the interpretation that the behavior in the conditioned group is fear-related. In future studies, incorporating additional physiological measures, such as heart rate monitoring, could further help distinguish fear-related freezing from other forms of immobility.

      (3) In the second paradigm, mice are exposed to both novel and (at the time before conditioning) neutral environments just before fear conditioning. There is a big chance that the mice are 'linking' the memories (Cai et al 2016) of the two contexts such that there is no difference in freezing in the shock context compared to the neutral context, which is what the authors observe (Lines 333-335). The experiment should be repeated such that exposure to the contexts does not occur on the conditioning day.

      This is an interesting idea. However, if memory linking were driving the observed freezing patterns, we would expect to see similarly reduced fear discrimination across all three paradigms, as mice experience both contexts sequentially in each case. However, this effect appears to be specific to Paradigm 2, suggesting this may be due to other factors. We agree it would be informative to eliminate pre-conditioning exposure to both environments—to assess whether this improves fear discrimination and helps clarify the potential contribution of memory linking. This is something we plan to do in future studies that are beyond the scope of this initial paper on VR-CFC.

      (4) On lines 360-361, the authors conclude that extinction happens rapidly, within the first lap of the VR trial. To my understanding, that would mean that extinction would happen within the first 5-10 seconds of the test (according to Figure S1E). That seems far too fast for extinction to occur, as this never occurs in freely behaving mice this quickly.

      We agree with the reviewer that extinction in Paradigm 2 appears to occur relatively rapidly. However, the average time to complete the first lap in the fear-conditioned context in Paradigm 2 is 25.68 ± 5.55 seconds (as stated in line 384), indicating that extinction occurs within approximately the first 30 seconds of context exposure, not within 5–10 seconds. This is specific to Paradigm 2 and does not happen in either of the other paradigms, as shown in Supplementary Figure 4. For clarification, Figure S1E pertains to baseline running in Paradigm 1 and does not apply to Paradigm 2.

      As the reviewer points out, even at 30 seconds, extinction seems to be happening more quickly in Paradigm 2 than in freely moving setups. This may be due to a key structural difference in our setup. The VR-CFC task is organized into discrete trials, with mice being teleported back to the start after reaching the end of the virtual track. Completing a full lap without receiving a shock could serve as a clear signal that the threat is no longer present, because a completed lap means that the animal has surveyed every location within the environment. This structure could accelerate extinction compared to freely moving setups, where animals take longer to explore their complete environment due to the lack of discrete trials. Although this is true for all our paradigms, the accelerated extinction seen in Paradigm 2 versus Paradigms 1 and 3 may be driven by other factors. As noted by the reviewers, other task parameters, such as context pre-exposure timing, shock intensity, and conditioning duration, are likely to play a role in shaping extinction dynamics. These factors warrant further investigation, and we plan to explore them in future studies to better understand the conditions influencing extinction in the VR-CFC paradigm.

      (5) Throughout the different paradigms, the authors are using different shock intensities. This can lead to differences in fear memory encoding as well as in levels of fear memory generalization. I don't think that comparisons can be made across the different paradigms as too many variables (including shock intensity - 0.5/0.6mA can be very different from 1.0 mA) are different. How can the authors pinpoint which works best? Indeed, they find Paradigm 3 'works' better than Paradigm 2 because mice discriminate better between the neutral and shock contexts. This can definitely be driven by decreased generalization from using a 0.6mA shock in Paradigm 3 compared to 1.0 mA shock in Paradigm 2.

      The reviewer brings up important points here. We have now added new data evaluating 0.6 mA shocks in Paradigm 1 (Supplementary Figure 2A–E, n=8). These data show that 1.0 mA shocks produced stronger conditioned responses and greater fear discrimination compared to 0.6 mA. Our goal in Paradigm 3 was to begin with a lower shock intensity and assess whether additional modifications—specifically the shorter ISI and retention of the tail-coat during recall—could enhance fear conditioning. Surprisingly, despite the weaker shock intensity, Paradigm 3 resulted in improved discrimination and freezing behavior relative to Paradigm 2. We have now clarified this point in the manuscript (lines 466-470), and we interpret this outcome as evidence that the shorter ISIs and contextual cue continuity (tail-coat) likely play a more significant role in enhancing learning and recall. However, as noted in the text (lines 511-514), further testing is needed to determine the individual contributions of each parameter to successful VR-CFC. Fully optimizing the parameter settings will take additional time and resources, and we aim to continually refine the parameter space in the future, as has been done over the years for freely moving animals.

      (6) There are some differences in the calcium imaging dataset compared to other studies, and the authors should perform additional testing to determine why. This will be integral to validating their head-fixed paradigm(s) and showing they are useful for modeling circuit dynamics/behaviors observed in freely behaving mice. Moreover, the sample size (number of mice) seems low.

      The one notable difference between our imaging study and that done in freely moving animals is that we observed remapping of place cells in the control context. In contrast, Moita et al. (2004) reported more stable place fields in the control context. A key distinction is that their study included rewards in the control context, which may have contributed to the spatial stability. We now discuss this difference in the manuscript (lines 599-605).

      It should be noted that there are many key distinctions among paradigms that study neural activity during fear conditioning in freely moving animals. These include varying exposure times to environments (1–6 days), the time interval between neural activity recordings, and the use of food rewards during the experiment stages in freely moving animals to encourage exploration for place cell identification. Although freely moving paradigms that investigate fear conditioning and place cells are heterogeneous, we were encouraged by the replication of several key findings. This validates VR-based CFC as a viable tool for neural circuit investigations. While future work will include more thorough analyses, our current findings demonstrate the paradigm's effectiveness for modeling circuit dynamics and behavior. We have now expanded our dataset, which includes four additional mice, further corroborating these original findings.

      (7) It appears that the authors have already published a paper using Paradigm 3 (Ratigan et al 2023). If they already found a paradigm that is published and works, it is unclear to me what the current manuscript offers beyond that initial manuscript.

      The reviewer is correct that we have published a paper using Paradigm 3. However, this manuscript goes beyond that one and provides a much more comprehensive description and fundamental analysis of the behavior and experimental parameters regarding VR-CFC, allowing the research community to adapt our paradigm reproducibly. While Ratigan et al. (2023) offered only a minimal description of behavior and included just Paradigm 3, we present two additional paradigms along with neuronal validation using hippocampal place cells. We have now explicitly stated this in the introduction (lines 50-55).

      (8) As written, the manuscript is really difficult to follow with the averages and standard error reported throughout the text. This reporting in the text occurred heterogeneously throughout the text, as sometimes it was reported and other times it was not. Cleaning this reporting up throughout the paper would greatly improve the flow of the text and qualitative description of the results.

      We completely agree with this point and have now cleaned up the text, leaving details only in a few places we felt were important.

      Reviewer #3 (Public review):

      Summary:

      Krishnan et al. present a novel contextual fear conditioning (CFC) paradigm using a virtual reality (VR) apparatus to evaluate whether conditioned context-induced freezing can be elicited in head-fixed mice. By combining this approach with two-photon imaging, the authors aim to provide high-resolution insights into the neural mechanisms underlying learning, memory, and fear. Their experiments demonstrate that head-fixed mice can discriminate between threat and non-threat contexts, exhibit fear-related behavior in VR, and show context-dependent variability during extinction. Supplemental analyses further explore alternative behaviors and the influence of experimental parameters, while hippocampal neuron remapping is tracked throughout the experiments, showcasing the paradigm's potential for studying memory formation and extinction processes.

      Strengths:

      Methodological Innovation: The integration of a VR-based CFC paradigm with real-time two-photon imaging offers a powerful, high-resolution tool for investigating the neural circuits underlying fear, learning, and memory.

      Versatility and Utility: The paradigm provides a controlled and reproducible environment for studying contextual fear learning, addressing challenges associated with freely moving paradigms.

      Potential for Broader Applications: By demonstrating hippocampal neuron remapping during fear learning and extinction, the study highlights the paradigm's utility for exploring memory dynamics, providing a strong foundation for future studies in behavioral neuroscience.

      Comprehensive Data Presentation: The inclusion of supplemental figures and behavioral analyses (e.g., licking behaviors and variability in extinction) strengthens the manuscript by addressing additional dimensions of the experimental outcomes.

      Weaknesses:

      Characterization of Freezing Behavior: The evidence supporting freezing behavior as the primary defensive response in VR is unclear. Supplementary videos suggest the observed behaviors may include avoidance-like actions (e.g., backing away or stopping locomotion) rather than true freezing. Additional physiological measurements, such as EMG or heart rate, are necessary to substantiate the claim that freezing is elicited in the paradigm.

      To strengthen our claim that freezing is a conditioned response in this task, we have taken three key steps:

      (1) We adjusted our freezing detection threshold from 1 cm/s to near 0 cm/s to capture only periods where the animal is virtually motionless on the treadmill. We validated this approach in Figure 2, particularly in the zoomed-in track position trace in Figure 2A, which clearly shows that the identified freezing epochs correspond to no change in track position. All analyses and figures have been updated to reflect this more stringent threshold.

      (2) We have added a no-shock control group in the revised manuscript (n = 7; Supplementary Figure 2A-B, H–I) where mice experienced the same protocol, including wearing a tail-coat, but received no shocks. These mice showed no increases in freezing behavior, which further demonstrates that the increased freezing we observe is a result of fear conditioning.

      (3) We have added a new supplementary video (Supplementary Video 2) that better illustrates the freezing behavior in our task.
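
      The speed-threshold approach described in step (1) can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names, the 0.1 cm/s default threshold (standing in for "near 0 cm/s"), and the minimum epoch duration are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def freezing_epochs(
    speed_cm_s: Sequence[float],
    sample_rate_hz: float,
    threshold_cm_s: float = 0.1,   # assumed stand-in for "near 0 cm/s"
    min_duration_s: float = 1.0,   # assumed minimum epoch length
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices (end exclusive) of freezing epochs.

    A sample counts as immobile when |speed| <= threshold_cm_s; a run of
    immobile samples counts as freezing only if it lasts min_duration_s.
    """
    min_samples = int(round(min_duration_s * sample_rate_hz))
    epochs: List[Tuple[int, int]] = []
    start = None
    for i, v in enumerate(speed_cm_s):
        if abs(v) <= threshold_cm_s:
            if start is None:
                start = i          # a new immobile run begins
        else:
            if start is not None and i - start >= min_samples:
                epochs.append((start, i))
            start = None           # movement ends the current run
    # Close a run that extends to the end of the trace.
    if start is not None and len(speed_cm_s) - start >= min_samples:
        epochs.append((start, len(speed_cm_s)))
    return epochs


def percent_freezing(speed_cm_s: Sequence[float], sample_rate_hz: float, **kwargs) -> float:
    """Percentage of samples that fall inside freezing epochs."""
    if len(speed_cm_s) == 0:
        return 0.0
    frozen = sum(end - start for start, end in
                 freezing_epochs(speed_cm_s, sample_rate_hz, **kwargs))
    return 100.0 * frozen / len(speed_cm_s)
```

      With a 1 cm/s threshold, slow creeping at 0.5 cm/s would be scored as freezing; lowering the threshold toward zero excludes it, which is the effect the stricter criterion is meant to achieve.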

      That said, we fully agree with the reviewer that freezing is not the only defensive response observed. Other behaviors—such as hesitation, backward movement, and slowing down—also emerge that are unique to our treadmill-based paradigm. We chose to focus on freezing in this manuscript to align with convention in freely moving fear conditioning studies and to facilitate direct comparisons. We agree that additional physiological measurements (e.g., EMG or heart rate) would provide further validation and could help distinguish between different forms of defensive responses. We view this as an important future direction and plan to incorporate such measures in upcoming studies. We highlight this in the results section (lines 175-179, 262-268) and in the discussion (lines 739-750).

      Analysis of Extinction: Extinction dynamics are only analyzed through between-group comparisons within each Recall day, without addressing within-group changes in behavior across days. Statistical comparisons within groups would provide a more robust demonstration of extinction processes.

      This is an important distinction and we have now added figures (Supplementary Figures 2H-I, 5C-D, 6C-D) showing within-VR behavior across Recall days, along with statistical comparisons and a description of the extinction process based on these results.

      Low Sample Sizes: Paradigm 1 includes conditions with very low sample sizes (N=1-3), limiting the reliability of statistical comparisons regarding the effects of shock number and intensity. Increasing sample sizes or excluding data from mice that do not match the conditions used in Paradigms 2 and 3 would improve the rigor of the analysis.

      While we included all conditions in Figure 2 for completeness, we have separated these conditions in Supplementary Figure 2 to ensure clarity. This allows researchers interested in this paradigm to see the approximate range of conditioned responses observed across different parameters. When comparing Paradigm 1 with Paradigms 2 and 3, we used only data from the 1 mA, six-shock condition.

      Potential Confound of Water Reward: The authors critique the use of reward in conjunction with fear conditioning in prior studies but do not fully address the potential confound introduced by using water reward during the training phase in their own paradigm.

      We agree this is a point that needs discussion. We have now noted the limitation of using water rewards during training in the discussion section, particularly its effect on the animal’s motivation in the long term and on place cell activity (lines 814-820).

      Recommendations for the authors

      Reviewer #1 (Recommendations for the authors):

      I suggest changing "3 paradigms" to "3 versions of a CFC paradigm," as the paradigm is fundamentally the same, but parameters were adjusted towards finding an optimal protocol.

      We have changed this phrasing where applicable.

      Figure S2: There appear to be different sets of shock parameters for different mice, most with an n of 1 or 2. This is not reliable for making a decision for optimal shock parameters and should not be discussed in that way until a full-powered comparison is completed. Also, the N adds up to 19, yet only 18 are described as being included in the study.

      We thank the reviewer for this important point. We agree that the current study is not powered to definitively identify optimal parameter settings. We have been careful not to interpret it in that way in the text. Rather, we adopted a commonly used starting point from the freely moving literature—1 mA with six shocks—as our initial condition (lines 196-199). To provide context for others interested in pursuing this work, we have presented a range of conditioned responses from different parameter combinations to illustrate potential variability. In most cases, these data are intended for illustrative purposes only and are not meant to support firm conclusions. We agree that a systematic and fully powered investigation of each parameter would be highly valuable, and we plan to pursue this in future work (and hope other labs contribute to this goal, too), much like the iterative optimizations performed in freely moving paradigms over time.

      We thank the reviewer for catching the sample size discrepancy and have now corrected it.

      The number of animals for the no-shock condition should be included.

      Thank you. We have now included this.

      A possible explanation for the lower fear and poorer discrimination in versions 2 and 3 could be that 10 min pre-exposure to the CFC context on day -1 led to latent inhibition. Shorter (or eliminated) pre-exposure may improve outcomes.

      We agree that the exposure time is a parameter that we should explore. We have highlighted this in the discussion (lines 729-736) as a parameter that is worth testing in the future.

      For analysis of extinction, it is best to establish this within condition - is freezing to the CFC context significantly reduced compared with initial recall and similar to pre-training freezing? By using discrimination as your index of extinction, increases in control context freezing/inactivity can eliminate context discrimination without the conditioned response of freezing actually undergoing extinction.

      This is a good point, and we have now included analysis and conclusions based on a within-VR comparison for the analysis of fear extinction (Supplementary Figures 2H-I, 5C-D, 6C-D).
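
      The reviewer's concern can be made concrete with a small sketch. The index formula below is one common normalized form and is an assumption, not necessarily the measure used in the manuscript; the freezing percentages are hypothetical values chosen only to illustrate the argument, not data from the study.

```python
def discrimination_index(freeze_cfc: float, freeze_control: float) -> float:
    """Normalized difference in percent freezing between contexts.

    Assumed form: (CFC - control) / (CFC + control). Returns 0.0 when
    there is no freezing in either context.
    """
    total = freeze_cfc + freeze_control
    if total == 0:
        return 0.0
    return (freeze_cfc - freeze_control) / total


# Hypothetical percent-freezing values, for illustration only.
early_recall = discrimination_index(60.0, 20.0)       # clear discrimination
true_extinction = discrimination_index(20.0, 20.0)    # CFC freezing dropped
control_increase = discrimination_index(60.0, 60.0)   # control freezing rose

# Both late-recall scenarios yield the same index even though conditioned
# freezing declined in only one of them.
assert true_extinction == control_increase
```

      Because the index alone cannot separate these two cases, a within-context comparison of freezing across Recall days is needed to show that the conditioned response itself has declined.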

      Reviewer #3 (Recommendations for the authors):

      Clarification of Treadmill Shape: The manuscript describes the treadmill as "spherical" throughout. However, based on representative images and videos, the treadmill appears cylindrical. This discrepancy should be clarified to ensure consistency between the text and visuals.

      The reviewer is correct that the treadmill is cylindrical, and this was an error on our part. We have corrected it throughout.

      Figure and Legend Labeling: To improve clarity, all figures and their legends should be explicitly labeled with the corresponding paradigm (1, 2, or 3) to facilitate interpretation.

      We have now added a label on all figures that clarifies which Paradigm the figures are referring to. We have also explicitly added this to the figure legends.

      Objective Language: Subjective language, such as "since we wanted animals to" (Line 850), should be revised to reflect an objective tone (e.g., "to allow animals to"). Similarly, phrases like "We believe" (Line 896) should be avoided to maintain an unbiased presentation.

      We have removed subjective language from our text.

      Placement of Future Directions: Speculations on future experimental plans, such as the use of sex as a biological variable (Lines 895-903), should be included in the Discussion section rather than the Methods. Additionally, remarks about the responsiveness of female mice to tail shocks should be moved to the main text for proper contextualization.

      We have moved these lines as suggested by the reviewer.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Guo and colleagues used a cell rounding assay to screen a library of compounds for inhibition of TcdB, an important toxin produced by Clostridioides difficile. Caffeic acid and derivatives were identified as promising leads, and caffeic acid phenethyl ester (CAPE) was further investigated.

      Strengths:

      Considering the high morbidity rate associated with C. difficile infections (CDI), this manuscript presents valuable research in the investigation of novel therapeutics to combat this pressing issue. Given the rising antibiotic resistance in CDI, the significance of this work is particularly noteworthy. The authors employed a robust set of methods and confirmatory tests, which strengthen the validity of the findings. The explanations provided are clear, and the scientific rationale behind the results is well-articulated. The manuscript is extremely well written and organized. There is a clear flow in the description of the experiments performed. Also, the authors have investigated the effects of CAPE on TcdB in careful detail, and reported compelling evidence that this is a meaningful and potentially useful metabolite for further studies.

      Weaknesses:

      The authors have made some changes in the revised version. However, many of the changes were superficial, and some concerns still need to be addressed. Important details are still missing from the description of some experiments. Authors should carefully revise the manuscript to ascertain that all details that could affect interpretation of their results are presented clearly. For instance, authors still need to include details of how the metabolomics analyses were performed. Just stating that samples were "frozen for metabolomics analyses" is not enough. Was this mass-spec or NMR-based metabolomics? Assuming it was mass-spec, what kind? How was metabolite identity assigned, etc? These are important details, which need to be included. Even in cases where additional information was included, the authors did not discuss how the specific way in which certain experiments were performed could affect interpretation of their results. One example is the potential for compound carryover in their experiments. Another important one is the fact that CAPE affects bacterial growth and sporulation. Therefore, it is critical that authors acknowledge that they cannot discard the possibility that other factors besides compound interactions with the toxin are involved in their phenotypes. As stated previously, authors should also be careful when drawing conclusions from the analysis of microbiota composition data, and changes to the manuscript should be made to reflect this. Ascribing causality to correlational relationships is a recurring issue in the microbiome field. Again, I suggest authors carefully revise the manuscript and tone down some statements about the impact of CAPE treatment on the gut microbiota.

      Thanks for your constructive suggestion. We have carefully revised the manuscript according to your suggestions.

      Reviewer #2 (Public review):

      I appreciate the authors' responses to my original review. This is a comprehensive analysis of CAPE on C. difficile activity. It seems like this compound affects all aspects of C. difficile, which could make it effective during infection but also make it difficult to understand the mechanism. Even considering the authors' responses, I think it is critical for the authors to work on the conclusions regarding the infection model. There is some protection from disease by CAPE but some parameters are not substantially changed. For instance, weight loss is not significantly different in the C. difficile only group versus the C. difficile + CAPE group. Histology analysis still shows a substantial amount of pathology in the C. difficile + CAPE group. This should be discussed more thoroughly using precise language.

      Thanks for your constructive suggestion. We have carefully revised the manuscript according to your suggestions.

      Reviewer #3 (Public review):

      Summary:

      The study is well written, and the results are solid and well demonstrated. It shows a field that can be explored for the treatment of CDI.

      Strengths:

      Results are really good, and the CAPE shows a good and promising alternative for treating CDI.

      Weaknesses:

      Some references are too old or missing.

      Comments on revisions:

      I have read your study after comments made by all referees, and I noticed that all questions and suggestions addressed to the authors were answered and well explained. Some of the minor and major issues related to the article were also solved. I am satisfied with all the effort given by the authors to improve their manuscript.

      Thanks again for your review.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The legend of Figure 3SB is incorrect. It should read "Growth curves of C. difficile BAA-1870 in the presence of varying concentrations of CAPE (0-64 µg/mL)". Also, there is something wrong with the symbols in this figure. I suspect what is happening is that the symbols for the concentrations of 32 and 64 µg/mL are superimposing, but this is a problem because the lower line looks like a closed circle, which is supposed to represent the condition where no CAPE was added. The authors should change the symbols to allow clear distinction between each of the conditions.

      Thanks for your constructive suggestion. We have modified the panel and figure legend in Figure 3SB. The concentrations of 32 μg/mL and 64 μg/mL are quite similar, which makes it challenging to differentiate between the corresponding data points on the graph. To enhance clarity, we have utilized distinct colors to help distinguish these closely valued lines as effectively as possible.

      Since the authors observed a significant effect of CAPE on both bacterial growth and spore production, their discussion and conclusions need to reflect the fact that the effects observed can no longer be attributed solely to toxin inhibition.

      Thanks for your comments. We have modified the corresponding description according to your suggestions.

      In lines 43-45, authors state that "CAPE treatment of C. difficile-challenged mice induces a remarkable increase in the diversity and composition of the gut microbiota (e.g., Bacteroides spp.)". It is still unclear to this reviewer why mention Bacteroides between parentheses. Does this mean that there was an increase in the abundance of Bacteroides? If that is the case this needs to be stated more clearly.

      Thanks for your comments. Treatment with CAPE indeed significantly increased the abundance of Bacteroides spp. in the gut microbiota (Figure 7H-J). However, to avoid ambiguity in the abstract, we have chosen to delete the specific mention of Bacteroides spp. within the parentheses.

      The modifications made to lines 132-135 still do not address my concern. Authors stated in the manuscript that "compounds that were not bound to TcdB were removed". But how was this done? This needs to be clearly explained in the manuscript. In the response to reviewers document, authors state that this was done through centrifugation. But given that the goal here is to separate excess of small molecule from a protein target, just stating that centrifugation was used is not enough. Did the authors use ultracentrifugation? What were the conditions employed. This is critical so that the reader can assess the degree of compound carryover that may have occurred. Also, authors need to clearly acknowledge the caveats of their experimental design by stating that they cannot rule out the contribution of compound carryover to their results.

      Thanks for your comments. We used centrifugal ultrafiltration to remove the unbound small-molecule compounds. Because TcdB has a large molecular weight, approximately 270 kDa, we selected an ultrafiltration membrane with a 100 kDa molecular weight cutoff. The centrifugation was performed at 4000 × g for 5 min to eliminate the compounds that did not bind to TcdB. We have incorporated the relevant methods and discussed the potential impacts in the respective sections of the manuscript.

      In line 142, authors added the molar concentration of caffeic acid, as requested. Although this helps, it is even more important that molar concentrations are added every time a compound concentration is mentioned. For instance, just 2 lines down there is another mention of a compound concentration. It would be informative if authors also added molar concentrations here and throughout the manuscript.

      Thanks for your comments. In our initial test design, we used the concentration unit of μg/mL. However, when the concentrations in the dilution series are converted to μM, some values are not round numbers. For instance, 32 μg/mL of caffeic acid phenethyl ester converts to 112.55 μM, which appears somewhat irregular when expressed in this manner.
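
      The conversion behind this figure is simple arithmetic and can be sketched as follows. The helper name is hypothetical, and the molar mass of CAPE (C17H16O4, about 284.31 g/mol) is not stated in the response; it is supplied here as an assumption that is consistent with the quoted 112.55 μM.

```python
CAPE_MOLAR_MASS_G_PER_MOL = 284.31  # assumed value for C17H16O4


def ug_per_ml_to_um(conc_ug_per_ml: float, molar_mass_g_per_mol: float) -> float:
    """Convert a mass concentration in ug/mL to a molar concentration in uM.

    1 ug/mL equals 1 mg/L; dividing by g/mol gives mmol/L, and
    multiplying by 1000 gives umol/L (uM).
    """
    return conc_ug_per_ml / molar_mass_g_per_mol * 1000.0


# Two-fold dilution series, reported in both units.
for conc in (8, 16, 32, 64):
    um = ug_per_ml_to_um(conc, CAPE_MOLAR_MASS_G_PER_MOL)
    print(f"{conc} ug/mL = {um:.2f} uM")
```

      Reporting both units side by side, as in the loop above, would give readers the molar values without changing the experimental design.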

      Line 277. For the sake of clarity, I would strongly suggest that authors use the term "control mice" instead of "model mice".

      Thanks for your comments. We have modified “model mice” to “control mice” throughout the manuscript.

      In line 302, the word taxa should not be capitalized. I capitalized it in my original comments simply to draw attention to it.

      Thanks for your comments. We have modified this word.

      In the section starting in line 318, authors still need to include details of how the metabolomics analyses were performed. Just stating that samples were "frozen for metabolomics analyses" is not enough. Was this mass-spec or NMR-based metabolomics? Assuming it was mass-spec, what kind? How was metabolite identity assigned? Etc, etc. These are important details, which need to be included.

      Thanks for your comments. We have added some metabolomics methods in the corresponding section.

      In line 338, the authors misunderstood my original comment. This sentence should read "...the final product of purine degradation, were markedly decreased in mice after...".

      Thanks for your comments. We have modified this sentence.

      Panels of figure 3 are still incorrectly labeled. The secondary structure predictions are shown in A and C, not A and B as is currently stated in the legend.

      Thanks for your comments. We have modified the figure legend in Figure 3.

      About Figure 5C, I thank the authors for the clarification, but this explanation should be included in the figure legend.

      Thanks for your comments. We have added the relevant information to the figure legend.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public Review):

      Batra, Cabrera and Spence et al. present a model which integrates histone posttranslational modification (PTM) data across cell models to predict gene expression with the goal of using this model to better understand epigenetic editing. This gene expression prediction model approach is useful if it a) predicts gene expression in specific cell lines, b) predicts expression values rather than a rank or bin, c) helps us to better understand the biology of gene expression, or d) helps us to understand epigenome editing activity. Problematically for points a) and b), it is easier to directly measure gene expression than to measure multiple PTMs, and so the real usefulness of this approach mostly relates to c) and d).

      We appreciate this point from Reviewer #1 and the instructive comments and helpful feedback on our study. We designed our approach keeping in mind that the primary use case is to understand how epigenome editing would affect gene expression.

      Other approaches have been published that use histone PTM to predict expression (e.g. PMID 27587684, 36588793). Is this model better in some way? No comparisons are made although a claim is made that direct comparisons are difficult. I appreciate that the authors have not used the histone PTM data to predict gene expression levels of an "average cell" but rather that they are predicting expression within specific cell types or for unseen cell types. Approaches that predict expression levels are much more useful whereas some previous approaches have only predicted expressed or not expressed or a rank order or bin-based ranking. The paper does not seem to have substantial novel insights into understanding the biology of gene expression.

      We thank Reviewer #1 again for this insightful comment. We have included citations for a series of papers (PMIDs: 27587684, 30147283, 36588793) that performed gene expression prediction using histone PTM data. However, each of these methods performs classification of gene expression as opposed to predicting the actual gene expression value via regression. Additionally, the referenced studies all work with Roadmap Epigenomics read-depth data as opposed to p-values obtained from the ENCODE pipelines, making it difficult to make direct comparisons. We outline in the Discussion section that creating a comprehensive dataset of epigenome editing outcomes, including quantification of histone PTMs before and after in situ perturbations, will improve our understanding of the effects of dCas9-p300 on gene expression and assist in the design of gRNAs for achieving fine-tuned control over gene expression levels. In this revised version of our study, we have also added new data (Figure 3 – figure supplement 3) to further benchmark our model against others.

      The approach of using this model to predict epigenetic editor activity on transcription is interesting and to my knowledge novel although only examined in the context of a p300 editor. As the authors point out, the interpretation of the epigenetic editing data is convoluted by things like sgRNA activity scoring and to fully understand the results likely would require histone PTM profiling and maybe dCas9 ChIP-seq for each sgRNA which would be a substantial amount of work.

      We agree with the Reviewer and view these experiments as important components of future studies.

      Furthermore, from the model evaluation of H3K9me3 it seems the model is performing modestly for other forms of epigenetic or transcriptional editing, e.g. we know for the best studied transcriptional editor which is CRISPRi (dCas9-KRAB) that recruitment to a locus is associated with robust gene repression across the genome and is associated with H3K9me3 deposition by recruitment of KAP1/HP1/SETDB1 (PMID: 35688146, 31980609, 27980086, 26501517).

      This is an interesting point. We have included new data (Figure 4 – figure supplement 1) that quantify how sensitive the trained gene expression model is to perturbations in H3K9me3. Indeed, our data suggest that the model predictions are sensitive to perturbations in H3K9me3. For instance, there is a clear decrease and a gradual increase as the position where the perturbation is performed moves from upstream to downstream of the TSS. Additionally, the magnitude of the predicted fold-change is a function of how much the H3K9me3 is perturbed, and hence the magnitude of change would be even higher if the perturbation magnitude were increased. However, this precise magnitude is hard to estimate in the absence of experimental perturbation data for H3K9me3. Leveraging our model in combination with KRAB-based CRISPRi is an exciting and important aspect of future studies.

      One concern overall with this approach is that dCas9-p300 has been observed to induce sgRNA independent off target H3K27Ac (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8349887/ see Figure S5D) which could convolute interpretation of this type of experiment for the model.

      This remains an excellent point and indeed, we and others have observed that dCas9-p300 can result in off-target H3K27ac levels (both increased and suppressed) across the genome. Our study focused on p300, because the molecule is one of the few known proteins that can catalyze H3K27ac in the human genome, and H3K27ac remains a proxy for active genomic regulatory elements. Nevertheless, any off target activity of dCas9-p300 could certainly convolute our analyses. We have included language to address this caveat in our discussion.

      Reviewer #2 (Public review):

      Summary:

      The authors build a gene expression model based on histone post-translational modifications, and find that H3K27ac is correlated with gene expression. They proceed to perturb H3K27ac at 13 gene promoters in two cell types, and measure gene expression changes to test their model.

      We remain appreciative of the constructive feedback and input from Reviewer #2 on our manuscript.

      Strengths:

      The combination of multiple methods to model expression, along with utilizing 6 histone datasets in 13 cell types allowed the authors to build a model that correlates between 0.7-0.79 with gene expression. They use dCas9-p300 fusions to perturb H3K27ac and monitor gene expression to test their model. Ranked correlations of the HEK293 data showed some support for the predictions after perturbation of H3K27ac.

      Weaknesses:

      The perturbation of 5 genes in K562 with perturb-seq data shows a modest correlation of ~0.5 and isn't included in the main figures. The authors are then left to speculate reasons why the outcome of epigenome editing doesn't fit their predictions, which highlights the limited value in the current version of this method.

      We agree with the reviewer’s suggestion and highlight in our conclusion that generating epigenome editing data across a variety of cell types and across many genes will help uncover the underlying mechanisms of gene expression modulation.

      As mentioned before, testing genes that were not expressed, being most activated by dCas9-p300, weakens the correlations vs. looking at a broad range of gene expression levels like those the original model was trained on.

      We appreciate this comment from Reviewer #2. We note that the data generated from this dCas9-p300 perturb-seq experiment used gRNAs from a pre-existing library published previously (PMID: 37034704). While this library enabled deeper interrogation of dCas9-p300 driven effects compared to our previous revision, the gRNAs in this library were designed against genes associated with haploinsufficiency in neuronal cell types, which were generally lowly-expressed in K562 cells. Further, we restricted our analysis here to promoter-proximal gRNAs (as opposed to enhancer-targeted gRNAs in the library), narrowing our scope even further. Thus the genes ultimately used for analysis are enriched for low expression.

      If the authors want this method to be used to predict outcomes of epigenome editing, expanding to dCas9-KRAB and other CRISPRa methods (SAM and VPR) would be useful. Those datasets are published and could be analyzed for this manuscript.

      This is an exciting suggestion from Reviewer #2. We agree, and view this as a component of future work in this area.

      The authors don't compare their method to other prediction methods.

      In this revised version of our study, we have also added new data (Figure 3 – figure supplement 3) to further benchmark our model against others. These data demonstrate that our CNN model outperforms existing approaches across multiple cell types.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Looking at the individual genes in K562 shows a random looking range of predictions and observed, with the exception of Bcl11A which is one of two genes in this set of 5 that are not expressed. I will repeat my earlier comment, that epigenome editing and CRISPRa methods generally show the most upregulation with the lowest expressed genes. I speculate that plotting endogenous expression vs. outcome (assuming using all gRNAs within a reasonable and similar distance to TSS) would produce a correlation of -0.5 or greater and be as useful as this method.

      We agree, and believe that this demonstrates more work is needed in this emerging research area.

      The methods describe Perturb-seq analysis but not the bench experiments.

      We have added the bench methods related to our Perturb-seq experiments to our revised manuscript under the Experimental Methods section in the Appendix.

      I don't understand why the authors can't compare to other methods as that is fairly standard in new prediction papers. I get that others used REMC vs. ENCODE, and were rank or binary based, but the authors could use REMC data and/or convert their data to ranked or binary and still compare. Lacking that it's hard to judge this manuscript.

      We have added benchmarking against existing methods as Figure 3 – figure supplement 3.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Our revised manuscript thoroughly addresses all comments and suggestions raised by the reviewers, as detailed in our point-by-point response. To strengthen our findings, we have conducted additional in vivo experiments to evaluate the presence of fibro-adipogenic progenitors (FAPs) at different time points during HO formation in control and BYL719-treated mice. Our results indicate that BYL719 reduces the accumulation of FAPs and promotes muscle fiber regeneration in vivo. We have also expanded our discussion on BYL719’s effects on mTOR signaling, further clarifying key points raised by Reviewer #1, and have addressed all minor comments.

      Additionally, in response to Reviewer #2, we have employed an orthogonal and complementary approach using a new model. We conducted chondrogenic differentiation experiments with murine MSCs expressing either ACVR1wt or ACVR1<sup>R206H</sup>. qPCR analysis of chondrogenic gene markers (Sox9, Acan, Col2a1) demonstrates that Activin A enhances their expression in ACVR1<sup>R206H</sup> cells, whereas BYL719 strongly suppresses their expression, regardless of ACVR1 mutational status. These new data further confirm that BYL719 effectively inhibits genes involved in ossification and osteoblast differentiation, independent of the ACVR1 mutation. We have also expanded our discussion to further clarify points raised by Reviewer #2 and have addressed all remaining minor comments.

      Below, we provide a detailed point-by-point response to the reviewers’ comments:

      Reviewer #1:

      Point 1: In this revised manuscript, the authors clearly showed that BYL719 suppressed the proliferation and differentiation of murine myoblasts, C2C12 cells, in addition to human MSCs in vitro. Furthermore, BYL719 decreased migratory activity in vitro in monocytes and macrophages without suppressing proliferation. Overall, these data suggested that BYL719 is not a specific chemical compound for cell types or signaling pathways as mentioned in the manuscript by the authors themselves. Therefore, it was still unclear how to explain the molecular mechanisms in inhibition of HO by the compound in a specific signaling pathway in a specific cell type, MSCs, contradicting many other possibilities. The authors should add logical explanations in the manuscript.

      Regarding its selectivity, BYL719 is a potent and highly selective inhibitor of PI3Kα, as has been demonstrated in multiple studies and in several in vitro kinase assay panels (Furet et al. PMID: 23726034, Fritsch et al. PMID: 24608574). The IC50 or Kd values for BYL719 against PI3Kα were at least 50 times lower than for most other kinases tested. Moreover, BYL719 is also highly selective for PI3Kα (IC50 = 4.6 nmol/L) compared to the other class I PI3K isoforms: PI3Kβ (IC50 = 1,156 nmol/L), PI3Kδ (IC50 = 290 nmol/L) and PI3Kγ (IC50 = 250 nmol/L) (Fritsch et al.). Consistent with these data, we show that, at the concentrations tested, BYL719 does not have a direct effect on any kinase receptor within the TGF-β superfamily, including ACVR1 or ACVR1<sup>R206H</sup>.
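
      As a quick arithmetic check on the isoform selectivity quoted above, the fold difference between each class I isoform and PI3Kα can be computed directly from the four IC50 values. The short sketch below uses only the numbers cited in this paragraph; the dictionary and function names are illustrative and do not come from the manuscript.

```python
# IC50 values (nmol/L) for BYL719 as quoted in the paragraph above (Fritsch et al.).
ic50_nmol_per_l = {
    "PI3Kα": 4.6,
    "PI3Kβ": 1156.0,
    "PI3Kδ": 290.0,
    "PI3Kγ": 250.0,
}


def fold_selectivity(ic50, reference="PI3Kα"):
    """Return IC50(isoform) / IC50(reference) for every non-reference isoform."""
    ref_value = ic50[reference]
    return {
        isoform: value / ref_value
        for isoform, value in ic50.items()
        if isoform != reference
    }


folds = fold_selectivity(ic50_nmol_per_l)
for isoform, fold in folds.items():
    # A larger ratio means weaker inhibition of that isoform relative to PI3Kα.
    print(f"{isoform}: about {fold:.0f}-fold higher IC50 than PI3Kα")
```

      With these values the ratios come out at roughly 251-fold (PI3Kβ), 63-fold (PI3Kδ) and 54-fold (PI3Kγ), in line with the "at least 50 times lower" statement.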

      Rather than blocking ACVR1 kinase activity, in our manuscript we provide evidence that BYL719 has the potential to inhibit osteochondroprogenitor specification and prevent an exacerbated inflammatory response in vivo (Valer et al., 2019a PMID: 31373426, and this manuscript) through different mechanisms, such as (i) increasing SMAD1/5 degradation, (ii) reducing transcriptional responsiveness to BMPs and Activin, (iii) blocking non-canonical ACVR1 responses such as the activation of AKT/mTOR. All these defined molecular mechanisms contribute to suppress HO in vitro and in vivo, as we report and explain throughout the manuscript. Selective PI3Kα inhibition is at the core of the different molecular pathways described. As such, PI3Kα blockade inhibits the phosphorylation of GSK3 and compromises SMAD1 protein stability, thereby altering canonical responsiveness and osteochondroprogenitor specification (Gamez et al PMID: 26896753; Valer et al PMID: 31373426). Moreover, PI3Kα blockade downregulates Akt/mTOR signalling, which is critical for FOP and non‐genetic (trauma induced) HO in preclinical models (Hino et al, 2017 PMID: 28758906; Hino et al. PMID: 30392977). Finally, PI3Kα inhibition hampers a number of proinflammatory pathways, thereby limiting the expression of pro-inflammatory cytokines, reducing the proliferation of monocytes, macrophages and mast cells, and partially blocking the migration of monocytes. As we suggest in the discussion of the manuscript, this effect likely causes a poor recruitment of monocytes and macrophages at injury sites and throughout the in vivo ossification process.

      Notably, in our manuscript we do not refer to a “specific chemical compound for cell types”. Rather, in the Discussion we write “the administration of BYL719 prevented an exacerbated inflammatory response in vivo, possibly due to specific effects observed on immune cell populations.” This sentence was not intended to imply that BYL719 only affects these specific cell types, but aimed to emphasize the effects observed on those cell populations, even though systemic BYL719 may affect all populations. We rephrased it to “the administration of BYL719 prevented an exacerbated inflammatory response in vivo, possibly due to the effects observed on immune cell populations.” to provide a clearer message as suggested by the reviewer. We thank the reviewer for these questions and hope that these explanations and changes in the text improve the clarity of the message.

      Mesenchymal stem/stromal cells (MSCs) are osteochondroprogenitor cells that can follow distinct differentiation paths. In this study, we use these cells as an in vitro model for the study of osteochondroprogenitor specification. MSCs, and induced MSCs (iMSCs), have been widely used as in vitro cellular models of osteochondroprogenitor specification for the analysis of markers, signaling, modulation, and differentiation potential or capacity. Their use as models for this purpose has been extensively studied in wild type MSCs, and in the presence of FOP mutations (Boeuf and Richter PMID: 20959030; Schwartzl et al. PMID: 37923731).

      Point 2: Related to comment #1, the effects of BYL719 on the proliferation and differentiation of fibro-adipogenic cells in skeletal muscle, which are potential progenitor cells of HO, should be important to support the claim of the authors.

      We have performed additional in vivo experiments to assess the presence of fibro-adipogenic progenitors (FAPs) at different time-points during HO formation in control and BYL719-treated mice in the mouse model of heterotopic ossification. We analyzed the number of FAPs during the progression of HO. These data are shown in the new Figure 3 – figure supplement 1. We demonstrate that BYL719 reduces the number of PDGFRA+ cells (FAPs, red) throughout the ossification process in vivo. Moreover, we now also show an enlargement of the diameter of myofibers (labelled with wheat germ agglutinin, green) when animals were treated with BYL719, indicating improved muscle regeneration and further validating the data reported as supplementary figures that were added in the first revision of this manuscript.

      Point 3: BYL719 inhibited signaling through not only ACVR1-R206H and ACVR1-Q207D but also wild type ACVR1 and suppressed the chondrogenic differentiation of parental MSCs regardless of the expression of wild type or mutant ACVR1. Again, these findings suggest that BYL719 inhibits HO through a multiple and nonspecific pathway in multiple types of cells in vivo. The authors are encouraged to explain logically the use of bone marrow-derived MSCs to examine the effects of BYL719.

      As detailed in main point 1, we consider that the main target, molecular mechanisms and pathways inhibited by BYL719 are specific and well characterised in other research articles and further defined in this manuscript, including the generation of PI3Kα-deficient mice in an FOP background, which undoubtedly demonstrates an essential role for PI3Kα in ACVR1-driven heterotopic ossification in vivo. Altogether, we are confident that BYL719 inhibits HO through multiple and specific pathways that arise from the PI3Kα inhibition. As a systemically administered drug, BYL719 affects the multiple types of cells in vivo that express PI3Kα. It is well known that PI3Kα is exquisitely required for chondrogenesis and osteogenesis (Zuscik et al. PMID; Gamez et al PMID: 26896753 1824619). Accordingly, throughout the manuscript we refrain from suggesting a specific effect on ACVR1-R206H cells and instead describe an inhibitory effect on cell number and differentiation regardless of the ACVR1 form expressed.

      Similarly, as detailed in main point 1, MSCs and hiPSCs have been extensively used as in vitro cellular models of osteochondroprogenitor specification for the analysis of markers, signaling, modulation, and differentiation potential or capacity (Barruet et al., PMID: 28716551; Kan et al., PMID: 39308190).

      Point 4: BYL719 clearly inhibits an mTOR pathway. Is there a possibility that BYL719 suppresses HO by inhibiting mTOR rather than PI3K? The authors are encouraged to show the unique role of PI3K in BYL719-suppressed HO formation.

      As clarified above, BYL719 is a potent and selective inhibitor of PI3Kα, with minimal off-target inhibition against other kinases, as has been demonstrated in multiple studies and in several in vitro kinase assay panels. In the same study, while the IC50 of BYL719 against PI3Kα was 4.6 nmol/L, the IC50 against mTOR was >9,100 nmol/L, indicating that mTOR was not directly inhibited. mTOR is one of the well-known pathways that are activated downstream of PI3K. Therefore, it is no surprise that blocking PI3Kα will block mTOR signalling. This potential effect was already demonstrated in previous publications (Valer et al., 2019a PMID: 31373426) and discussed throughout the first revision. We consider that the additive effect of mTOR inhibition and other molecular mechanisms downstream of PI3Kα, including reduced SMAD1/5 protein levels, contributes to the in vivo HO inhibition by BYL719.

      Reviewer #2:

      Point 1: It is also important to note that, in most of the data, there is no significant difference between cells with wild-type ACVR1 and those with the R206H mutation. The authors demonstrated that ACVR1 is not a target of BYL719 based on NanoBRET assay data, suggesting that BYL719's effect is not specific to FOP cells, even though they used an FOP mouse model to show in vivo effects.

      The main effect of R206H mutation is the gain of function in response to Activin A. For most of the responses to other ACVR1 ligands (e.g. BMP6/7), we observe a slightly increased response in the presence of the mutation (which is consistent with previous research, usually labelling RH as a “weak activating mutant” unless Activin A is added (Song et al., PMID: 20463014)). Therefore, as expected, most of the differences between WT and RH mutant cells can be observed mostly upon Activin A addition, as observed, for example, in Figure 3 of our manuscript.

      We agree with the reviewer that, at the concentrations used, BYL719 does not specifically target FOP cells. However, we believe that it targets downstream pathways of PI3Kα inhibition that are essential for osteochondrogenic specification, regardless of mutation status. This therapeutic strategy aligns with other experimental drugs, including Palovarotene (validated for FOP) and Garetosmab and Saracatinib (in advanced clinical trials), which target Activin A function, ACVR1 activity, or osteochondrogenic differentiation irrespective of the mutant allele. Unlike these molecules, BYL719 has been chronically administered to patients (including children) without major side effects (Gallagher et al.; PMID: 38297009), further supporting its potential for safe long-term use.

      The authors should consider that the effect of Activin A on R206H cells is not identical to that of BMP6 on WT cells. If the authors aim to identify the target of BYL719 in FOP cells, they should compare R206H cells treated with Activin A/BYL719 to WT cells treated with BMP6/BYL719.

      We use Activin A and BMP6, both high-affinity ACVR1 ligands, to demonstrate, as observed in figure 6, that PI3Kα inhibition can inhibit the expression of genes within GO terms ossification and osteoblast differentiation. It is important to note, however, that Activin A canonical signaling receptor is ACVR1B. Since BYL719 blocks the induction of a heterotopic ossification gene expression signature common to Activin A and BMP6, in the context of the FOP mutation R206H, our results indicate that BYL719 inhibition affects a signaling pathway downstream of ACVR1, activated by either BMP6 (wild type receptor, relevant for non-genetic heterotopic ossifications) or Activin (R206H mutant receptor, relevant for FOP).

      We consider that the comparison (RH ACTA BYL vs WT BMP6 BYL) would provide confounding results arising from intrinsic model differences in basal expression programs (WT vs RH), and differences in the quantitative level of signaling of the different ligands at these specific doses. First, if we only consider SMAD1/5 signaling, Activin A and BMP6 won’t have identical signaling, and differences will arise from the strength of that signaling. Secondly, in the suggested comparison we would find, mostly, all the differential gene expression promoted by Activin A canonical signaling through type I receptors ACVR1B/ALK4 in complex with ACVR2A or ACVR2B, promoting SMAD2/3 activation (in addition to the altered signaling that ACVR1-R206H could promote). Examples of differential response in pSMAD1/5 in ACVR1-WT or RH with BMP ligands and R206H with Activin A ligand, and examples of pSMAD2/3 canonical signaling in R206H cells have been described (Ramachandran et al., PMID: 34003511; Hatsell et al., PMID: 26333933).

      Point 2: The interpretation of the data in the new Figure 5 is inappropriate. Based on the expression levels of SOX9, COL2A1, and ACAN, it is unclear whether the effect of BYL719 is due to the inhibition of differentiation or proliferation. The addition of Activin A showed no difference between ACVR1/WT and ACVR1/R206H cells, suggesting that these cells did not accurately replicate the FOP condition.

      To gain consistency in our manuscript, we decided to use an orthogonal and complementary approach in a completely new model. We performed new experiments of chondrogenic differentiation using murine MSCs from UBC-Cre-ERT2/ACVR1<sup>R206H</sup> knock-in mice. These cells, when treated with 4OH-tamoxifen, express the intracellular exons of human ACVR1<sup>R206H</sup> in the murine Acvr1 locus. Therefore, we can compare differentiation of wild type and R206H MSCs isolated from the same mice. We initiated the chondrogenic differentiation assay from confluent cells to minimize changes in cell proliferation throughout the process. These new results are shown in the new Figure 5F. Mutant (RH) cells display an enhanced chondrogenic response to activin A compared to wild type cells. The treatment with BYL719 decreased the expression of chondrogenic markers irrespective of the mutational status of ACVR1 in the cells, further supporting our previous results in this manuscript and in our published article (Valer et al., 2019a PMID: 31373426).

      Point 3: The additional investigation of RNA-seq data provided useful information but was insufficient to fully address the purpose of this study. The authors should identify downregulated genes by comparing WT cells treated with Activin A/BYL719 and Activin A alone and then compare these identified genes with those shown in Figure 5E. Additionally, they should compare R206H cells treated with Activin A/BYL719 to WT cells treated with BMP6/BYL719. These comparisons will clarify whether there are FOP-specific BYL719-regulated genes.

      We thank the reviewer for considering that RNAseq data provides useful information. As already discussed in our answer above, our results indicate that regardless of the ligand (Activin A or BMP6) and regardless of the ACVR1 mutation (WT, relevant for non-genetic heterotopic ossifications or RH, relevant for FOP), BYL719 can inhibit the expression of the genes relevant to endochondral ossification. In our opinion, this is a very relevant conclusion of this study.

      We have deeply considered the strategy proposed by the reviewer, comparing “WT cells treated with Activin A/BYL719 and Activin A alone and then compare these identified genes with those shown in Figure 5E” and/or comparing “R206H cells treated with Activin A/BYL719 to WT cells treated with BMP6/BYL719”. While we have discussed why we do not consider appropriate the first comparison proposed, there are a number of reasons why we are not confident that the second comparison would provide a straightforward conclusion.

      Regarding the second suggested comparison, already addressed in Main point 1, we consider that it would provide confounding results for the reasons detailed there. Regarding the first suggested comparison, we also consider that it would provide confounding results. There are several reasons why we do not consider that the genes only found in the RH comparison can be confidently considered genes that are only affected by BYL719 in RH cells.

      First, the effect of BYL719 in an osteogenic-prone sample (for example, RH-ActA) is higher than the effect that we can observe in the absence of this activation (for example, WT-ActA), as observed in the higher number of significantly downregulated genes in the RH ActA BYL vs RH ActA comparison, compared to WT ActA BYL vs WT ActA. Similar results are observed in figure 3C, where the expressions of the genes are significantly inhibited in RH ActA compared to RH ActA BYL. This inhibition is not significantly observed in WT ActA compared to WT ActA BYL because the osteogenic expression of these genes is already very weak in the absence of ACVR1 R206H. This weak signaling of pSMAD1/5 in the absence of osteogenic signaling (RH without ligand or, especially, WT with Activin A) has already been described (Ramachandran et al. PMID: 34003511). Therefore, even though the inhibition is present in both comparisons, as observed in figure 6C, the extent of the observed effect is different. Second, the two comparisons yield different numbers of DEGs. If we compare the 67 downregulated genes from one comparison and 38 downregulated genes from the other comparison, the unequal list size may inflate the number of unique genes in the group with more downregulated genes. To test these concerns, we performed the comparison that the reviewer suggested and we found, for example, that amongst the 38 differentially downregulated ossification genes in (WT_ActA_BYL vs WT_ActA) and 67 differentially downregulated ossification genes in (RH_ActA_BYL vs RH_ActA), 39 genes were only found in the RH comparison, while 10 were only found in the WT comparison, and 28 were found in both.
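
      The overlap counts quoted above can be checked with simple set arithmetic. The sketch below uses only the three numbers reported in this paragraph (38 genes in the WT comparison, 67 in the RH comparison, 28 shared); the variable names and the derived summary values (union size, shared fractions, Jaccard index) are illustrative additions and are not taken from the manuscript.

```python
# Counts of downregulated ossification genes as reported in the paragraph above.
wt_total = 38   # WT_ActA_BYL vs WT_ActA
rh_total = 67   # RH_ActA_BYL vs RH_ActA
shared = 28     # genes found in both comparisons

# Genes unique to each comparison follow from the totals and the overlap.
wt_only = wt_total - shared
rh_only = rh_total - shared

# Size of the union, and overlap expressed relative to each list and to the union.
union = wt_only + rh_only + shared
frac_wt_shared = shared / wt_total   # share of the WT list that is also in the RH list
frac_rh_shared = shared / rh_total   # share of the RH list that is also in the WT list
jaccard = shared / union             # overlap relative to all genes in either list

print(f"WT only: {wt_only}, RH only: {rh_only}, union: {union}")
print(f"shared fraction of WT list: {frac_wt_shared:.2f}")
print(f"shared fraction of RH list: {frac_rh_shared:.2f}")
print(f"Jaccard index: {jaccard:.2f}")
```

      The derived unique counts (10 and 39) match the figures stated in the response, and the same 28 shared genes make up a much larger share of the shorter WT list than of the longer RH list, which is the list-size effect described above.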

      These effects are present, for example, when studying the ID genes, well-known downstream mediators of BMP signaling. In this case, ID1 is downregulated in both comparisons, while ID2, ID3, and ID4 are downregulated only in the RH group, despite the fact that ID1, ID2, ID3, and ID4 are all similarly regulated and increase their expression with similar time curves upon BMP signaling activation (Yang et al., PMID: 23771884). Therefore, we consider that the comparisons proposed will not help us to identify specific BYL719-regulated genes relevant for FOP and/or ACVR1 R206H signaling. Again, we consider that the effect of BYL719 is not specific to FOP cells. Our results show that regardless of the ligand (Activin A or BMP6) and regardless of the ACVR1 mutation (WT, relevant for non-genetic heterotopic ossifications or RH, relevant for FOP), BYL719 can inhibit the expression of the genes linked to ossification and osteoblast differentiation, which could be important for the treatment of FOP and non-genetic heterotopic ossifications.

      Point 4: The data in Figure 7 are not relevant to the aim of this study because the cell lines used in these experiments did not have ACVR1/R206H mutations. The authors mentioned that BMP6 is a ligand for ACVR1 and, therefore, these experiments reflect the situation of inflammatory cells in FOP. This is inappropriate and not rational. As mentioned above, the effect of Activin A on FOP cells is not identical to the effect of BMP6 in wild-type cells. The data in Figure 7 indicated that the effect of BYL719 is unrelated to the presence of BMP6, clearly demonstrating that these experiments are not related to the activation of ACVR1. In the gene expression analyses, almost all genes showed no changes with the addition of BMP6. Only TGF and CCL2 showed upregulation in THP1 cells, and the treatment with BYL719 failed to inhibit the effect of BMP6, suggesting that these experiments merely demonstrate the effect of BYL719 on inflammatory cells irrespective of the presence of the HO signal.

      We consider that Figure 7 is relevant to the aim of this study. As shown in Fig. 8, treatment of FOP mice with BYL719 led to a decreased recruitment of immune cells within the FOP lesions, suggesting a direct effect of BYL719 on immune cells. This is very relevant for the FOP pathology, since flare-ups have been linked with inflammatory episodes since the very early characterization of the disease (Mejias-Rivera et al., PMID: 38672135). Given the technical difficulties in transducing THP1, RAW264 and HMC1 cell lines with lentiviral particles carrying ACVR1 R206H, we decided to partially recapitulate ACVR1 R206H activation with recombinant BMP6 and to test the effect of BYL719 in these conditions. In these models, we found that BYL719 inhibited the expression of key genes driving immune cell activation, in a cell-type- and ligand-independent manner. To clarify this rationale, we have swapped Figures 7 and 8 and adjusted our conclusions accordingly. We have softened our interpretations, emphasizing the absence of the ACVR1 R206H mutant receptor in these experiments.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1 (Public review)

      Summary

      The results offer compelling evidence that L5-L5 tLTD depends on presynaptic NMDARs, a concept that has previously been somewhat controversial. It documents the novel finding that presynaptic NMDARs facilitate tLTD through their metabotropic signaling mechanism.

      We thank Reviewer 1 for their kind words and thoughtful feedback!

      Strengths

      The experimental design is clever and clean. The approach of comparing the results in cell pairs where NMDA is deleted either presynaptically or postsynaptically is technically insightful and yields decisive data. The MK801 experiments are also compelling.

      We are very grateful for this kind feedback!

      Weaknesses

      No major weaknesses were noted by this reviewer.

      We were happy to see that Reviewer 1 had no concerns in the Public Review. We address their Recommendations here below.

      Reviewer #1 (Recommendations for the authors):

      There is one minor issue that the authors might want to address. In Figure 6C, the average time course of the controls (blue symbols) shows a clear decline in the baseline. The rate of this decline appears to be similar to the initial decline rate observed after inducing tLTD.

      Sorry, the x-axis was truncated so the first data points were not visible. We fixed Fig 6C as well as 6G, which suffered from the same problem.

      Reviewer 2 (Public review)

      Summary

      The study characterized the dependence of spike-timing-dependent long-term depression (tLTD) on presynaptic NMDA receptors and the intracellular cascade after NMDAR activation possibly involved in the observed decrease in glutamate release probability at L5-L5 synapses of the visual cortex in mouse brain slices.

      We are grateful for Reviewer 2’s thoughtful and detailed feedback!

      Strengths

      The genetic and electrophysiological experiments are thorough. The experiments are well-reported and mainly support the conclusions. This study confirms and extends current knowledge by elucidating additional plasticity mechanisms at cortical synapses, complementing existing literature.

      We were thrilled to see that the reviewer thinks our experiments are “thorough”, “well-reported” and they “mainly support the conclusions”!

      Weaknesses

      While one of the main conclusions (preNMDARs mediating presynaptic LTD) is resolved in a very convincing genetic approach, the second main conclusion of the manuscript (non-ionotropic preNMDARs) relies on the use of a high concentration of extracellular blockers (MK801, 2 mM; 7-chlorokynurenic acid, 100 µM), but no controls for the specific actions of these compounds are shown.

      We thank the reviewer for calling our genetic approach “very convincing”!

      Regarding the pharmacological controls: for MK-801, we deliberately used a high extracellular concentration in the mM-range to match the intracellular concentrations used both in our own experiments and in prior studies (Berretta and Jones, 1996; Brasier and Feldman, 2008; Buchanan et al., 2012; Corlew et al., 2007; Humeau et al., 2003; Larsen et al., 2011; Rodríguez-Moreno et al., 2011; Rodríguez-Moreno and Paulsen, 2008). Our goal was to isolate the variable of application site (internal vs. external) while keeping concentration constant. If we had used the lower, more conventional µM-range extracellular concentrations (e.g., Huettner and Bean, 1988; Kemp et al., 1988; Tovar and Westbrook, 1999), differences in outcome might have reflected differences in drug efficacy rather than localization — particularly since failure to observe an effect at low concentrations would be hard to interpret.

      We now clarify this rationale in the revised manuscript (lines 578-585).

      As for 7-chlorokynurenic acid (7-CK), the 100 µM concentration we used is standard for effectively blocking the glycine-binding site of NMDARs (e.g., Nabavi et al., 2013).

      We also added two supplementary figures to show the effects of washing in MK-801 and 7-CK. In MK-801, responses are stable at low frequency (clarified in the manuscript lines 155-157 and Supp Fig 1 caption text). However, 7-CK suppresses responses appreciably, which takes time to stabilize. We clarify in the revised manuscript that in 7-CK experiments, we waited for this stabilization before inducing tLTD (lines 167-172 and Supp Fig 2 caption text). This additional suppression is consistent with 7-CK also acting as a potent competitive inhibitor of L-glutamate transport into synaptic vesicles (Bartlett et al., 1998).

      In addition, no direct testing for ions passing through preNMDAR has been performed.

      Sorry for being unclear; we have previously tested directly for ions passing through preNMDARs. For example, we have shown blockade with Mg<sup>2+</sup> (Abrahamsson et al., 2017; Wong et al., 2024) as well as preNMDAR Ca<sup>2+</sup> supralinearities (Abrahamsson et al., 2017; Buchanan et al., 2012). To improve the manuscript, we clarified the text accordingly (lines 140-141).

      It is not known if the results can be extrapolated to the adult brain, as the data were obtained from slices of 11-18-day-old mice, a period during which synapses are still maturing and the cortex is highly plastic.

      Thank you, this is a good point. We address this point in the revised manuscript (lines 428-432). While our study focuses on the early postnatal period (P11–P18), when plasticity mechanisms are prominent and synaptic maturation is ongoing, we agree that extrapolation to the adult brain should be made with caution.

      Reviewer #2 (Recommendations for the authors):

      Points 1-3 were also found in the Public Review so are not addressed again here.

      (4) Results seem to be obtained in the absence of inhibition blocking and the role of inhibition in tLTD is not described. It should be indicated whether present results are obtained with or without the functional inhibitory synapse activation. If GABAergic synapses are not blocked authors need to show what happens when this inhibition is blocked.

      We agree that extracellular stimulation can inadvertently recruit inhibitory circuits. However, in our paired whole-cell recordings, synaptic responses are always subthreshold and exclusively reflect the direct connection between the two recorded neurons (Chou et al., 2024; Song et al., 2005). Under these conditions, inhibitory synapses are not activated, and we therefore did not apply GABAergic blockers. We thank the reviewer for raising this, which is now clarified in the Methods (lines 539-541) of the revised manuscript.

      (5) In some figures, the number of experiments seems to be low, and this number of experiments might be increased (Figures 1C, 3C, 4B).

      We acknowledge that the number of experiments in these figures is modest, but these recordings are technically demanding, and the data are carefully curated. Importantly, the observed effects were statistically significant, indicating that the sample sizes were sufficient. We also note that concerns about statistical power are typically more critical in the case of negative or null results, whereas our findings were positive.

      (6) The discussion is detailed but it is not clear that the activation of JNK2 needs to be achieved by a non-ionotropic action of NMDAR as activation after ionotropic NMDAR activation has been described in the literature. This point needs to be clarified and expanded.

      Sorry that we were unclear on this point. We clarified this on lines 371-372 of the manuscript.

      (7) Adding a cartoon/schematic summarizing the proposed mechanism for tLTD would help the reading of the manuscript.

      We appreciate this suggestion and agree that a schematic would be helpful. However, we prefer to hold off on including one at this stage, as aspects of the underlying mechanism — particularly the role of CB1 receptors in presynaptic pyramidal cells (Sjöström et al., 2003) — are currently under active investigation in a separate project. To avoid potentially misleading oversimplifications, we would prefer to revisit a summary schematic once these uncertainties have been resolved.

      Minor:

      (1) Concentration of compounds is recommended to be included in the figures or in the text. This would make it easy to follow the results.

      We appreciate the suggestion. However, we avoid repeating concentrations to emphasize that conditions are consistent unless otherwise stated. All compound concentrations are clearly listed in the Methods and remain unchanged across experiments. We believe this streamlined approach avoids redundancy while keeping the results clear.

      (2) In some figures, failures in synaptic transmission can be observed (and changes after tLTD). The authors may analyse changes in the number of failures in synaptic transmission after tLTD as an additional indication of a presynaptic expression of this form of tLTD. PPR may also be included in all figures.

      While failures in synaptic transmission are occasionally visible, we chose to focus on CV analysis, which is mathematically equivalent to failure rate analysis, as both rely on the same underlying variability in synaptic responses (Brock et al., 2020). Provided failures are reliably extracted (which requires sufficient signal-to-noise), CV and failure rate analyses should yield consistent conclusions.

      In contrast, PPR analysis is not mathematically equivalent to CV analysis and may offer complementary insights into presynaptic mechanisms. However, the presence of preNMDARs complicates the use of paired-pulse stimulation during baseline: preNMDARs enhance release during high-frequency activity (Abrahamsson et al., 2017; Sjöström et al., 2003; Wong et al., 2024), so repeated stimulation can suppress synaptic responses when preNMDARs are blocked, potentially confounding interpretation. For this reason, we limited PPR analysis to Figures 5 and 6, where conditions were appropriate.

      Admittedly, our manuscript was previously not clear on when we used paired-pulse stimulation and when we did not. We have clarified this in the revised manuscript (lines 548-551 and lines 569-574).
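      The relationship between CV and failure-rate analysis discussed above can be illustrated with a minimal sketch. It assumes a simple binomial release model with `n_sites` independent release sites and a uniform release probability `p_release`; this model, and the function and parameter names, are illustrative assumptions and are not taken from the manuscript. It also ignores quantal variance and recording noise. Under these assumptions, both the CV and the failure rate depend only on the number of sites and the release probability, not on quantal size, which is why a presynaptic reduction in release probability tends to shift both measures in the same direction.

```python
import math


def cv(n_sites: int, p_release: float) -> float:
    """CV of the response amplitude under a simple binomial model.

    mean = N*p*q and variance = N*p*(1-p)*q^2, so the quantal size q
    cancels and CV = sqrt((1 - p) / (N * p)).
    """
    return math.sqrt((1.0 - p_release) / (n_sites * p_release))


def failure_rate(n_sites: int, p_release: float) -> float:
    """Probability that none of the N sites releases: (1 - p)^N."""
    return (1.0 - p_release) ** n_sites


def summarize(n_sites: int, p_release: float) -> dict:
    """Collect both measures for one hypothetical synapse."""
    c = cv(n_sites, p_release)
    return {
        "cv": c,
        "inv_cv2": 1.0 / c**2,  # equals N*p/(1-p) in this model
        "failure_rate": failure_rate(n_sites, p_release),
    }


if __name__ == "__main__":
    # Hypothetical values: release probability drops from 0.5 to 0.2.
    print(summarize(5, 0.5))
    print(summarize(5, 0.2))
```

      In this sketch, lowering `p_release` while holding `n_sites` fixed raises both the CV and the failure rate, whereas a purely postsynaptic change in quantal size would leave both unchanged.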

      (3) Discussion: Line 363-64, hippocampal (SC-CA1 synapses) results exist where postsynaptic MK801 blocks presynaptic tLTD, this may be added here and in the references.

      While we acknowledge that postsynaptic MK-801 has been shown to block presynaptic tLTD at hippocampal SC–CA1 synapses, we note that the hippocampus is part of the archicortex, whereas our study focuses on neocortical circuits, as highlighted in the manuscript title. Given the substantial anatomical and functional differences between these regions, we prefer to keep our discussion focused on the neocortex to maintain conceptual coherence.

      (4) Discussion: While authors indicate "non-ionotropic" they do not discuss whether this action can be named properly "metabotropic" and whether G-proteins may be in fact needed for this action. The authors may briefly discuss this point.

      We previously referred to non-ionotropic NMDAR signaling as “metabotropic,” but reconsidered after discussions with colleagues, including Juan Lerma, who pointed out that the term typically implies G-protein coupling, which has not been definitively shown in this context. While the term “metabotropic” is used inconsistently in the literature (Heuss and Gerber, 2000; Heuss et al., 1999) — sometimes broadly to indicate non-ion flow signaling — we prefer to avoid potential confusion and therefore use “non-ionotropic” unless and until G-protein involvement is clearly demonstrated. We clarified this on lines 423-427 of the Discussion.

      (5) Page 19, line 451 NMDR needs to be corrected to NMDAR.

      Thanks! This was corrected.

      Reviewer 3 (Public review)

      Summary

      In this manuscript, "Neocortical Layer-5 tLTD Relies on Non-Ionotropic Presynaptic NMDA Receptor Signaling", Thomazeau et al. seek to determine the role of presynaptic NMDA receptors and the mechanism by which they mediate expression of frequency-independent timing-dependent long-term depression (tLTD) between layer-5 (L5) pyramidal cells (PCs) in the developing mouse visual cortex. By utilizing sophisticated methods, including sparse Cre-dependent deletion of GluN1 subunit via neonatal iCre-encoding viral injection, in vitro quadruple patch clamp recordings, and pharmacological interventions, the authors elegantly show that L5 PC->PC tLTD is (1) dependent on presynaptic NMDA receptors, (2) mediated by non-ionotropic NMDA receptor signaling, and (3) is reliant on JNK2/Syntaxin-1a (STX1a) interaction (but not RIM1αβ) in the presynaptic neuron. The study elegantly and pointedly addresses a long-standing conundrum regarding the lack of frequency dependence of tLTD.

      We thank the reviewer for calling our methods “sophisticated” and our study “elegant”! We appreciate the kind feedback!

      Strengths

      The authors did a commendable job presenting a very polished piece of work with high-quality data that this Reviewer feels enthusiastic about. The manuscript has several notable strengths. Firstly, the methodological approach used in the study is highly sophisticated and technically challenging and successfully produced high-quality data that were easily accessible to a broader audience. Secondly, the pharmacological interventions used in the study targeted specific players and their mechanistic roles, unveiling the mechanism in question step-by-step. Lastly, the manuscript is written in a well-organized manner that is easy to follow. Overall, the study provides a series of compelling evidence that leads to a clear illustration of mechanistic understanding.

      We are elated that the reviewer described our study with words such as “polished”, “high-quality”, “sophisticated”, and “compelling”!

      Minor comments

      (1) For the broad readership, a brief description of JNK2-mediated signaling cascade underlying tLTD, including its intersection with CB1 receptor signaling may be desired.

      Thank you, this is a great suggestion for improving clarity. We briefly address this point in the revised manuscript (lines 360-363).

      (2) The authors used juvenile mice, P11 to P18 of age. It is a typical age range used for plasticity experiments, but it is also true that this age range spans before and after eye-opening in mice (~P13) and is a few days before the onset of the classical critical period for ocular dominance plasticity in the visual cortex. Given the mechanistic novelty reported in the study, can authors comment on whether this signaling pathway may be age-dependent?

      Thanks, Reviewer 2 also raised this point. In the revised manuscript, we discuss this point (lines 428-432).

      Reviewer #3 (Recommendations for the authors):

      (1) Minor typos: page 4 line 101: sensitivity -> sensitive.

      We fixed this typo.

      (2) Page 15 line 333: sensitivity -> sensitive.

      We fixed this typo.

      (3) Minor aesthetic suggestion: On the scale bars for all examples, LTP and LTD data are easily confused with the letter L. I'd suggest flipping them left to right.

      We thank the reviewer for the suggestion. We flipped the scale bars in all figures.

      References

      Abrahamsson, T., Chou, C.Y.C., Li, S.Y., Mancino, A., Costa, R.P., Brock, J.A., Nuro, E., Buchanan, K.A., Elgar, D., Blackman, A.V., et al. 2017. Differential Regulation of Evoked and Spontaneous Release by Presynaptic NMDA Receptors. Neuron 96: 839-855 e835

      Bartlett, R.D., Esslinger, C.S., Thompson, C.M., and Bridges, R.J. 1998. Substituted quinolines as inhibitors of L-glutamate transport into synaptic vesicles. Neuropharmacology 37: 839-846

      Berretta, N., and Jones, R.S. 1996. Tonic facilitation of glutamate release by presynaptic N-methyl-D-aspartate autoreceptors in the entorhinal cortex. Neuroscience 75: 339-344

      Brasier, D.J., and Feldman, D.E. 2008. Synapse-specific expression of functional presynaptic NMDA receptors in rat somatosensory cortex. J Neurosci 28: 2199-2211

      Brock, J.A., Thomazeau, A., Watanabe, A., Li, S.S.Y., and Sjöström, P.J. 2020. A Practical Guide to Using CV Analysis for Determining the Locus of Synaptic Plasticity. Frontiers in Synaptic Neuroscience 12: 11, doi: 10.3389/fnsyn.2020.00011

      Buchanan, K.A., Blackman, A.V., Moreau, A.W., Elgar, D., Costa, R.P., Lalanne, T., Tudor Jones, A.A., Oyrer, J., and Sjöström, P.J. 2012. Target-Specific Expression of Presynaptic NMDA Receptors in Neocortical Microcircuits. Neuron 75: 451-466

      Chou, C.Y.C., Wong, H.H.W., Guo, C., Boukoulou, K.E., Huang, C., Jannat, J., Klimenko, T., Li, V.Y., Liang, T.A., Wu, V.C., and Sjöström, P.J. 2024. Principles of visual cortex excitatory microcircuit organization. The Innovation 6: 1-11

      Corlew, R., Wang, Y., Ghermazien, H., Erisir, A., and Philpot, B.D. 2007. Developmental switch in the contribution of presynaptic and postsynaptic NMDA receptors to long-term depression. J Neurosci 27: 9835-9845

      Heuss, C., and Gerber, U. 2000. G-protein-independent signaling by G-protein-coupled receptors. Trends in Neurosciences 23: 469-475

      Heuss, C., Scanziani, M., Gähwiler, B.H., and Gerber, U. 1999. G-protein-independent signaling mediated by metabotropic glutamate receptors. Nature Neuroscience 2: 1070-1077

      Huettner, J.E., and Bean, B.P. 1988. Block of N-methyl-D-aspartate-activated current by the anticonvulsant MK-801: selective binding to open channels. PNAS 85: 1307-1311

      Humeau, Y., Shaban, H., Bissière, S., and Lüthi, A. 2003. Presynaptic induction of heterosynaptic associative plasticity in the mammalian brain. Nature 426: 841-845

      Kemp, J.A., Foster, A.C., Leeson, P.D., Priestley, T., Tridgett, R., Iversen, L.L., and Woodruff, G.N. 1988. 7-Chlorokynurenic acid is a selective antagonist at the glycine modulatory site of the N-methyl-D-aspartate receptor complex. PNAS 85: 6547-6550

      Larsen, R.S., Corlew, R.J., Henson, M.A., Roberts, A.C., Mishina, M., Watanabe, M., Lipton, S.A., Nakanishi, N., Perez-Otano, I., Weinberg, R.J., and Philpot, B.D. 2011. NR3A-containing NMDARs promote neurotransmitter release and spike timing-dependent plasticity. Nat Neurosci 14: 338-344

      Nabavi, S., Kessels, H.W., Alfonso, S., Aow, J., Fox, R., and Malinow, R. 2013. Metabotropic NMDA receptor function is required for NMDA receptor-dependent long-term depression. PNAS 110: 4027-4032

      Rodríguez-Moreno, A., Kohl, M.M., Reeve, J.E., Eaton, T.R., Collins, H.A., Anderson, H.L., and Paulsen, O. 2011. Presynaptic induction and expression of timing-dependent long-term depression demonstrated by compartment-specific photorelease of a use-dependent NMDA receptor antagonist. J Neurosci 31: 8564-8569

      Rodríguez-Moreno, A., and Paulsen, O. 2008. Spike timing-dependent long-term depression requires presynaptic NMDA receptors. Nat Neurosci 11: 744-745

      Sjöström, P.J., Turrigiano, G.G., and Nelson, S.B. 2003. Neocortical LTD via coincident activation of presynaptic NMDA and cannabinoid receptors. Neuron 39: 641-654

      Song, S., Sjöström, P.J., Reigl, M., Nelson, S., and Chklovskii, D.B. 2005. Highly nonrandom features of synaptic connectivity in local cortical circuits. PLoS Biology 3: e68

      Tovar, K.R., and Westbrook, G.L. 1999. The incorporation of NMDA receptors with a distinct subunit composition at nascent hippocampal synapses in vitro. J Neurosci 19: 4180-4188

      Wong, H.H., Watt, A.J., and Sjöström, P.J. 2024. Synapse-specific burst coding sustained by local axonal translation. Neuron 112: 264-276 e266

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1 (Public review):

      (1) “It is likely that metabolism changes ex vivo vs in vivo, and therefore stable isotope tracing experiments in the explants may not reflect in vivo metabolism.”

      We agree with the reviewer that metabolic changes may differ ex vivo versus in vivo. We now state: “Lastly, an important caveat to our study is that metabolism changes ex vivo versus in vivo, and thus, in the future, in vivo studies can be performed to assess metabolic changes.” (lines 591-593).

      (2) “The retina at P0 is composed of both progenitors and differentiated cells. It is not clear if the results of the RNA-seq and metabolic analysis reflect changes in the metabolism of progenitors, or of mature cells, or changes in cell type composition rather than direct metabolic changes in a specific cell type.”

      We have clarified that the metabolic changes may be in RPCs or in other retinal cell types on lines 149-152: “Since these measurements were performed in bulk, and the ratio of RPCs to differentiated cells declines as development proceeds, it is not clear whether glycolytic activity is temporally regulated within RPCs or in other retinal cell types.”

      However, since we mined a single-cell (sc) RNA-seq dataset, we are able to attribute gene expression specifically to RPCs (Figure 1).

      (3) “The biochemical links between elevated glycolysis and pH and beta-catenin stability are unclear. White et al found that higher pH decreased beta-catenin stability (JCB 217: 3965) in contrast to the results here. Oginuma et al found that inhibition of glycolysis or beta-catenin acetylation does not affect beta-catenin stability (Nature 584:98), again in contrast to these results. Another paper showed that acidification inhibits Wnt signaling by promoting the expression of a transcriptional repressor and not via beta-catenin stability (Cell Discovery 4:37). There are also additional papers showing increased pH can promote cell proliferation via other mechanisms (e.g. Nat Metab 2:1212). It is possible that there is organ-specificity in these signaling pathways however some clarification of these divergent results is warranted.”

      We have added the information and references brought up by the reviewer in our discussion (lines 529-549 and 570-574). We have also suggested future experiments to further analyse our system in line with the studies now referenced (lines 580-589).

      (4) The gene expression analysis is not completely convincing. E.g. the expression of additional glycolytic genes should be shown in Figure 1. It is not clear why Hk1 and Pgk1 are specifically shown, and conclusions about changes in glycolysis are difficult to draw from the expression of these two genes. The increase in glycolytic gene expression in the Pten-deficient retina is generally small.

      We have expanded the list of glycolytic genes analysed, in modified Figure 1B, and expanded the description of these results on lines 156-166.

      (5) Is it possible that glycolytic inhibition with 2DG slows down the development and production of most newly differentiated cells rather than specifically affecting photoreceptor differentiation?

      We added a comment to this effect to the discussion: “It is possible that glycolytic inhibition with 2DG slows down the development and production of most newly differentiated cells rather than specifically affecting photoreceptor differentiation, which we could assess in the future.” (lines 600-603).

      (6) “Likewise the result that an increase in pH from 7.4 to 8.0 is sufficient to increase proliferation implies that pH regulation may have instructive roles in setting the tempo of retinal development and embryonic cell proliferation. Similarly, the results show that acetate supplementation increases proliferation (I think this result should be moved to the main figures).”

      We have added the acetate data to main Figure 7E.

      We added a supplemental data table that was inadvertently not included in our last submission (Figure 2–Data supplement 1).

      Reviewer #2 (Recommendations for the authors):

      Major points

      (1) Assuming that increased glycolysis gets RPCs to exit from the proliferative stage earlier, the total number of retinal cells, notably that of the rod photoreceptors, should be reduced since the pool of proliferating cells is depleted earlier. Is that really the case for a mature retina? To address this question, the authors should perform quantifications of photoreceptors at a stage where most developmental cell death has concluded (i.e. at P14 or later; Young, J. Comp. Neurol. 229:362-373, 1984) and check whether there are more or fewer photoreceptors present.

      We have previously quantified the numbers of each cell type in Pten RPC-cKO retinas, and as suggested by the reviewer, there are fewer rod photoreceptors at P7 (Tachibana et al. 2016. J Neurosci 36 (36) 9454-9471) and P21 (Hanna et al. 2025. IOVS. Mar 3;66(3):45). We have edited the following sentence: “Using cellular birthdating, we previously showed that Pten-cKO RPCs are hyperproliferative and differentiate on an accelerated schedule between E12.5 and E18.5, yet fewer rod photoreceptors are ultimately present in P7 (Tachibana et al., 2016) and P21 (Hanna et al., 2025) retinas, suggestive of a developmental defect.” (lines 184-187).

      (2) Figure 1B, 1H: On what data are these two figures based? The plots suggest that a high-density time series of gene expression and rod photoreceptor birth was performed, yet it is not clear where and how this was done. The authors should provide the data, plot individual data points, and, if applicable perform a statistical analysis to support their idea that glycolytic gene expression (as a surrogate for glycolysis) overlaps in time with rod photoreceptor birth (Figure 1B) and that in Pten KO the glycolytic gene expression is shifted forward in time (Figure 1H). If the data required to construct these plots (min. 5 data points, min 3 repeats each) does not exist or cannot be generated (e.g. from reanalysis of previously published datasets), then these graphs should be removed.

      We have removed the previous Figure 1B and Figure 1H.

      (3) Figure 2E: Which PKM isozyme was analyzed here? Does the genetic analysis allow us to distinguish between PKM1 and PKM2? Since PKM governs the key rate-limiting step of glycolysis but was not significantly upregulated, does this not contradict the authors' main hypothesis? If PKM at some point was inhibited (see also below comment to Figure 5) one would expect an accumulation of glycolytic intermediates, including phosphoenolpyruvate. Was such an effect observed?

      The data in Figure 2E are bulk RNA-seq data. Since there is only a single Pkm gene that is alternatively spliced, the RNA-sequencing data cannot distinguish between the Pkm isoforms that arise from alternative splicing. Specifically, we used an Illumina NextSeq 500 to sequence 75 bp single-end reads, which capture transcripts of both the alternatively spliced Pkm1 and Pkm2 mRNAs, as these share a common 3’ end. We added a statement to this effect: “However, since we employed 75 bp single-end sequencing, we could not distinguish between alternatively spliced Pkm1 and Pkm2 mRNAs.” (lines 215-216).

      We have not performed metabolic analyses of glycolytic intermediates, but we have proposed such a strategy as an important avenue of investigation for future studies in the Discussion: “Lastly, an important caveat to our study is that metabolism changes ex vivo versus in vivo, and thus, in the future, in vivo studies can be performed to assess metabolic changes.” (lines 591-593).

      (4) Figure 3 and materials & methods: For the retinal explant cultures, was the RPE included in the cultured explants? If so, how can the authors distinguish drug effects on neuroretina and RPE? If the RPE was not included, then the authors should discuss how the missing RPE - neuroretina interaction could have influenced their results.

      We removed the RPE from the retinal explants, as indicated in the Methods section. The RPE is a metabolic hub that allows transport of nutrients to the retina, so in the absence of the RPE, there is no immediate source of energy, such as glucose, for the retina. However, the media (DMEM) contains 25 mM glucose to replace the RPE as an energy source, and we now show that RPCs express GLUT1, which allows uptake of glucose (see new Figure 3A).

      We added the following sentence: “P0 explants were mounted on Nucleopore membranes and cultured on top of retinal explant media, providing a source of nutrients, growth factors and glucose.” (lines 241-243).

      (5) Figure 3: It seems rather odd that, if glycolysis was so important for retinal proliferation, differentiation, and metabolism in general, the inhibition of glycolysis with 2DG should not produce a strong degeneration. However, since 2DG competes with glucose, and must be used at nearly equimolar concentration to block glycolysis in a meaningful way, it is possible that the 2DG concentration used simply was not high enough to substantially inhibit glycolysis. Since the inhibitory effect of 2DG depends on the glucose concentration, the authors should measure and provide the concentration of glucose in the explant culture medium. This value should be given either in results or materials and methods.

      We recently published a manuscript showing that 2DG treatments at the same concentrations employed in this study are effective at reducing lactate production in the developing retina in vivo, which is the expected effect of reduced glycolysis (Hanna et al. 2025. IOVS). However, in this study, we did not observe an impact on cell survival.

      We do not agree that it is necessary to measure glucose in the media since the anti-proliferative effect of 2DG is well known, and we are working in the effective range established by multiple groups. We have clarified that we are in the effective range by adding the following sentences: “2DG is typically used in the range of 5-10 mM in cell culture studies and in general, has anti-proliferative effects. To test whether 2DG treatment was in the effective range, explants were exposed to BrdU, which is incorporated into S-phase cells, for 30 minutes prior to harvesting. 2DG treatment resulted in a dose-dependent inhibition of RPC proliferation as evidenced by a reduction in BrdU<sup>+</sup> cells (Figure 3D), indicating that our treatment was in the effective range.” (lines 246-251).

      (6) Figure 3F: The authors use immunostaining for cleaved, activated caspase-3 to assess the amount of apoptotic cell death. However, there are many different possible mechanisms for neuronal cells to die, the majority of which are caspase-independent. To assess the amount of cell death occurring, the authors should perform a TUNEL assay (which labels apoptotic and non-apoptotic forms of cell death; Grasl-Kraupp et al., Hepatology 21:1465-8, 1995), quantify the numbers of TUNEL-positive cells in the retina, and compare this to the numbers of cells positive for activated caspase-3.

      We agree with the reviewer that there are more ways for a cell to die than just apoptosis, and that TUNEL would pick up dying cells undergoing either apoptosis or necrosis, for example. However, our data with cleaved caspase-3, an executioner protease for apoptosis, provide clear evidence of cell death in our different conditions. Since this manuscript is not focused on cell death pathways, we have not performed the additional TUNEL assay.

      (7) Figure 4F and 4I: At post-natal day P7 the rod outer segments (OSs) only just start to grow out and the characteristic, rhodopsin-filled disk stacks are not yet formed. To test whether the PFKFB3 gain-of-function or the Pten KO has a marked effect on OS formation and length, the authors should perform the same tests on older, more mature retina at a time when rod OS show their characteristic disk structures (e.g. somewhere between P14 to P30). The same applies to the 2DG inhibition on the Pten KO retina.

      The precocious differentiation of rod outer segments observed in P7 Pten-cKO retinas does not persist into adulthood, and instead reflects a developmental acceleration. Indeed, we found that in Pten cKO retinas at 3, 6 and 12 months of age, rod and cone photoreceptors degenerate, and cone outer segments are shorter (Hanna et al., 2025; Tachibana et al., 2016). These data demonstrate that Pten is required to support rod and cone survival.

      (8) Figure 5: Lowering media pH is a rather coarse and untargeted intervention that will have multiple metabolic consequences independent of PKM2. It is thus hardly possible to attribute the effects of pH manipulation to any specific enzyme. To assess this and possibly confirm the results obtained with low pH, the authors should perform a targeted inhibition experiment, for instance using Shikonin (Chen et al., Oncogene 30:4297-306, 2011), to selectively inhibit PKM2. If the retinal explant cultures contained the RPE, an additional question would be how the changes in RPE would alter lactate flux and metabolization between RPE and neuroretina (see also question 4 above).

      We have reframed the rationale for the pH manipulation experiments, highlighting the importance of pH in cell fate specification, and indicating that the aggregation of PKM2 is only one possible effect of lower pH.

      We wrote: “Given that altered glycolysis influences intracellular pH, which in turn controls cell fate decisions, we set out to assess the impact of manipulating pH on cell fate selection in the retina. One of the expected impacts of lowering pH was the aggregation of PKM2, a rate-limiting enzyme for glycolysis, which aggregates in reversible, inactive amyloids (Cereghetti et al., 2024).” (lines 362-366). 

      We have also added a discussion point: “Whether pH manipulations also impact the stability of other retinal proteins, such as PKM2, can be further investigated in the future using specific PKM2 inhibitors, such as Shikonin (Chen et al., 2011).” (lines 545-547).

      (9) Figure 5G: As for Figure 3F, the authors should perform TUNEL assays to assess the number of cells dying independent of caspase-3.

      Please see response to point 6.

      (10) Figure 7E: In the figure legend "K" should read "E". From the figure and the legend, it is not clear to which cell type this diagram should refer. This must be specified. Importantly, the insulin-dependent glucose-transporter 4 (GLUT4) highlighted in Figure 7E, while expressed on inner retinal vasculature endothelial cells, is not expressed in retinal neurons. What GLUTs exactly are expressed in what retinal neurons may still be to some extent contentious (cf. Chen et al., eLife, https://doi.org/10.7554/eLife.91141.3 ; and reviewer comments therein), yet RPE cells clearly express GLUT1, photoreceptors likely express GLUT3, Müller glia cells may express GLUT1, while horizontal cells likely express GLUT2 (Yang et al., J Neurochem. 160:283-296, 2022).

      We have removed this summary schematic for simplicity.

      (11) Materials and methods: The retinal explant culture system must be described in more detail. Important questions concern the use of medium and serum for which the providers, order numbers, and batch/lot numbers (whichever is applicable) must be given. The glucose concentration in the medium (including the serum content) should be measured. A key concern is whether the explants were cultivated submerged into the medium - this would prevent sufficient oxygenation and drive metabolism towards glycolysis (i.e. the Pasteur effect) - or whether they were cultivated on top of the liquid medium, at the interface between air and liquid (i.e. a situation that would favor OXPHOS).

      We have added further detail to the methods section for the explant assay (lines 686-689). We cultured the retinal explants on membranes on top of the media, which is the standard methodology in the field and in our laboratory (Cantrup et al., 2012; Tachibana et al., 2016; Touahri et al., 2024). Typically, RPCs undergo aerobic glycolysis, meaning that even in the presence of oxygen, they still prefer glycolysis rather than OXPHOS. We demonstrated that 2DG treatment blocks RPC proliferation, indicating that RPCs are indeed favoring glycolysis in our assay system.

      (12) A point the authors may want to discuss additionally is the potential relevance of their data for the pathogenesis of human diseases, especially early developmental defects such as they occur in oxygen-induced retinopathy of prematurity.

      We would like to thank the reviewer for their valuable comment. Given that retinopathy of prematurity (ROP) is primarily vascular in nature, and we have not investigated vascular defects in this study, we have elected not to add a discussion of ROP to our manuscript.

      Minor points

      (1) Please add a label indicating the ages of the retina to images showing the entire retina (i.e. "P7"; e.g. in Figures 1F, 3, 4D, 5, etc.).

      Figure 1:

      1D: E18.5 indicated at the bottom of the two panels

      1F – P0 is indicated at the bottom of the two panels.

      Figure 3C-H: P0 explant stage and days of culture indicated

      Figure 4D: E12.5 BrdU and P7 harvest date indicated

      Figure 5C-H: P0 explant stage and days of culture indicated

      Figure 7A-E: P0 explant stage and days of culture indicated

      (2) The term Ctnnb1 should be introduced also in the abstract.

      We now state in the abstract that Ctnnb1 encodes β-catenin.

      (3) Line 249: "...remaining..." should probably read "...remained...".

      Changed (now line 260).

      (4) Line 381: The sentence "...correlating with the propensity of some RPCs to continue to proliferate while others to differentiate.", should probably be rewritten to something like "...correlating with the propensity of some RPCs to continue to proliferate while others differentiate.".

      We have corrected this sentence.

      (5) The structure of the discussion might benefit from the introduction of subheadings.

      We have introduced subheadings.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1H shows the kinetics of rod photoreceptor production as accelerated, but does not represent the fact that fewer rods are ultimately produced, which appears to be the case from the data. If so, the Pten cKO curve should probably be lower than WT to reflect that difference.

      We have removed this graph (as per Reviewer #2, point 2).

      (2) KEGG analysis also showed that the HIF-1 signaling pathway is altered in the Pten cKO retina. What is the significance of that, and is it related to metabolic dysregulation? It has been shown that lactate can promote vessel growth, which initiates at birth in the mouse retina.

      We have added some information on HIF-1 to the Discussion. “The increased glycolytic gene expression in Pten-cKO retinas is likely tied to the increased expression of hypoxia-induced-factor-1-alpha (Hif1a), a known target of mTOR signaling that transcriptionally activates Slc1a3 (GLUT1) and glycolytic genes (Hanna et al., 2022). Indeed, mTOR signaling is hyperactive in Pten-cKO retinas (Cantrup et al., 2012; Tachibana et al., 2016; Tachibana et al., 2018; Touahri et al., 2024), and likewise, in Tsc1-cKO retinas, which also increase glycolysis via HIF-1A (Lim et al., 2021).” (lines 489-494).

      Cantrup, R., Dixit, R., Palmesino, E., Bonfield, S., Shaker, T., Tachibana, N., Zinyk, D., Dalesman, S., Yamakawa, K., Stell, W. K., Wong, R. O., Reese, B. E., Kania, A., Sauve, Y., & Schuurmans, C. (2012). Cell-type specific roles for PTEN in establishing a functional retinal architecture. PLoS One, 7(3), e32795. https://doi.org/10.1371/journal.pone.0032795

      Cereghetti, G., Kissling, V. M., Koch, L. M., Arm, A., Schmidt, C. C., Thüringer, Y., Zamboni, N., Afanasyev, P., Linsenmeier, M., Eichmann, C., Kroschwald, S., Zhou, J., Cao, Y., Pfizenmaier, D. M., Wiegand, T., Cadalbert, R., Gupta, G., Boehringer, D., Knowles, T. P. J., Mezzenga, R., Arosio, P., Riek, R., & Peter, M. (2024). An evolutionarily conserved mechanism controls reversible amyloids of pyruvate kinase via pH-sensing regions. Dev Cell. https://doi.org/10.1016/j.devcel.2024.04.018

      Chen, J., Xie, J., Jiang, Z., Wang, B., Wang, Y., & Hu, X. (2011). Shikonin and its analogs inhibit cancer cell glycolysis by targeting tumor pyruvate kinase-M2. Oncogene, 30(42), 4297-4306. https://doi.org/10.1038/onc.2011.137

      Hanna, J., Touahri, Y., Pak, A., David, L. A., van Oosten, E., Dixit, R., Vecchio, L. M., Mehta, D. N., Minamisono, R., Aubert, I., & Schuurmans, C. (2025). Pten Loss Triggers Progressive Photoreceptor Degeneration in an mTORC1-Independent Manner. Invest Ophthalmol Vis Sci, 66(3), 45. https://doi.org/10.1167/iovs.66.3.45

      Tachibana, N., Cantrup, R., Dixit, R., Touahri, Y., Kaushik, G., Zinyk, D., Daftarian, N., Biernaskie, J., McFarlane, S., & Schuurmans, C. (2016). Pten Regulates Retinal Amacrine Cell Number by Modulating Akt, Tgfbeta, and Erk Signaling. J Neurosci, 36(36), 9454-9471. https://doi.org/10.1523/JNEUROSCI.0936-16.2016

      Touahri, Y., Hanna, J., Tachibana, N., Okawa, S., Liu, H., David, L. A., Olender, T., Vasan, L., Pak, A., Mehta, D. N., Chinchalongporn, V., Balakrishnan, A., Cantrup, R., Dixit, R., Mattar, P., Saleh, F., Ilnytskyy, Y., Murshed, M., Mains, P. E., Kovalchuk, I., Lefebvre, J. L., Leong, H. S., Cayouette, M., Wang, C., Sol, A. D., Brand, M., Reese, B. E., & Schuurmans, C. (2024). Pten regulates endocytic trafficking of cell adhesion and Wnt signaling molecules to pattern the retina. Cell Rep, 43(4), 114005. https://doi.org/10.1016/j.celrep.2024.114005

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The paper describes the cryoEM structure of RAD51 filament on the recombination intermediate. In the RAD51 filament, the insertion of a DNA-binding loop called the L2 loop stabilizes the separation of the complementary strand for the base-pairing with an incoming ssDNA and the non-complementary strand, which is captured by the second DNA-binding channel called the site II. The molecular structure of the RAD51 filament with a recombination intermediate provides a new insight into the mechanism of homology search and strand exchange between ssDNA and dsDNA.

      Strengths:

      This is the first human RAD51 filament structure with a recombination intermediate called the D-loop. The work has been done with great care, and the results shown in the paper are compelling based on cryo-EM and biochemical analyses. The paper is really nice and important for researchers in the field of homologous recombination, which gives a new view on the molecular mechanism of RAD51-mediated homology search and strand exchange.

      Weaknesses:

      The authors need more careful text writing. Without page and line numbers, it is hard to give comments.

      We would like to thank the reviewer for their kind words of appreciation of our work.

      Reviewer #2 (Public review):

      Summary:

      Homologous recombination (HR) is a critical pathway for repairing double-strand DNA breaks and ensuring genomic stability. At the core of HR is the RAD51-mediated strand-exchange process, in which the RAD51-ssDNA filament binds to homologous double-stranded DNA (dsDNA) to form a characteristic D-loop structure. While decades of biochemical, genetic, and single-molecule studies have elucidated many aspects of this mechanism, the atomic-level details of the strand-exchange process remained unresolved due to the lack of an atomic-resolution structure of the RAD51 D-loop complex.

      In this study, the authors achieved this by reconstituting a RAD51 mini-filament, allowing them to solve the RAD51 D-loop complex at 2.64 Å resolution using a single particle approach. The atomic resolution structure reveals how specific residues of RAD51 facilitate the strand exchange reaction. Ultimately, this work provides unprecedented structural insight into the eukaryotic HR process and deepens the understanding of RAD51 function at the atomic level, advancing the broader knowledge of DNA repair mechanisms.

      Strengths:

      The authors overcame the challenge of RAD51's helical symmetry by designing a minifilament system suitable for single-particle cryo-EM, enabling them to resolve the RAD51 D-loop structure at 2.64 Å without imposed symmetry. This high resolution revealed precise roles of key residues, including F279 in Loop 2, which facilitates strand separation, and basic residues on site II that capture the displaced strand. Their findings were supported by mutagenesis, strand exchange assays, and single-molecule analysis, providing strong validation of the structural insights.

      Weaknesses:

      Despite the detailed structural data, the interpretation of some structure-based mutagenesis data lacks clarity. Additionally, the proposed 3′-to-5′ polarity of strand exchange relies on assumptions from static structural features, such as stronger binding of the 5′-arm, which are not directly supported by other experiments. The directional model is compelling, but it contradicts several well-established biochemical studies that support a 5'-to-3' polarity relative to the complementary strand (e.g., Cell 1995, PMID: 7634335; JBC 1996, PMID: 8910403; Nature 2008, PMID: 18256600).

      Overall:

      The 2.6 Å resolution cryoEM structure of the RAD51 D-loop complex provides remarkably detailed insights into the residues involved in D-loop formation. The high-quality cryoEM density enables precise placement of each nucleotide, which is essential for interpreting the molecular interactions between RAD51 and DNA. Particularly, the structural analysis highlights specific roles for key domains, such as the N-terminal domain (NTD), in engaging the donor DNA duplex.

      This structural interpretation is further substantiated by single-molecule fluorescence experiments using the KK39,40AA NTD mutant. The data clearly show a significant reduction in D-loop formation by the mutant compared to wild-type, supporting the proposed functional role of the NTD observed in the cryoEM model.

      However, the strand exchange activity interpretation presented in Figure 5B could benefit from a more rigorous experimental design. The current assay measures an increase in fluorescence intensity, which depends heavily on the formation of RAD51-ssDNA filaments. As shown in Figure S6A, several mutants exhibit reduced ability to form such filaments, which could confound the interpretation of strand exchange efficiency. To address this, the assay should either: (1) normalize for equivalent levels of RAD51-ssDNA filaments across samples, or (2) compare the initial rates of fluorescence increase (i.e., the slope of the reaction curve), rather than endpoint fluorescence, to better isolate the strand exchange activity itself.
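
      Option (2) above can be illustrated with a minimal sketch that fits a least-squares slope to the early, approximately linear part of a fluorescence trace. The traces, the time units, and the six-point fitting window are hypothetical values chosen for illustration; they are not data from the manuscript, and the function name `initial_rate` is an assumption.

```python
def initial_rate(times, signal, window):
    """Least-squares slope of signal vs. time over the first `window` points."""
    t = times[:window]
    y = signal[:window]
    n = len(t)
    if n < 2:
        raise ValueError("need at least two points to fit a slope")
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    return sxy / sxx


# Hypothetical traces: time in minutes, fluorescence in arbitrary units.
times = [0, 1, 2, 3, 4, 5, 10, 20, 30]
wild_type = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 18.0, 24.0, 25.0]
mutant = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 5.0, 10.0, 15.0]

wt_rate = initial_rate(times, wild_type, window=6)
mut_rate = initial_rate(times, mutant, window=6)

# Compare the ratio of initial rates with the ratio of endpoint signals.
rate_ratio = mut_rate / wt_rate
endpoint_ratio = mutant[-1] / wild_type[-1]
print(f"initial-rate ratio: {rate_ratio:.2f}, endpoint ratio: {endpoint_ratio:.2f}")
```

      With these made-up numbers the endpoint ratio (0.60) understates the defect relative to the initial-rate ratio (0.25), which is the kind of discrepancy the suggested analysis is meant to expose.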

      Based on the structural features of the D-loop, the authors propose that strand pairing and exchange initiate at the 3'-end of the complementary strand in the donor DNA and proceed with a 3'-to-5' polarity. This conclusion, drawn from static structural observations, contrasts with several well-established biochemical studies that support a 5'-to-3' polarity relative to the complementary strand (e.g., Cell 1995, PMID: 7634335; JBC 1996, PMID: 8910403; Nature 2008, PMID: 18256600). While the structural model is compelling and methodologically robust, this discrepancy underscores the need for further experiments.

      We would like to thank the reviewer for highlighting the importance of our findings to our understanding of the mechanism of homologous recombination.

      We agree with the reviewer that the reduced filament-forming ability of some of the RAD51 mutants complicates a straightforward interpretation of their strand-exchange assay. Interestingly, the RAD51 mutants that appear most impaired are the esDNA-capture mutants that do not contact the ssDNA in the structure of the pre-synaptic filament. However, the RAD51 NTD mutants, which display the most severe defect in strand exchange, have a near-WT filament-forming ability.

      The reviewer correctly points out that the polarity of strand exchange by RecA and RAD51 is an extensively researched topic that has been characterised in several authoritative studies. In our paper, we simply describe the mechanistic insights obtained from the structural D-loop models of RAD51 (our work) and RecA (Yang et al, PMID: 33057191). The structures illustrate a very similar mechanism of D-loop formation that proceeds with opposite polarity of strand exchange for RAD51 and RecA. Comparison of the D-loop structures for RecA and RAD51 provides an attractive explanation for the opposite polarity, as caused by the different positions of their dsDNA-binding domains in the filament structure. We agree with the reviewer that further investigation will be needed for an adequate rationalisation of the available evidence. We will mention the relevant literature in the revised version of the manuscript.

      Reviewer #3 (Public review):

      Summary:

      Built on their previous pioneer expertise in studying RAD51 biology, in this paper, the authors aim to capture and investigate the structural mechanism of human RAD51 filament bound with a displacement loop (D-loop), which occurs during the dynamic synaptic state of the homologous recombination (HR) strand-exchange step. As the structures of both pre- and post-synaptic RAD51 filaments were previously determined, a complex structure of RAD51 filaments during strand exchange is one of the key missing pieces of information for a complete understanding of how RAD51 functions in the HR pathway. This paper aims to determine the high-resolution cryo-EM structure of RAD51 filament bound with the D-loop. Combined with mutagenesis analysis and biophysical assays, the authors aim to investigate the D-loop DNA structure, RAD51-mediated strand separation and polarity, and a working model of RAD51 during HR strand invasion in comparison with RecA.

      Strengths:

      (1) The structural work and associated biophysical assays in this paper are solid, elegantly designed, and interpreted. These results provide novel insights into RAD51's function in HR.

      (2) The DNA substrate used was well designed, taking into consideration the nucleotide number requirement of RAD51 for stable capture of donor DNA. This DNA substrate choice lays the foundation for successfully determining the structure of the RAD51 filament on D-loop DNA using single-particle cryo-EM.

      (3) The authors utilised their previous expertise in capping DNA ends using monomeric streptavidin and combined their careful data collection and processing to determine the cryo-EM structure of full-length human RAD51 bound at the D-loop in high resolution. This interesting structure forms the core part of this work and allows detailed mapping of DNA-DNA and DNA-protein interaction among RAD51, invading strands, and donor DNA arms (Figures 1, 2, 3, 4). The geometric analysis of D-loop DNA bound with RAD51 and EM density for homologous DNA pairing is also impressive (Figure S5). The previously disordered RAD51's L2-loop is now ordered and traceable in the density map and functions as a physical spacer when bound with D-loop DNA. Interestingly, the authors identified that the side chain position of F279 in the L2_loop of RAD51_H differs from other F279 residues in L2-loops of E, F, and G protomers. This asymmetric binding of L2 loops and RAD51_NTD binding with donor DNA arms forms the basis of the proposed working model about the polarity of csDNA during RAD51-mediated strand exchange.

      (4) This work also includes mutagenesis analysis and biophysical experiments, especially EMSA, single-molecule fluorescence imaging using an optical tweezer, and DNA strand exchange assay, which are all suitable methods to study the key residues of RAD51 for strand exchange and D-loop formation (Figure 5).

      Weaknesses:

      (1) The proposed model for the 3'-5' polarity of RAD51-mediated strand invasion is based on the structural observations in the cryo-EM structure. This study lacks follow-up biochemical/biophysical experiments to validate the proposed model in comparison with RecA, as well as methods to capture structures of any intermediate states with different polarity models.

      (2) The functional impact of key mutants designed based on structure has not been tested in cells to evaluate how these mutants impact the HR pathway.

      The significance of the work for the DNA repair field and beyond:

      Homologous recombination (HR) is a key pathway for repairing DNA double-strand breaks and involves multiple steps. RAD51 forms nucleoprotein filaments first with 3' overhang single-strand DNA (ssDNA), followed by a search and exchange with a homologous strand. This function serves as the basis of an accurate template-based DNA repair during HR. This research addressed a long-standing challenge of capturing RAD51 bound with the dynamic synaptic DNA and provided the first structural insight into how RAD51 performs this function. The significance of this work extends beyond the discovery of biology for the DNA repair field, into its medical relevance. RAD51 is a potential drug target for inhibiting DNA repair in cancer cells to overcome drug resistance. This work offers a structural understanding of RAD51's function with the D-loop and provides new strategies for targeting RAD51 to improve cancer therapies.

      We thank the reviewer for their positive comments on the significance of our work. Concerning the proposed polarity of strand exchange based on our structural finding, please see our reply to the previous reviewer; we agree with the reviewer that further experimentation will be needed to reach a settled view on this.

      Testing the functional effects of the RAD51 mutants on HR in cells was not an aim of the current work but we agree that it would be a very interesting experiment, which would likely provide further important insights into the mechanism of strand exchange at the core of the HR reaction.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1:

      (1) The initial high accumulation by all cells followed by the emergence of a sub-population that has reduced its intracellular levels of tachyplesin is a key observation and I agree with the authors' conclusion that this suggests an induced response to the AMP is important in facilitating the bimodal distribution. However, I think the conclusion that upregulated efflux is driving the reduction in signal in the "low accumulator" subpopulation is not fully supported. Steady-state amounts of intracellular fluorescent AMP are determined by the relative rates of influx and efflux and a decrease could be caused by decreasing influx (while efflux remained unchanged), increasing efflux (while influx remained unchanged), or both decreasing influx and increasing efflux. Given the transcriptomic data suggest possible changes in the expression of enzymes that could affect outer membrane permeability and outer membrane vesicle formation as well as efflux, it seems very possible that changes to both influx and efflux are important. The "efflux inhibitors" shown to block the formation of the low accumulator subpopulation have highly pleiotropic or incompletely characterised mechanisms of action so they also do not exclusively support a hypothesis of increased efflux.

      We agree with the reviewer that the emergence of low accumulators after 30 min in the presence of extracellular tachyplesin-NBD (Figure 4A) could be due to either decreased influx while efflux remained unchanged, increased efflux while influx remained unchanged, or both decreasing influx and increasing efflux. Increased proteolytic activity or increased secretion of OMVs could also play a role.
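
      The ambiguity described above can be made concrete with a minimal one-compartment sketch, assuming first-order kinetics, dC/dt = k_in * C_ext - k_out * C. This model, its rate constants, and the function names are illustrative assumptions and are not taken from the manuscript; `k_out` lumps together every loss process (efflux, proteolysis, and so on).

```python
import math


def steady_state(k_in, k_out, c_ext):
    """Steady-state intracellular level for dC/dt = k_in * c_ext - k_out * C."""
    return k_in * c_ext / k_out


def washout(c0, k_out, t):
    """Intracellular level at time t after removing the extracellular compound.

    With c_ext = 0 the influx term vanishes, so the decay depends on k_out only.
    """
    return c0 * math.exp(-k_out * t)


# Hypothetical rate constants (arbitrary units).
baseline = steady_state(k_in=1.0, k_out=0.1, c_ext=1.0)
less_influx = steady_state(k_in=0.5, k_out=0.1, c_ext=1.0)  # influx halved
more_efflux = steady_state(k_in=1.0, k_out=0.2, c_ext=1.0)  # loss rate doubled

# Both changes lower the steady state by the same factor, so a steady-state
# measurement alone cannot tell them apart.
print(baseline, less_influx, more_efflux)

# After washout, only a change in k_out alters the decay of the signal.
slow_loss = washout(c0=10.0, k_out=0.1, t=60.0)
fast_loss = washout(c0=10.0, k_out=0.2, t=60.0)
print(slow_loss, fast_loss)
```

      In this simplified picture, halving influx and doubling the loss rate give the same steady state, whereas after removal of the extracellular compound only the loss rate shapes the signal.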

      We have now acknowledged that “Reduced intracellular accumulation of tachyplesin-NBD in the presence of extracellular tachyplesin-NBD could be due to decreased drug influx, increased drug efflux, increased proteolytic activity or increased secretion of OMVs.” (lines 313-315).

      However, the emergence of low accumulators after 60 min in the absence of extracellular tachyplesin-NBD in our efflux assays (Figure 4C) cannot be due to decreased influx while efflux remained unchanged, because no extracellular tachyplesin-NBD was present. We acknowledge that in our original manuscript we did not explicitly state that the efflux assays reported in Figure 4C-D were performed in the absence of tachyplesin-NBD in the extracellular environment. We have now clarified this point in our manuscript and added illustrations in Figure 4A, 4C-D. We have also carried out efflux assays using ethidium bromide (EtBr) to further support our conclusions about the primary role played by efflux in reducing tachyplesin accumulation in low accumulators. We have added the following paragraphs to our revised manuscript:

      “Next, we performed efflux assays using ethidium bromide (EtBr) by adapting a previously described protocol [62]. Briefly, we preloaded stationary phase E. coli with EtBr by incubating cells at a concentration of 254 µM EtBr in M9 medium for 90 min. Cells were then pelleted and resuspended in M9 to remove extracellular EtBr. Single-cell EtBr fluorescence was measured at regular time points in the absence of extracellular EtBr using flow cytometry. This analysis revealed a progressive homogeneous decrease of EtBr fluorescence due to efflux from all cells within the stationary phase E. coli population (Figure S13A). In contrast, when we performed efflux assays by preloading cells with tachyplesin-NBD (46 μg mL⁻¹ or 18.2 μM), followed by pelleting and resuspension in M9 to remove extracellular tachyplesin-NBD, we observed a heterogeneous decrease in tachyplesin-NBD fluorescence in the absence of extracellular tachyplesin-NBD: a subpopulation retained high tachyplesin-NBD fluorescence, i.e. high accumulators; whereas another subpopulation displayed decreased tachyplesin-NBD fluorescence, 60 min after the removal of extracellular tachyplesin-NBD (Figure 4B). Since these assays were performed in the absence of extracellular tachyplesin-NBD, decreased tachyplesin-NBD fluorescence could not be ascribed to decreased drug influx or increased secretion of OMVs in low accumulators, but could be due to either enhanced efflux or proteolytic activity in low accumulators.

      Next, we repeated efflux assays using EtBr in the presence of 46 μg mL⁻¹ (or 20.3 µM) extracellular tachyplesin-1. We observed a heterogeneous decrease of EtBr fluorescence with a subpopulation retaining high EtBr fluorescence (i.e. high tachyplesin accumulators) and another population displaying reduced EtBr fluorescence (i.e. low tachyplesin accumulators, Figure S14B) when extracellular tachyplesin-1 was present. Moreover, we repeated tachyplesin-NBD efflux assays in the presence of M9 containing 50 μg mL⁻¹ (244 μM) carbonyl cyanide m-chlorophenyl hydrazone (CCCP), an ionophore that disrupts the proton motive force (PMF) and is commonly employed to abolish efflux and found that all cells retained tachyplesin-NBD fluorescence (Figure S15B). However, it is important to note that CCCP does not only abolish efflux but also other respiration-associated and energy-driven processes [63].

      Taken together, our data demonstrate that in the absence of extracellular tachyplesin, stationary phase E. coli homogeneously efflux EtBr, whereas only low accumulators are capable of performing efflux of intracellular tachyplesin after initial tachyplesin accumulation. In the presence of extracellular tachyplesin, only low accumulators can perform efflux of both intracellular tachyplesin and intracellular EtBr. However, it is also conceivable that besides enhanced efflux, low accumulators employ proteolytic activity, OMV secretion, and variations to their bacterial membrane to hinder further uptake and intracellular accumulation of tachyplesin in the presence of extracellular tachyplesin.”

      These amendments can be found on lines 316-350 and in the new Figure S13 and Figure 4. We have also carried out more tachyplesin-NBD accumulation assays using single and double gene-deletion mutants lacking efflux components, please see Response 3 to reviewer 2 and the data reported in Figure 4B.
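
      The paired mass and molar concentrations quoted above can be cross-checked with a short sketch that back-calculates the molar mass each pair implies. Only the concentration pairs come from the text; the function name `implied_molar_mass` is an assumption made for illustration, and no molar masses are asserted beyond what the quoted pairs imply.

```python
def implied_molar_mass(ug_per_ml, micromolar):
    """Molar mass in g/mol implied by a mass concentration and its molar equivalent.

    ug/mL equals mg/L and uM equals umol/L, so the ratio is in mg/umol,
    which is 1000 g/mol.
    """
    return ug_per_ml / micromolar * 1000.0


# Concentration pairs as quoted in the response (ug/mL, uM).
pairs = {
    "tachyplesin-NBD": (46.0, 18.2),
    "tachyplesin-1": (46.0, 20.3),
    "CCCP": (50.0, 244.0),
}

for name, (mass_conc, molar_conc) in pairs.items():
    print(f"{name}: about {implied_molar_mass(mass_conc, molar_conc):.0f} g/mol")
```

      The same mass concentration corresponds to a lower molarity for tachyplesin-NBD than for tachyplesin-1, which would be consistent with the fluorescent label adding mass to the peptide.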

      (2) A conclusion of the transcriptomic analysis is that the lower accumulating subpopulation was exhibiting "a less translationally and metabolically active state" based on less upregulation of a cluster of genes including those involved in transcription and translation. This conclusion seems to borrow from well-described relationships referred to as bacterial growth laws in which the expression of genes involved in ribosome production and translation is directly related to the bacterial growth (and metabolic) rate. However, the assumptions that allow the formulation of the bacterial growth laws (balanced, steady state, exponential growth) do not hold in growth arrest. A non-growing cell could express no genes at all or could express ribosomal genes at a very low level, or efflux pumps at a high level. The distribution of transcripts among the functional classes of genes does not reveal anything about metabolic rates within the context of growth arrest - it only allows insight into metabolic rates when the constraint of exponential growth can be assumed. Efflux pumps can be highly metabolically costly; for example, Tn-Seq experiments have repeatedly shown that mutants for efflux pump gene transcriptional repressors have strong fitness disadvantages in energy-limited conditions. There are no data presented here to disprove a hypothesis that the low accumulators have high metabolic rates but allocate all of their metabolic resources to fortifying their outer membranes and upregulating efflux. This could be an important distinction for understanding the vulnerabilities of this subpopulation. Metabolic rates can be more directly estimated for single cells using respiratory dyes or pulsed metabolic labelling, for example, and these data could allow deeper insight into the metabolic rates of the two subpopulations. 

      My main recommendation for additional experiments to strengthen the conclusions of the paper would be to attempt to directly measure metabolic or translational activity in the high- and low-accumulating populations. I do not think that the transcriptomic data are sufficient to draw conclusions about this but it would be interesting to directly measure activity. Otherwise, it might be reasonable to simply soften the language describing the two populations as having different activity levels. They do seem to have different transcriptional profiles, and this is already an interesting observation.

      We agree with the reviewer that it might be misleading to draw conclusions on bacterial metabolic states solely based on transcriptomic data. We have therefore removed the statement “low accumulators displayed a less translationally and metabolically active state”. We have instead stated the following: “Our transcriptomics analysis showed that low tachyplesin accumulators downregulated protein synthesis, energy production, and gene expression processes compared to high accumulators”. Moreover, we have employed the membrane-permeable redox-sensitive dye C₁₂-resazurin, which is reduced to the fluorescent C₁₂-resorufin in metabolically active cells, to obtain a more direct estimate of the metabolic state of low and high accumulators of tachyplesin. We have added the following paragraph reporting our new data:

      “Our transcriptomics analysis also showed that low tachyplesin accumulators downregulated protein synthesis, energy production, and gene expression compared to high accumulators. To gain further insight on the metabolic state of low tachyplesin accumulators, we employed the membrane-permeable redox-sensitive dye, resazurin, which is reduced to the highly fluorescent resorufin in metabolically active cells. We first treated stationary phase E. coli with 46 μg mL⁻¹ (18.2 μM) tachyplesin-NBD for 60 min, then washed the cells, and then incubated them in 1 μM resazurin for 15 min and measured single-cell fluorescence of resorufin and tachyplesin-NBD simultaneously via flow cytometry. We found that low tachyplesin-NBD accumulators also displayed low fluorescence of resorufin, whereas high tachyplesin-NBD accumulators also displayed high fluorescence of resorufin (Figure S16), suggesting lower metabolic activity in low tachyplesin-NBD accumulators.”

      These amendments can be found on lines 398-408 and in Figure S16.

      (3) The observation that adding nutrients to the stationary phase cultures pushes most of the cells to the "high accumulator" state is presented as support of the hypothesis that the high accumulator state is a higher metabolism/higher translational activity state. However, it is important to note that adding nutrients will cause most or all of the cells in the population to start to grow, thus re-entering the familiar regime in which bacterial growth laws apply. This is evident in the slightly larger cell sizes seen in the nutrient-amended condition. In contrast to stationary phase cells, growing cells largely do not exhibit the bimodal distribution, and they are much more sensitive to tachyplesin, as demonstrated clearly in the supplement. Growing cells are not necessarily the same as the high-accumulating subpopulation of non-growing cells.

      Following the reviewer’s suggestion, we are no longer using the nutrient supplementation data to support the hypothesis that high accumulators possess higher metabolism or translational activity.

      The nutrient supplementation data is now only used to investigate whether tachyplesin-NBD accumulation and efficacy can be increased, and not to show that high tachyplesin-NBD accumulators are more metabolically or translationally active.

      Furthermore, our previous statement “Our data suggests that such slower-growing subpopulations might display lower antibiotic accumulation and thus enhanced survival to antibiotic treatment.” has now been removed from the discussion.

      (4) It might also be worth adding some additional context around the potential to employ efflux inhibitors as therapeutics. It is very clear that obtaining sufficient antimicrobial drug accumulation within Gram-negative bacteria is a substantial barrier to effective treatments, and large concerted efforts to find and develop therapeutic efflux pump inhibitors have been undertaken repeatedly over the last 25 years. Sufficiently selective inhibitors of bacterial efflux pumps with appropriate drug-like properties have been challenging to find and none have entered clinical trials. Multiple psychoactive drugs have been shown to impact efflux in bacteria but usually using concentrations in the 10-100 μM range (as here). Meanwhile, the Ki values for their human targets are usually in the sub- to low-nanomolar range. The authors rightly note that the concentration of sertraline they have used is higher than that achieved in patients, but this is by many orders of magnitude, and it might be worth expanding a bit on the substantial challenge of finding efflux inhibitors that would be specific and non-toxic enough to be used therapeutically. Many advances in structural biology, molecular dynamics, and medicinal chemistry may make the quest for therapeutic efflux inhibitors more fruitful than it has been in the past but it is likely to remain a substantial challenge.

      We agree with this comment and we have now added the following statement:

      “This limitation underscores the broader challenge of identifying EPIs that are both effective and minimally toxic within clinically achievable concentrations, while also meeting key therapeutic criteria such as broad-spectrum efficacy against diverse efflux pumps, high specificity for bacterial targets, and non-inducers of AMR [117]. However, advances in biochemical, computational, and structural methodologies hold the potential to guide rational drug design, making the search for effective EPIs more promising [118]. Therefore, more investigation should be carried out to further optimise the use of sertraline or other EPIs in combination with tachyplesin and other AMPs.”

      This amendment can be found on lines 535-542.

      (5) My second recommendation is that the transcriptomic data should be made available in full and in a format that is easier for other researchers to explore. The raw data should also be uploaded to a sequence repository, such as the NCBI GEO database or the EMBL ENA. The most useful format for sharing transcriptomic data is a table (such as an Excel spreadsheet) of transcripts per million counts for each gene for each sample. This allows other researchers to do their own analyses and compare expression levels to observations from other datasets. When only fold change data are supplied, data cannot be compared to other datasets at all, because they are relative to levels in an untreated control which are not known. The cluster analysis is one way of gaining insight into biological function revealed by transcriptional profile, but it can hide interesting additional complexities. For example, rpoS is named as one of the transcription-associated genes that are higher in the high accumulator subpopulation and evidence of generally increased activity. But RpoS is the stress sigma factor that drives much lower levels of expression generally than the housekeeping sigma factor RpoD, even though it recognises many of the same promoters (and some additional stress-specific promoters). Therefore, increased RpoS occupancy of RNAP would be expected to result in overall lower levels of transcription. However, it is also true that the transcript level for the rpoS gene is a particularly poor indicator of expression - rpoS is largely post-transcriptionally regulated. More generally, annotations are always evolving and key functional insights related to each gene might change in the future, so the results are a more durable resource if they are presented in a less analysed form as well as showing the analysis steps. It can also be important to know which genes were robustly expressed but did not change, versus genes that were not detected.

      Sequencing data associated with this study have now been uploaded and linked under NCBI BioProject accession number PRJNA1096674 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1096674).

      We have added this link to the methods under subheading “Accession Numbers” on lines 858-860. Additionally, transcripts per million counts for each gene for each sample have been added to the Figure 3 - Source Data file as requested by the reviewer.
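
      For readers who want to reproduce a transcripts-per-million table from raw counts, the normalisation can be sketched as follows. This is a minimal illustration of the standard TPM definition, not the authors' pipeline: the function name `counts_to_tpm` and the example counts and gene lengths are hypothetical.

```python
from typing import List, Sequence


def counts_to_tpm(counts: Sequence[float], lengths_bp: Sequence[float]) -> List[float]:
    """Convert raw read counts for one sample into transcripts per million (TPM).

    counts     -- reads assigned to each gene (hypothetical input)
    lengths_bp -- length of each gene in base pairs, in the same order
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    if any(length <= 0 for length in lengths_bp):
        raise ValueError("gene lengths must be positive")

    # Step 1: reads per kilobase, which corrects for gene length.
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]

    # Step 2: scale so that the values of one sample sum to one million.
    total = sum(rpk)
    if total == 0:
        return [0.0 for _ in rpk]
    return [r / total * 1e6 for r in rpk]


# Illustrative values only: three genes whose counts scale with their length
# end up with identical TPM values.
example_tpm = counts_to_tpm([10, 20, 30], [1000, 2000, 3000])
```

      Because every sample sums to the same total, TPM values can be compared across samples and datasets, which is the property the reviewer asks for and which fold-change data lack.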

      (6) In the introduction, the susceptibility of AMP efficacy to resistance mechanisms is discussed:

      "However, compared to small molecule antimicrobials, AMP resistance genes typically confer smaller increases in resistance, with polymyxin-B being a notable exception 7, 8. Moreover, mobile resistance genes against AMPs are relatively rare, and horizontal acquisition of AMP resistance is hindered by phylogenetic barriers owing to functional incompatibility with the new host bacteria9, again with plasmid-transmitted polymyxin resistance being a notable exception."

      It seems worth pointing out that polymyxins are the only AMPs that can reasonably be compared with small molecule antibiotics in terms of resistance acquisition since they are the only AMPs that have been widely used as drugs and therefore had similar chances to select for resistance among diverse global microbial populations.

      We have now clarified that we are referring to laboratory evolutionary analyses of resistance towards small molecule antibiotics and AMPs (Spohn et al., 2019) and that polymyxins are the only AMPs that have been used in antibiotic treatment to date.

      We have added the following statement to address this point:

      “Bacteria have developed genetic resistance to AMPs, including proteolysis by proteases, modifications in membrane charge and fluidity to reduce affinity, and extrusion by AMP transporters. However, compared to small molecule antimicrobials, AMP resistance genes typically confer smaller increases in resistance in experimental evolution analyses, with polymyxin-B and CAP18 being notable exceptions [8]. Moreover, mobile resistance genes against AMPs are relatively rare and horizontal acquisition of AMP resistance is hindered by phylogenetic barriers owing to functional incompatibility with the new host bacteria [9]. Plasmid-transmitted polymyxin resistance constitutes a notable exception [10], possibly because polymyxins are the only AMPs that have been in clinical use to date [9].”

      This amendment can be found on lines 57-65.

      (7) In the description of Figure 4, " tachyplesin monotherapy" is mentioned. It is not really appropriate to describe the treatment of a planktonic culture of bacteria in a test tube as a therapy since there is no host that is benefitting.

      We have now replaced “tachyplesin monotherapy” with “tachyplesin treatment”.

      (8) In the discussion, it is stated that " tachyplesin accumulates intracellularly only in bacteria that do not survive tachyplesin exposure" but this is clearly not true. All bacteria accumulate tachyplesin intracellularly initially, but if the bacteria are non-growing during the exposure, some of them are able to reduce their intracellular levels. The fraction of survivors is roughly correlated with the fraction of bacteria that do not maintain high intracellular levels of tachyplesin and that do not stain with propidium iodide, but for any given cell it seems that there is no clear point at which a high intracellular level of tachyplesin means that it will definitely not survive.

      We have now clarified this statement as follows: “We show that after an initial homogeneous tachyplesin accumulation within a stationary phase E. coli population, tachyplesin is retained intracellularly by bacteria that do not survive tachyplesin exposure, whereas tachyplesin is retained only in the membrane of bacteria that survive tachyplesin exposure.”

      This amendment can be found on lines 443-446.

      (9) Also in the discussion: " Our data suggests that such slower-growing subpopulations might display lower antibiotic accumulation and thus enchanced [sic] survival to antibiotic treatment." This does not really relate to the results here because the bimodal distributions were primarily studied in the absence of growth. In the LB/exponential growth situations where the population was growing but a very small subpopulation of low accumulators was observed, no measurements were made to indicate subpopulation growth rates.

      We have now removed this statement from the manuscript.

      (10) In discussion, L-Ara4N appears to be referred to as both positively charged and negatively charged; this should be clarified.

      We have now clarified that L-Ara4N is positively charged.

      This amendment can be found on line 496.

      (11) Discussion of TF analysis seems to overstate what is supported by the evidence. The correlation of up- and downregulated genes with previously described TF regulons (probably measured in very different conditions) does not really demonstrate TF activity. This could be measured directly with additional experiments but in the absence of those experiments claims about detecting TF activity should probably be avoided. The attempts to directly demonstrate the importance of those transcription factors to the observed accumulation activity were not successful.

      We have now removed from the discussion the previous paragraph related to the TF analysis. We have also modified the results section reporting the TF analysis as follows: “Next, we sought to infer transcription factor (TF) activities via differential expression of their known regulatory targets [61]. A total of 126 TFs were inferred to exhibit differential activity between low and high accumulators (Data Set S4). Among the top ten TFs displaying higher inferred activity in low accumulators compared to high accumulators, four regulate transport systems, i.e. Nac, EvgA, Cra, and NtrC (Figure S12). However, further experiments should be carried out to directly measure the activity of these TFs.”

      Finally, we have also moved the TFs’ data from Figure 3 to Figure S12 in the Supplementary information.

      These amendments can be found on lines 288-293.

      (12) When discussing the possibility of nutrient supplementation versus efflux inhibition as a potential therapeutic strategy, it could be noted that nutrient supplementation cannot be done in many infection contexts. The host immune system and host/bacterial cell density control nutrient access.

      We have now added the following statement: “Moreover, nutrient supplementation as a therapeutic strategy may not be viable in many infection contexts, as host density and the immune system often regulate access to nutrients [3]”.

      These amendments can be found on lines 553-555.

      Reviewer 2:

      (1) Some questions regarding the mechanism remain. One shortcoming of the setup of the transcriptomics experiment is that the tachyplesin-NBD probe itself has antibiotic efficacy and induces phenotypes (and eventually cell death) in the ´high accumulator´ cells. This makes it challenging to interpret whether any differences seen between the two groups are causative for the observed accumulation pattern or if they are a consequence of differential accumulation and downstream phenotypic effects.

      We agree with the reviewer and we have now acknowledged that “tachyplesin-NBD has antibiotic efficacy (see Figure 2) and has an impact on the E. coli transcriptome (Figure 3). Therefore, we cannot conclude whether the transcriptomic differences reported between low and high accumulators of tachyplesin-NBD are causative for the distinct accumulation patterns or if they are a consequence of differential accumulation and downstream phenotypic effects.”

      These amendments can be found on lines 283-287.

      (2) It would be relevant to test and report the MIC of sertraline for the strain tested, particularly since in Figure 4G an initial reduction in CFUs is observed for sertraline treatment, which suggests the existence of biological effects in addition to efflux inhibition.

      We have now measured the MIC of sertraline against E. coli BW25113, finding the MIC value to be 128 μg mL<sup>-1</sup> (418 μM). This value is more than four times higher than the sertraline concentration employed in our study, i.e. 30 μg mL<sup>-1</sup> (98 μM).

      These amendments can be found on lines 389-391 and data has been added to Figure 4 – Source Data.

      (3) The role of efflux systems is further supported by the finding that efflux pump inhibitors sensitize E. coli to tachyplesin and prevent the occurrence of the tolerant ´low accumulator´ subpopulations. In principle, this is a great way of validating the role of efflux pumps, but the limited selectivity of these inhibitors (CCCP is an uncoupling agent, and for sertraline direct antimicrobial effects on E. coli have been reported by Bohnert et al.) leaves some ambiguity as to whether the synergistic effect is truly mediated via efflux pump inhibition. To strengthen the mechanistic angle of the work analysis of tachyplesin-NBD accumulation in mutants of the identified efflux components would be interesting.

      We have now performed tachyplesin-NBD accumulation assays using 28 single and 4 double E. coli BW25113 gene-deletion mutants of efflux components and transcription factors regulating efflux. While for the majority of the mutants we recorded bimodal distributions of tachyplesin-NBD accumulation similar to the distribution recorded for the E. coli BW25113 parental strain (Figure 4B and Figure S13), we found unimodal distributions of tachyplesin-NBD accumulation constituted only of high accumulators for both ΔqseB and ΔqseBΔqseC mutants as well as reduced numbers of low accumulators for the ΔacrAΔtolC mutant (Figure 4B). Considering that the AcrAB-TolC tripartite RND efflux system is known to confer genetic resistance against AMPs like protamine and polymyxin-B [29,30] and that the quorum sensing regulators qseBC might control the expression of acrA [64], these data further corroborate the hypothesis that low accumulators can efflux tachyplesin and survive treatment with this AMP.

      These amendments can be found on lines 351-361, in the new Figure 4B and in the new Figure S14.

      Moreover, we have also carried out further efflux assays with both ethidium bromide and tachyplesin-NBD to further demonstrate the role of efflux in reduced accumulation of tachyplesin, and we have acknowledged that other mechanisms (i.e. reduced influx, increased protease activity or increased secretion of OMVs) could play an important role. Please see Response 1 to Reviewer 1.

      (4) The authors imply that protease could contribute to the low accumulator mechanism. Proteases could certainly cleave and thus inactivate AMPs/tachyplesin, but would this effect really lead to a reduction in fluorescence levels since the fluorophore itself would not be affected by proteolytic cleavage?

      We agree with the reviewer that nitrobenzoxadiazole (NBD) might not be cleaved by proteases that inactivate tachyplesin and other AMPs. Therefore, inactivation of tachyplesin by proteases might not affect cellular fluorescence levels unless efflux of NBD is possible following the cleavage of tachyplesin-NBD. We have therefore removed the statement “Conversely, should efflux or proteolytic activities by proteases underpin the functioning of low accumulators, we should observe high initial tachyplesin-NBD fluorescence in the intracellular space of low accumulators followed by a decrease in fluorescence due to efflux or proteolytic degradation.” We have now stated the following: “Low accumulators displayed an upregulation of peptidases and proteases compared to high accumulators, suggesting a potential mechanism for degrading tachyplesin (Table S1 and Data Set S3).”

      These amendments can be found on lines 280-282.

      (5) To facilitate comparison with other literature (e.g. papers on sertraline) it would be helpful to state compound concentrations also as molar concentrations.

      We have now added the molar concentrations alongside all instances where concentrations are stated in μg mL<sup>-1</sup>.
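
      The conversion behind these paired values can be sketched as below. This is an illustrative check rather than the authors' own calculation: the helper name is hypothetical, and the molecular weights are assumed values for sertraline free base (306.23 g/mol), CCCP (204.62 g/mol) and verapamil (454.6 g/mol), which reproduce the molar concentrations quoted in this response.

```python
def ug_per_ml_to_micromolar(conc_ug_per_ml: float, mol_weight_g_per_mol: float) -> float:
    """Convert a mass concentration in ug/mL to a molar concentration in uM.

    1 ug/mL equals 1 mg/L; dividing by the molecular weight in g/mol gives
    mmol/L, and multiplying by 1000 gives umol/L (uM).
    """
    if mol_weight_g_per_mol <= 0:
        raise ValueError("molecular weight must be positive")
    return conc_ug_per_ml / mol_weight_g_per_mol * 1000.0


# Assumed molecular weights (g/mol); not stated in the response itself.
SERTRALINE_MW = 306.23   # free base
CCCP_MW = 204.62
VERAPAMIL_MW = 454.6

sertraline_working_uM = ug_per_ml_to_micromolar(30, SERTRALINE_MW)   # close to 98 uM
sertraline_mic_uM = ug_per_ml_to_micromolar(128, SERTRALINE_MW)      # close to 418 uM
cccp_uM = ug_per_ml_to_micromolar(50, CCCP_MW)                       # close to 244 uM
verapamil_uM = ug_per_ml_to_micromolar(50, VERAPAMIL_MW)             # close to 110 uM
```

      Reporting both units matters because compounds with different molecular weights at the same mass concentration differ in molar concentration, as the 50 μg mL<sup>-1</sup> CCCP and verapamil values show.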

      (6) The authors tested a series of efflux pump inhibitors and found that CCCP and sertraline prevented the generation of the low accumulator subpopulation, whereas other inhibitors did not. An overview and discussion of the known molecular targets and mode of action of the different selected inhibitors could reveal additional insights into the molecular mechanism underlying the synergy with tachyplesin.

      We have now added molecular targets and mode of action of the different inhibitors where known. “Moreover, we repeated tachyplesin-NBD efflux assays in the presence of M9 containing 50 μg mL<sup>-1</sup> (244 μM) carbonyl cyanide m-chlorophenyl hydrazone (CCCP), an ionophore that disrupts the proton motive force (PMF) and is commonly employed to abolish efflux and found that all cells retained tachyplesin-NBD fluorescence (Figure S15B). However, it is important to note that CCCP does not only abolish efflux but also other respiration-associated and energy-driven processes [63].” And “Interestingly, M9 containing 30 µg mL<sup>-1</sup> (98 μM) sertraline (Figure 4D and S15C), an antidepressant which inhibits efflux activity of RND pumps, potentially through direct binding to efflux pumps [65] and decreasing the PMF [66], or 50 µg mL<sup>-1</sup> (110 μM) verapamil (Figure S15D), a calcium channel blocker that inhibits MATE transporters [67] by a generally accepted mechanism of PMF generation interference [68,69], was able to prevent the emergence of low accumulators. Furthermore, tachyplesin-NBD cotreatment with sertraline simultaneously increased tachyplesin-NBD accumulation and PI fluorescence levels in individual cells (Figure 4E and F, p-value < 0.0001 and 0.05, respectively). The use of berberine, a natural isoquinoline alkaloid that inhibits MFS transporters [70] and RND pumps [71], potentially by inhibiting conformational changes required for efflux activity [70], and baicalein, a natural flavonoid compound that inhibits ABC [72] and MFS [73,74] transporters, potentially through PMF dissipation [75], prevented the formation of a bimodal distribution of tachyplesin accumulation, however displayed reduction in fluorescence of the whole population (Figure S15E and F). 
Phenylalanine-arginine beta-naphthylamide (PAβN), a synthetic peptidomimetic compound that inhibits RND pumps [76] through competitive inhibition [77], reserpine, an indole alkaloid that inhibits ABC and MFS transporters, and RND pumps [78], by altering the generation of the PMF [69], and 1-(1-naphthylmethyl)piperazine (NMP), a synthetic piperazine derivative that inhibits RND pumps [79], through non-competitive inhibition [80], did not prevent the emergence of low accumulators (Figure S15G-I).”

      These amendments can be found on lines 337-342 and 367-385.

      (7) Page 8. The term ´medium accumulators´ for a 1:1 mix of low and high accumulators is misleading.

      We have now replaced the term “medium accumulators” with “a 1:1 (v/v) mixture of low and high accumulators”.

      These amendments to the description can be found on lines 238-239.

      (8) Figure 3. It may be more appropriate to rephrase the title of the figure to ´biological processes associated with low tachyplesin accumulation´ (rather than ´facilitate accumulation´). The same applies to the section title on page 8.

      We have amended the title of Figure 3 as requested by the reviewer.

      (9) The fact that the low accumulation phenotype depends on the growth media and conditions and can be prevented by nutrients is highly relevant. I would encourage the authors to consider showing the corresponding data in the main manuscript rather than in the SI.

      We have created a new Figure 5, displaying the impact of the nutritional environment and bacterial growth phase on both tachyplesin-NBD accumulation and efficacy.

      (10) In the discussion the authors state´ Heterogeneous expression of efflux pumps within isogenic bacterial populations has been reported 29,32,33,67-69. However, recent reports have suggested that efflux is not the primary mechanism of antimicrobial resistance within stationary-phase bacteria 31,70.´. In light of the authors´ findings that the response to tachyplesin is induced by exposure and is not pre-selected, could they speculate on why this specific response can be induced in stationary, but not exponential cells? Could there be a combination of pre-existing traits and induced responses at play? Could e.g. the reduced growth rate/metabolism in these cells render these cells less susceptible to the intracellular effects of tachyplesin and slow down the antibiotic efficacy, giving the cells enough time to mount additional protective responses that then lead to the low accumulation phenotype?

      We have now acknowledged that it is conceivable that other pre-existing traits of low accumulators also contribute to reduced tachyplesin accumulation. For example, reduced protein synthesis, energy production and gene expression in low accumulators could slow down tachyplesin efficacy, giving low accumulators more time to mount efflux as an additional protective response.

      “As our accumulation assay did not require the prior selection for phenotypic variants, we have demonstrated that low accumulators emerge subsequent to the initial high accumulation of tachyplesin-NBD, suggesting enhanced efflux as an induced response. However, it is conceivable that other pre-existing traits of low accumulators also contribute to reduced tachyplesin accumulation. For example, reduced protein synthesis, energy production, and gene expression in low accumulators could slow down tachyplesin efficacy, giving low accumulators more time to mount efflux as an additional protective response.”

      This amendment can be found on lines 482-489.

      (11) In the abstract: Is it true that low accumulators ´sequester´ the drug in their membrane? In my understanding ´sequestering´ would imply that low accumulators would bind higher levels of tachyplesin-NBD in their membrane compared to high accumulators (and thereby preventing it from entering the cells). According to Figure 1 J, K, it rather seems that the fluorescent signal around the membrane is also stronger in high accumulators.

      We have now removed the sentence “low accumulators sequester the drug in their membrane” from the abstract. We have instead stated: “These phenotypic variants display enhanced efflux activity to limit intracellular peptide accumulation.”

      These amendments can be found on lines 34-35.

      Reviewer 3:

      (1) The authors' claims about high efflux being the main mechanism of survival are unconvincing, given the current data. There can be several alternative hypotheses that could explain their results, such as lower binding of the AMP, lower rate of internalization, metabolic inactivity, etc. It is unclear how efflux can be important for survival against a peptide that the authors claim binds externally to the cell. The addition of efflux assays would be beneficial for clear interpretations. Given the current data, the authors' claims about efflux being the major mechanism in this resistance are unconvincing (in my humble opinion). Some direct evidence is necessary to confirm the involvement of efflux. The data with CCCP in Figure 4C can only indicate accumulation, not efflux. The authors are encouraged to perform direct efflux assays using known methods (e.g., PMIDs 20606071, 30981730, etc.). Figure 4A: The data does not support the broad claims about efflux. First, if the peptide is accumulated on the outside of the outer membrane, how will efflux help in survival? The dynamics shown in 4A may be due to lower binding, lower entry, or lower efflux. These mechanisms are not dissected here. Second, the heterogeneity can be preexisting or a result of the response to this stress. Either way, whether active efflux or dynamic transcriptomic changes are responsible for these patterns is not clear. Direct efflux assays are crucial to conclude that efflux is a major factor here.

      This important comment is similar in scope to the first comment of reviewer 1, and it arises partly because we had not clearly explained the efflux assays reported in Figure 4 in the original manuscript. We kindly refer this reviewer to our extensive response 1 to reviewer 1 and the corresponding amendments on lines 316-350 and in the new Figure S13 and Figure 4 (reported in the response 1 to reviewer 1 above), where we have now fully addressed this reviewer’s and reviewer 1’s concerns and performed new experiments following their important suggestions and the methods described in PMID 20606071 suggested by this reviewer.

      (2) The fluorescent imaging experiments can be conducted in the presence of externally added proteases, such as proteinase K, which has multiple cleavage sites on tachyplesin. This would ensure that all the external peptides (both free and bound) are removed. If the signal is still present, it can be concluded that the peptide is present internally. If the peptide is primarily external, the authors need to explain how efflux could help with externally bound peptides. Figure 1J-K: How are the authors sure about the location of the intensity? The peptide can be inside or outside and still give the same signal. To prove that the peptide is inside or outside, a proteolytic cleavage experiment is necessary (proteinase K, Arg-C proteinase, clostripain, etc.).

      We thank the reviewer for this important suggestion.

      We have now performed experiments where stationary phase E. coli was incubated in 46 μg mL<sup>-1</sup> (18.2 μM) tachyplesin-NBD in M9 for 60 min. Next, cells were pelleted and washed to remove extracellular tachyplesin-NBD and then incubated in either M9 or 20 μg mL<sup>-1</sup> (0.7 μM) proteinase K in M9 for 120 min. We found that the fluorescence of low accumulators decreased over time in the presence of proteinase K; in contrast, the fluorescence of high accumulators did not decrease over time in the presence of proteinase K. These data therefore suggest that tachyplesin-NBD is present only on the cell membrane of low accumulators and both on the membrane and intracellularly in high accumulators.

      Moreover, confocal microscopy using tachyplesin-NBD along with the membrane dye FM™ 4-64FX further confirmed that tachyplesin-NBD is present only on the cell membrane of low accumulators and both on the membrane and intracellularly in high accumulators.

      These amendments can be found on lines 173-179, lines 188-192 and in the new Figures S4 and S6.

      (3) Further genetic experiments are necessary to test whether efflux genes are involved at all. The genetic data presented by the authors in Figure S11 is crucial and should be further extended. The problem with fitting this data to the current hypothesis is as follows: If specific efflux pumps are involved in the resistance mechanism, then single deletions would cause some changes to the resistance phenotype, and the data in Figure S11 would look different. If there is redundancy (as is the case in many efflux phenotypes), the authors may consider performing double deletions on the major RND regulators (for example, evgA and marA). Additionally, the deletion of pump components such as TolC (one of the few OM components) and adaptors (such as acrA/D) might also provide insights. If the peptide is present in the periplasm, then deletions involving outer components would become important.

      This important comment is similar in scope to the third comment of reviewer 2. We have now performed tachyplesin-NBD accumulation assays using 28 single and 4 double E. coli BW25113 gene-deletion mutants of efflux components and transcription factors regulating efflux. While for the majority of the mutants we recorded bimodal distributions of tachyplesin-NBD accumulation similar to the distribution recorded for the E. coli BW25113 parental strain (Figure 4B and Figure S13), we found unimodal distributions of tachyplesin-NBD accumulation constituted only of high accumulators for both ΔqseB and ΔqseBΔqseC mutants as well as reduced numbers of low accumulators for the ΔacrAΔtolC mutant.

      These amendments can be found on lines 351-361, in the new Figure 4B and in the new Figure S14, please also see our response to comment 3 of reviewer 2.

      (4) Line numbers would have been really helpful. Please mention the size of the peptide (length and spatial) for readers.

      We have now added line numbers to the revised manuscript. The length and molecular weight of tachyplesin-1 have now been added on line 75.

      (5) Figure S4 is unclear. How were the low accumulators collected? What prompted the low-temperature experiment? The conclusion that it accumulates at the outer membrane is unjustified. Where is the data for high accumulators?

      We have now corrected the results section to state that tachyplesin-NBD accumulates on the cell membranes, rather than at the outer membrane of E. coli cells.

      These amendments can be found on lines 178 and 190.

      We would like to clarify that in Figure S4 we compare the distribution of tachyplesin-NBD single-cell fluorescence at low temperature versus 37 °C across the whole stationary phase E. coli population; we did not collect low accumulators only.

      The low-temperature experiment was prompted by a previous publication (Zhou Y et al. 2015: doi: 10.1021/ac504880r. Epub 2015 Mar 24. PMID: 25753586) showing that non-specific adherence of antimicrobials to the bacterial surface occurs at low temperatures and that passive and active transport of antimicrobials across the membrane is significantly diminished. Additionally, previous reports suggest that low temperatures inhibit post-binding peptide-lipid interactions, but not the primary binding step (PMID: 16569868; PMCID: PMC1426969; PMID: 3891625; PMCID: PMC262080).

      Therefore, the low-temperature experiment was performed to quantify the fluorescence of cells due to non-specific binding. This quantification allowed us to deduce that fluorescence levels of high accumulators are above the measured non-specific binding fluorescence (measured in the low-temperature experiment for the whole stationary phase E. coli population) is the result of intracellular tachyplesin-NBD accumulation. In contrast, the comparable fluorescence levels between all the cells in the low-temperature experiment and the low accumulator subpopulation at 37 °C suggest that tachyplesin-NBD is predominantly accumulated on the cell membranes of low accumulators instead of intracellularly.

      Please also see our response to comment 2 above for further evidence supporting that tachyplesin-NBD accumulates only on the cell membranes of low accumulators and both on the cell membranes and intracellularly in high accumulators.

      (6) Figure S5: Describe the microfluidic setup briefly. Why did the distribution pattern change (compared to Figure 1A)? Now, there are more high accumulators. Does the peptide get equally distributed between daughter cells?

      We have now added a brief description of the microfluidic setup on lines 182-184.

      The difference in the abundance of low and high accumulators between the microfluidics and flow cytometry measurements is likely due to differences in cell density, i.e. a few cells per channel vs millions of cells in a tube. A second major difference is that tachyplesin-NBD is continuously supplied in the microfluidic device for the entire duration of the experiment; therefore, the extracellular concentration of tachyplesin-NBD does not decrease over time. In contrast, tachyplesin-NBD is added to the tube only at the beginning of the experiment; therefore, the extracellular concentration of tachyplesin-NBD likely decreases over time as it is accumulated by the bacteria. The relative abundance of low and high accumulators changes with the extracellular concentration of tachyplesin-NBD, as shown in Figure 1A.

      We have added a sentence to acknowledge this discrepancy on lines 186-187.

      No instances of cell division were observed in stationary phase E. coli in the absence of nutrients in all microfluidics assays. Therefore, we cannot comment on the distribution of tachyplesin-NBD across daughter cells.

      (7) How did the authors conclude this: "tachyplesin accumulation on the bacterial membrane may not be sufficient for bacterial eradication"? It is completely unclear to this reviewer.

      We presented this hypothesis at the end of the section “Tachyplesin accumulates primarily in the membranes of low accumulators” as a link to the following section “Tachyplesin accumulation on the bacterial membranes is insufficient for bacterial eradication” where we test this hypothesis. For clarity, we have now moved this sentence to the beginning of the section “Tachyplesin accumulation on the bacterial membranes is insufficient for bacterial eradication”.

      (8) What is meant by membrane accumulation? Outside, inside, periplasm? Where? Figure 2H conclusions are unjustified. Bacterial killing with many antibiotics is associated with membrane damage, which is an aftereffect of direct antibiotic action. How can the authors state that "low accumulators primarily accumulate tachyplesin-NBD on the bacterial membrane, maintaining an intact membrane, strongly contributing to the survival of the bacterial population"? This reviewer could not find justifications for the claims about the location of the accumulation or cells actively maintaining an intact membrane. Also, PI staining reports damage both membranes.

      Based on the experiments that we have carried out following this reviewer’s suggestions (please see response 2 above), it is likely that tachyplesin-NBD is present only on the bacterial surface, i.e. in or on the outer membrane of low accumulators, considering that their fluorescence decreases during treatment with proteinase K. However, to take a more conservative approach, we now write “on the cell membranes” throughout the manuscript, i.e. either the outer or the inner membrane.

      We have also rephrased the statement reported by the reviewer as follows:

      “Taken together with PI staining data indicating membrane damage caused by high tachyplesin accumulation, these data demonstrate that low accumulators, which primarily accumulate tachyplesin-NBD on the bacterial membranes, maintain membrane integrity and strongly contribute to the survival of the bacterial population in response to tachyplesin treatment.”

      These amendments can be found on lines 228-232.

      (9) Figure 3: The findings about cluster 2 and cluster 4 genes do not correlate logically. If the cells are in a metabolically low active state, how are the cells getting enough energy for active efflux and membrane transport? This scenario is possible, but the authors must confirm the metabolic activity by measuring respiration rates. Also, metabolically less-active cells may import a lower number of peptides to begin with. That also may contribute to cell survival. Additionally, lowered metabolism is a known strategy of antibiotic survival that is distinctly different from efflux-mediated survival.

      Following this reviewer’s comment and comment 2 of reviewer 1, we have now carried out further experiments to estimate the metabolic activity of low and high accumulators. Please see our response to comment 2 of reviewer 1 above.

      (10) Figure S10: How did the authors test their hypothesis that cardiolipin is involved in the binding of the peptide to the membrane? The transcriptome data does not confirm it. Genetic experiments are necessary to confirm this claim.

      We would like to clarify that we have not set out to test the hypothesis that cardiolipin is involved in the binding of tachyplesin-NBD. We have only stated that cardiolipin could bind tachyplesin due to its negative charge. We have now cited two previous studies that suggest that tachyplesin has an increased affinity for lipid mixtures containing either cardiolipin (Edwards et al. ACS Inf Dis 2017) or PG lipids (Matsuzaki et al. BBA 1991), i.e. the main constituents of cardiolipins.

      These amendments can be found on lines 264-267.

      (11) Figure 4B-F: There are several controls missing. For Sertraline treatment, the authors must test that the metabolic profile, transcriptomic changes, or import of the peptide are not responsible for enhanced survival. CCCP will not only abolish efflux but also many other respiration-associated or all other energy-driven processes.

      Figure 4D presents data acquired in efflux assays in the absence of extracellular tachyplesin-NBD. Therefore, altered tachyplesin-NBD import cannot contribute to the lack of formation of the low accumulator subpopulation.

      We have now acknowledged that it is conceivable that increased tachyplesin efficacy is due to metabolic and transcriptomic changes induced by sertraline.

      These amendments can be found on lines 396-397.

      We have also acknowledged that CCCP does not only abolish efflux but also other respiration-associated and energy-driven processes.

      These amendments can be found on lines 341-342.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This is a very well-written paper presenting interesting findings related to the recovery following the end-Permian event in continental settings, from N China. The finding is timely as the topic is actively discussed in the scientific community. The data provides additional insights into the faunal, and partly, floral global recovery following the EPE, adding to the global picture.

      Strengths:

      The conclusions are supported by an impressive amount of sedimentological and paleontological data (mainly trace fossils) and illustrations.

      We thank Reviewer #1 for the positive assessments.

      Weaknesses: [eliminated in revision]

      We thank Reviewer #1.

      Reviewer #2 (Public review):

      Summary:

      The authors made a thorough revision of the manuscript, strengthening the message. They also considered all the comments made by the reviewers and provided appropriate and convincing arguments.

      Strengths:

      The revised manuscript clarifies all the major points raised by the reviewers, and the way the information is presented (in the text, figures and tables) is clear.

      We thank Reviewer #2 for the positive comments on our work.

      Weaknesses:

      The authors provided an appropriate and convincing rebuttal regarding the potential weakness I pointed out in the first review of the manuscript. Therefore, I do not see any major issue in their work.

      Introduction

      (1) P. 2, L. 32: Replace "to migrated" with "to migrate".

      Revised as suggested.

      (2) P. 3, L. 43-44: We recently published a review article on the tetrapod terrestrial record from the Central European Basin, showing that Olenekian tetrapod faunas (and ichnofaunas) were already quite rich and diverse. Article: https://doi.org/10.1016/j.earscirev.2025.105085

      Yes, we have read this paper. This summary is very important for the understanding of the biotic recovery after the PTME, especially in the early stage. We have added the new result in our manuscript.

      (3) P. 3, L. 57: Replace "recovered terrestrial ecosystems in tropical" with "recovered tropical terrestrial ecosystems".

      Revised as suggested.

      Results and Discussion

      (4) P. 6, L. 118: Replace "declined" with "decline".

      Revised as suggested.

      (5) P. 7, L. 131: Replace "microbial" with "microbially".

      Revised as suggested.

      Conclusions

      (6) P. 11, L. 224: Replace "as little as" with "as early as".

      Revised as suggested.

      (7) P. 11, L. 227: Replace "not only results in" with "not only result in".

      Revised as suggested.

      (8) P. 11, L. 230: Replace "suggesting" with "suggest".

      Revised as suggested.

      Reviewer #3 (Public review):

      Summary:

      This manuscript by Guo and colleagues features the documentation and interpretation of three successions of continental to marginal marine deposits spanning the P/T transition and their respective ichnofaunas. Based on these new data inferences concerning end-Permian mass extinction and Triassic recovery in the tropical realm are discussed.

      Strengths:

      The manuscript is well-written and organized and includes a large amount of new lithological and ichnological data that illuminate ecosystem evolution in a time of large-scale transition. The lithological documentations, facies interpretations, and ichnotaxonomic assignments look okay (with a few exceptions).

      We thank Reviewer #3 for the positive assessments.

      Weaknesses:

      [all eliminated in revision]

      We thank Reviewer #3.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      The authors found that IL-1b signaling is pivotal for hypoxemia development and can modulate NETs formation in LPS+HVV ALI model.  

      Strengths: 

      They used IL1R1 ko mice and proved that IL1R1 is involved in ALI model proving that IL1b signalling leads towards ARDS. In addition, hypothermia reduces this effect, suggesting a therapeutic option.  

      We thank the Reviewer for recognizing the strengths of our study and their positive feedback.

      Weaknesses: 

      (1) IL1R1 binds IL1a and IL1b. What would be the role of IL1a in this scenario? 

      Thank you for asking this question. We addressed this in our previous paper (Nosaka et al. Front Immunol 2020;11; 207), where we used anti-IL-1α treatment and IL-1α KO mice in our model and found that neither anti-IL-1α-treated mice nor IL-1α KO mice were protected. Thus, IL-1β, but not IL-1α, plays a role in inducing hypoxemia during LPS+HVV. We will now add this point to the discussion of our revised manuscript.

      (2) The authors depleted neutrophils using anti-Ly6G. What about MDSCs? Are these latter cells involved in ARDS and VILI?

      Anti-Ly6G neutrophil depletion may potentially affect G-MDSCs as well (Blood Adv 2022 Jul 29;7(1):73–86); however, we have not looked directly at G-MDSCs. If these cells were depleted, we would have expected to see an increase in inflammation, which we did not. Instead, anti-Ly6G-treated mice were protected. Thus, we cannot comment on any presumed role of G-MDSCs in the LPS+HVV-induced severe ALI model that we used.

      (3) The authors found that TH inhibited IL-1β release from macrophages led to less NETs formation and albumin leakage in the alveolar space in their lung injury model. A graphical abstract could be included suggesting a cellular mechanism.  

      Thanks for summarizing our findings and for the suggestion. Unfortunately, eLife does not publish graphical abstracts.

      (4) If Macrophages are responsible for IL1b release that via IL1R1 induces NETosis, what happens if you deplete macrophages? what is the role of epithelial cells?  

      Previous studies have found that macrophage depletion is protective in several models of ALI (Eyal. Intensive Care Med. 2007;33:1212–1218., Lindauer. J Immunol. 2009;183:1419–1426.), and other researchers have found that airway epithelial cells did not contribute to IL-1β secretion (Tang. PLoS ONE. 2012;7:e37689.). We have previously reported that epithelial cells produce IL-18 without an LPS priming signal during LPS+HVV (Nosaka et al. Front Immunol 2020;11; 207). Thus, IL-18 is not sufficient to induce hypoxemia, as Saline+HVV-treated mice do not develop hypoxemia (Nosaka et al. Front Immunol 2020;11; 207). We will now add this point to the revised discussion of the manuscript.

      Reviewer #2 (Public review): 

      Summary: 

      The manuscript by Nosaka et al is a comprehensive study exploring the involvement of IL1beta signaling in a 2-hit model of lung injury + ventilation, with a focus on modulation by hypothermia. 

      Strengths: 

      The authors demonstrate quite convincingly that interleukin 1 beta plays a role in the development of ventilator-induced lung injury in this model, and that this role includes the regulation of neutrophil extracellular trap formation. The authors use a variety of in vivo animal-based and in vitro cell culture work, and interventions including global gene knockout, cell-targeted knockout and pharmacological inhibition, which greatly strengthen the ability to make clear biological interpretations. 

      We thank the Reviewer for their positive feedback.

      Weaknesses: 

      A primary point for open discussion is the translatability of the findings to patients. The main model used, one of intratracheal LPS plus mechanical ventilation is well accepted for research exploring the pathogenesis and potential treatments for acute respiratory distress syndrome (ARDS). However, the interpretation may still be open to question - in the model here, animals were exposed to LPS to induce inflammation for only 2 hours, and seemingly displayed no signs of sickness, before the start of ventilation. This would not be typical for the majority of ARDS patients, and whether hypothermia could be effective once substantial injury is already present remains an open question. The interaction between LPS/infection and temperature is also complicated - in humans, LPS (or infection) induces a febrile, hyperthermic response, whereas in mice LPS induces hypothermia (eg. Ganeshan K, Chawla A. Nat Rev Endocrinol. 2017;13:458-465). Given this difference in physiological response, it is therefore unclear whether hypothermia in mice and hypothermia in humans are easily comparable. Finally, the use of only young, male animals such as in the current study has been typical but may be criticised as limiting translatability to people. 

      Therefore while the conclusions of the paper are well supported by the data, and the biological pathways have been impressively explored, questions still remain regarding the ultimate interpretations.  

      We agree with the reviewer that at two hours post LPS there is only minimal pulmonary inflammation (Dagvadorj et al Immunity 42, 640–653). This is a limitation of the experimental model we used in our study. Additionally, as the reviewer pointed out, LPS induces hyperthermia in humans, but it is also well established that physiological hypothermia occurs in humans with severe infections and sepsis (Baisse. Am J Emerg Med. 2023 Sep: 71: 134-138., Werner. Am J Emerg Med. 2025 Feb;88:64-78.). Therefore, the difference between human and mouse responses to sepsis or infections may be more nuanced. Furthermore, it is important to distinguish between physiological hypothermia (just <36°C) and therapeutic hypothermia (typically 32-34°C). We will add to the discussion whether hypothermia serves as a protective response and whether the transition from normothermia to hyperthermia could have detrimental effects. We only used young male mice in our study, as the Reviewer points out; we will also add this point to the revised discussion as a limitation of our study.

      Recommendations for the authors: 

      (i) With hypothermia, metabolic activity would be expected to be reduced and therefore presumably impact on CO2/pH. These may have an impact on outcomes from ventilation, so could the authors include this data and discuss as appropriate? 

      We have now included these data in Suppl Fig 6. While we observed significant differences in blood pH and PaCO<sub>2</sub> in the hypothermia treatment group, these values remained within the clinically normal range (PaCO<sub>2</sub>: 35-45 mmHg, pH: 7.35-7.45). Neither alkalosis (PaCO<sub>2</sub> < 35 mmHg, pH > 7.45) nor acidosis (PaCO<sub>2</sub> > 45 mmHg, pH < 7.35) was observed.
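      The thresholds quoted in this response can be written as a small check. The following is only an illustrative sketch, not part of the authors' analysis: the function name, the "indeterminate" fallback for readings that match neither pattern, and the example values are assumptions introduced here; the numeric ranges are the ones stated above.

```python
# Illustrative sketch only: encodes the ranges quoted in the response above.
# The function name and the "indeterminate" label are hypothetical.

PACO2_NORMAL_MMHG = (35.0, 45.0)  # normal PaCO2 range stated in the response
PH_NORMAL = (7.35, 7.45)          # normal blood pH range stated in the response


def classify_blood_gas(ph: float, paco2_mmhg: float) -> str:
    """Classify one blood gas reading using the criteria quoted in the text."""
    paco2_low, paco2_high = PACO2_NORMAL_MMHG
    ph_low, ph_high = PH_NORMAL

    # Alkalosis as defined in the response: PaCO2 < 35 mmHg and pH > 7.45.
    if paco2_mmhg < paco2_low and ph > ph_high:
        return "alkalosis"
    # Acidosis as defined in the response: PaCO2 > 45 mmHg and pH < 7.35.
    if paco2_mmhg > paco2_high and ph < ph_low:
        return "acidosis"
    # Both values inside the quoted normal ranges (bounds inclusive).
    if paco2_low <= paco2_mmhg <= paco2_high and ph_low <= ph <= ph_high:
        return "normal"
    # Assumption: anything else is left unclassified rather than guessed at.
    return "indeterminate"


if __name__ == "__main__":
    # Made-up example readings, not data from the study.
    for ph, paco2 in [(7.40, 40.0), (7.50, 30.0), (7.30, 50.0)]:
        print(ph, paco2, classify_blood_gas(ph, paco2))
```

      A reading counts as normal here only when both values fall inside the quoted ranges, which mirrors the statement that neither alkalosis nor acidosis was observed.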

      (ii) It is noticeable that there are quite large differences in experimental numbers between groups - typically 7-12, 5-12 in Figure 2. How were these N determined? For example is there a reason why there is apparently N = 8 for BALF neutrophils in the saline + HVV group (Figure 1c) but N = 12 for LPS + HVV group? Did any animals die during any of the protocols for example? 

      We conducted ventilation experiments with 4 mice per experiment (2 mice per group x 2, or 4 mice per group) and pooled data from 5-6 independent experiments or 3-4 independent experiments, respectively. No mouse mortality was observed (unless otherwise noted). However, in the severe ARDS group, some mice were dehydrated by the endpoint of the experiments, preventing blood or BALF collection. As a result, sample sizes were unequal in some cases. Nevertheless, no data were selectively excluded.

      (iii) Discussion - On page 13 you refer to data involving Cl-amidine administration. This does not seem to be related to any experiments reported in the manuscript. 

      We apologize for this mistake and have removed it.

      (iv) Methods - authors state that BALF was obtained after 150 minutes of ventilation, yet the experiments apparently lasted for 180 minutes. Presumably this is an error? 

      We apologize for this inconsistency. We collected blood for blood gas measurement at 30 min and 150 min after ventilation began. However, mice were kept on the ventilator for 30 min longer and were then euthanized, and BALF was collected. Thus, BALF was collected at 180 min, 30 minutes after the final blood draw. We have corrected the methods in the revised manuscript.

      (v) Statistical methods - authors state that sometimes Mann-Whitney U-test was used and sometimes unpaired t-test, presumably reflecting that some data were normally distributed and some were not. Could the authors please describe the tests used to confirm distribution of data. 

      We have clarified which statistical methods were used in our revised manuscript.

      Briefly, normality within the groups was assessed using the Shapiro-Wilk and Kolmogorov-Smirnov tests. Three-way ANOVA (Figure 1B; Supplemental Figure 1B-D; Supplemental Figure 6), one-way ANOVA (Supplemental Figure 4D-E; Supplemental Figure 5C), and two-way ANOVA were performed for data with more than two groups, followed by Tukey's post hoc test. Some groups analyzed by two-way ANOVA in Figure 1 and Supplemental Figure 1 failed the normality tests due to zero values (analyte not detected by ELISA) or the relatively small sample size, as samples were distributed across multiple measurements. However, the primary group of interest, LPS+HVV, showed significant differences from other groups with consistently low P-values in most datasets, supporting the decision to retain the ANOVA analyses. For comparisons between two groups, the Mann-Whitney U test was used when one or both groups failed the Shapiro-Wilk normality test, while the unpaired Student's t-test was applied to the remaining normally distributed data.
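      The two-group decision rule described above can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: the function name and the returned labels are hypothetical, and the 0.05 significance threshold is an assumption, since the response does not state the cutoff used for the normality test.

```python
# Minimal sketch of the two-group test-selection rule described above.
# Assumptions (not stated in the response): alpha = 0.05 as the cutoff for
# the Shapiro-Wilk test, and the function name and returned labels.


def choose_two_group_test(shapiro_p_group_a: float,
                          shapiro_p_group_b: float,
                          alpha: float = 0.05) -> str:
    """Pick the two-group comparison from the groups' Shapiro-Wilk p-values.

    A group "fails" the normality test when its p-value is below alpha.
    """
    for p_value in (shapiro_p_group_a, shapiro_p_group_b):
        if not 0.0 <= p_value <= 1.0:
            raise ValueError("p-values must lie between 0 and 1")

    # If one or both groups fail normality, use the non-parametric test.
    if shapiro_p_group_a < alpha or shapiro_p_group_b < alpha:
        return "Mann-Whitney U test"
    # Otherwise both groups are treated as normally distributed.
    return "unpaired Student's t-test"


if __name__ == "__main__":
    # Made-up p-values for illustration only.
    print(choose_two_group_test(0.40, 0.22))   # both groups pass normality
    print(choose_two_group_test(0.40, 0.01))   # one group fails normality
```

      The p-values themselves would come from a statistics package; for example, SciPy's `scipy.stats.shapiro` returns a test statistic and a p-value for a sample.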

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript presents a significant and rigorous investigation into the role of CHMP5 in regulating bone formation and cellular senescence. The study provides compelling evidence that CHMP5 is essential for maintaining endolysosomal function and controlling mitochondrial ROS levels, thereby preventing the senescence of skeletal progenitor cells.

      Strengths:

      The authors demonstrate that the deletion of Chmp5 results in endolysosomal dysfunction, elevated mitochondrial ROS, and ultimately enhanced bone formation through both autonomous and paracrine mechanisms. The innovative use of senolytic drugs to ameliorate musculoskeletal abnormalities in Chmp5-deficient mice is a novel and critical finding, suggesting potential therapeutic strategies for musculoskeletal disorders linked to endolysosomal dysfunction.

      Weaknesses:

      The manuscript requires a deeper discussion or exploration of CHMP5's roles and a more refined analysis of senolytic drug specificity and effects. This would greatly enhance the comprehensiveness and clarity of the manuscript.

      We thank the reviewer for these insightful comments. In the revised manuscript, we have expanded the discussion of the distinct roles of CHMP5 in different cell types. Specifically, we add the following sentences (Lines 433-439 in the combined manuscript):

      “Also, a previous study by Adoro et al. did not detect endolysosomal abnormalities in Chmp5 deficient developmental T cells [1]. Since both osteoclasts and T cells are of hematopoietic origin, and meanwhile osteogenic cells and MEFs, which show endolysosomal abnormalities after CHMP5 deficiency, are of mesenchymal origin, it turns out that the function of CHMP5 in regulating endolysosomal pathway could be cell lineage-specific, which remains clarified in future studies.”

      In addition, we tested another senolytic drug Navitoclax (ABT-263), which is a BCL-2 family inhibitor and induces apoptosis of senescent cells, in Chmp5<sup>Ctsk</sup> mice. Micro-CT analysis showed that ABT-263 could also improve periskeletal bone overgrowth in Chmp5<sup>Ctsk</sup> mice (Fig. 5F). Furthermore, we have also discussed the potential off-target effects of senolytic drugs in Chmp5<sup>Ctsk</sup> mice in the revised manuscript. Specifically, we added the following paragraph (Lines 441-451):

      “Furthermore, it is unclear whether the effect of senolytic drugs in Chmp5<sup>Ctsk</sup> mice involves targeting osteoclasts other than osteogenic cells, as osteoclast senescence has not yet been evaluated. However, the efficacy of Q + D in targeting osteogenic cells, which is the focus of the current study, was confirmed in Chmp5<sup>Dmp1</sup> mice (Fig. 5C-E). Additionally, Q + D caused a higher cell apoptotic ratio in Chmp5<sup>Ctsk</sup> compared to wild-type periskeletal progenitors in ex vivo culture (Fig. 5A), demonstrating the effectiveness of Q + D in targeting osteogenic cells in the Chmp5<sup>Ctsk</sup> model. Furthermore, an alternative senolytic drug ABT-263 could also ameliorate periskeletal bone overgrowth in Chmp5<sup>Ctsk</sup> mice (Fig. 5F). Together, these results confirm that osteogenic cell senescence is responsible for the bone overgrowth in Chmp5<sup>Ctsk</sup> and Chmp5<sup>Dmp1</sup> mice, and senolytic treatments are effective in alleviating these skeletal disorders.”

      Reviewer #2 (Public review):

      Summary:

      The authors try to show the importance of CHMP5 for skeletal development.

      Strengths:

      The findings of this manuscript are interesting. The mouse phenotypes are well done and are of interest to a broader (bone) field.

      Weaknesses:

      The mechanistic insights are mediocre, and the cellular senescence aspect poor.

      In total, it has not been shown that there are actual senescent cells that are reduced after D+Q treatment. These statements need to be scaled back substantially.

      We thank the reviewer for these constructive comments. We have added additional results to strengthen the senescent phenotypes of Chmp5-deficient skeletal progenitor cells, including significant enrichment of the SAUL_SEN_MAYO geneset (positively correlated with cell senescence) and the KAMMINGA_SENESCENCE geneset (negatively correlated with cell senescence) at the transcriptional level by GSEA analysis of RNA-seq data (Fig. S3C), and the increase of γH2Ax<sup>+</sup>;GFP<sup>+</sup> cells in the periskeletal overgrowth of Chmp5<sup>Ctsk</sup>;Rosa26<sup>26mTmG/+</sup> mice vs. the periosteum of Chmp5<sup>Ctsk/+</sup>;Rosa26<sup>26mTmG/+</sup> control mice (Fig. 3E). These results further support the senescent phenotypes of Chmp5-deficient skeletal progenitors.

      Furthermore, the combination of Q + D caused a higher cell apoptotic ratio in Chmp5<sup>Ctsk</sup> vs. wild-type periskeletal progenitors in ex vivo culture (Fig. 5A), suggesting their effectiveness in targeting periskeletal progenitor cell senescence in Chmp5<sup>Ctsk</sup> mice. In addition, we tested an alternative senolytic drug, ABT-263, which is an inhibitor of the BCL-2 family and induces apoptosis of senescent cells, in Chmp5<sup>Ctsk</sup> mice; ABT-263 could also alleviate periskeletal bone overgrowth in these mice (Fig. 5F). Together, these results demonstrate that osteogenic cell senescence is responsible for abnormal bone overgrowth in Chmp5-deficient mice and that senolytic drugs are effective in improving these skeletal disorders.

      Reviewer #3 (Public review):

      Summary:

      In this study, Zhang et al. reported that CHMP5 restricts bone formation by controlling endolysosome-mitochondrion-mediated cell senescence. The effects of CHMP5 on osteoclastic bone resorption and bone turnover have been reported previously (PMID: 26195726), in which study the aberrant bone phenotype was observed in the CHMP5-ctsk-CKO mouse model. Using the same mouse model, Zhang et al. report a novel role of CHMP5 in osteogenesis through affecting cell senescence. Overall, it is an interesting study and provides new insights in the field of cell senescence and bone.

      Strengths:

      Analyzed the bone phenotype of CHMP5-periskeletal progenitor-CKO mouse model and found the novel role of senescent cells on osteogenesis and migration.

      Weaknesses:

      (1) There are a lot of papers that have reported that senescence impairs osteogenesis of skeletal stem cells. In this study, the author claimed that Chmp5 deficiency induces skeletal progenitor cell senescence and enhanced osteogenesis. Can the authors explain the controversial results?

      Different skeletal stem cell populations in time and space have been identified and reported [2-6]. The present study shows that Chmp5 deficiency in periskeletal (Ctsk-Cre) and endosteal (Dmp1-Cre) osteogenic cells causes cell senescence and aberrant bone formation. Although cell senescence during aging can impair the osteogenesis of marrow stromal cells (MSCs), which contributes to diseases with low bone mass such as osteoporosis, aging can also increase heterotopic ossification or mineralization in musculoskeletal soft tissues such as ligaments and tendons [7]. Notably, the abnormal periskeletal bone overgrowth in Chmp5<sup>Ctsk</sup> mice was mainly mapped to insertion sites of tendons and ligaments on the bone (Fig. 1A and E), consistent with changes during aging. More broadly, aging can also cause abnormal ossification or mineralization in other body tissues, such as the heart valve [8, 9]. These different results reflect an aberrant state of ossification or mineralization in musculoskeletal tissues and throughout the body during aging. Based on the reviewer’s comment, we have discussed these results in the revised manuscript. Specifically, we add the following paragraph (Lines 453-462 in the combined manuscript):

      “Notably, aging is associated with decreased osteogenic capacity in marrow stromal cells, which is related to conditions with low bone mass, such as osteoporosis. Rather, aging is also accompanied by increased ossification or mineralization in musculoskeletal soft tissues, such as tendons and ligaments [7]. In particular, the abnormal periskeletal overgrowth in Chmp5<sup>Ctsk</sup> mice was predominantly mapped to insertion sites of tendons and ligaments on the bone (Fig. 1A and E), which is consistent with changes during aging and suggests that mechanical stress at these sites could contribute to the aberrant bone growth. These results suggest that skeletal stem/progenitor cells at different sites of musculoskeletal tissues could demonstrate different, even opposite outcomes in osteogenesis, due to cell senescence.”

      (2) Co-culture of Chmp5-KO periskeletal progenitors with WT ones should be conducted to detect the migration and osteogenesis of WT cells in response to Chmp5-KO-induced senescent cells. In addition, the co-culture of WT periskeletal progenitors with senescent cells induced by H2O2, radiation, or from aged mice would provide more information.

      In the present study, the increased proliferation and osteogenesis of CD45-;CD31-;GFP- periskeletal progenitors were shown as paracrine mechanisms of Chmp5-deficient periskeletal progenitors to promote bone overgrowth in Chmp5<sup>Ctsk</sup> mice (Figs. 4F, G, and S4C-E). Following the reviewer’s suggestion, we have carried out the coculture experiment: coculture of Chmp5<sup>Ctsk</sup> with wild-type skeletal progenitors promoted osteogenesis of wild-type cells (Fig. S4B), which further supports the paracrine effect of Chmp5-deficient periskeletal progenitors.

      In addition, the cause and outcome of cell senescence could be highly heterogeneous, and different causes of cell senescence can cause significantly distinct, even opposite outcomes. Although the coculture experiments of WT periskeletal progenitors with senescent cells induced by H2O2, radiation, or from aged mice are very interesting, these are beyond the scope of the current study.

      (3) Many EVs were secreted from Chmp5-deleted periskeletal progenitors, compared to the rarely detected EVs around WT cells. Since EVs of BMSCs or osteoprogenitors show strong effects of promoting osteogenesis, did the EVs contribute to the enhanced osteogenesis induced by Chmp5 deficiency?

      This is an interesting question. Although we did not separately test the effect of EVs from Chmp5-deficient periskeletal progenitors on the osteogenesis of WT skeletal progenitors, the CD45-;CD31-;GFP- skeletal progenitor cells from Chmp5<sup>Ctsk</sup> mice have an increased capacity of osteogenesis compared to corresponding cells from control animals (Figs. 4G and S4D). Also, the coculture of Chmp5-deficient with wild-type skeletal progenitors could enhance the osteogenesis of wild-type cells (Fig. S4B). These results suggest that EVs from Chmp5-deficient periskeletal progenitors could promote osteogenesis of neighboring WT skeletal progenitors. The specific functions of EVs of Chmp5-deficient periskeletal progenitors in regulating osteogenesis will be further investigated in future studies.

      (4) EVs secreted from senescent cells propagate senescence and impair osteogenesis. Why do EVs secreted from senescent cells induced by Chmp5 deficiency have opposite effects on osteogenesis?

      The question is similar to comments #1 and #3 from this reviewer. First, the manifestations (including the secretory phenotype) and outcomes of cell senescence could be highly heterogeneous depending on inducers, tissue and cell contexts, and other factors such as “time”. Different causes of cell senescence could lead to different manifestations and outcomes, which have been discussed in the manuscript (Lines 381-383). Similarly, as mentioned above, skeletal stem/progenitor cells at different sites of musculoskeletal tissues could also demonstrate distinct, even opposite outcomes, as a result of cell senescence (Lines 453-462). Second, CD45-;CD31-;GFP- periskeletal progenitor cells from Chmp5<sup>Ctsk</sup>;Rosa26<sup>mTmG/+</sup> mice have an increased capacity of proliferation and osteogenesis compared to corresponding cells from control animals (Figs. 4F, G and S4C-E). Furthermore, the conditioned medium of Chmp5-deficient skeletal progenitors promoted the proliferation of ATDC5 cells (Fig. 4E) and the coculture of Chmp5<sup>Ctsk</sup> and wild-type periskeletal progenitors could enhance the osteogenesis of wild-type cells (Fig. S4B). Taken together, these results show paracrine actions of Chmp5-deficient periskeletal progenitors in promoting aberrant bone growth in Chmp5 conditional knockout mice. We also refer the reviewer to our responses to comments #1 and #3.

      (5) The Chmp5-ctsk mice show accelerated aging-related phenotypes, such as hair loss and joint stiffness. Did Ctsk also label cells in hair follicles or joint tissue?

      This is an interesting question. Although we did not check the expression of CHMP5 in hair follicles, which is outside the scope of the present study, the result in Fig. 1E showed the expression of Ctsk in joint ligaments, tendons, and their insertion sites on the bone (Lines 108-111). Notably, the periskeletal bone overgrowth in Chmp5<sup>Ctsk</sup> mice was mainly mapped to insertion sites of ligaments and tendons on the bone, which has been discussed in the revised manuscript (Lines 456-460).

      (6) Fifteen proteins were found to increase and five proteins to decrease in the cell supernatant of Chmp5<sup>Ctsk</sup> periskeletal progenitors. How about SASP factors in the secretory profile?

      The SASP phenotype and related factors of senescent cells could be highly heterogeneous depending on inducers, cell types, and timing of senescence [10, 11]. Most of the proteins we identified in the secretome analysis have previously been reported in the secretory profile of osteoblasts or involved in the regulation of osteogenesis. Although we were interested in changes in common SASP factors, such as cytokines and chemokines, the experiment did not detect these factors, probably due to their small molecular weights and the technical limitations of the mass-spec analysis. We have clarified this in the revised manuscript. Specifically, we add the following sentences (Lines 258-261):

      “Notably, the secretome analysis did not detect common SASP factors, such as cytokines and chemokines, in the secretory profile of Chmp5<sup>Ctsk</sup> periskeletal progenitors, probably due to their small molecular weights and the technical limitations of the mass-spec analysis.”

      (7) D+Q treatment mitigates musculoskeletal pathologies in Chmp5 conditional knockout mice. In the previously published paper (CHMP5 controls bone turnover rates by dampening NF-κB activity in osteoclasts), inhibition of osteoclastic bone resorption rescues the aberrant bone phenotype of the Chmp5 conditional knockout mice. Is the effect of D+Q on bone overgrowth because of the inhibition of bone resorption?

      This is an important question. We have discussed the potential off-target effect of senolytic drugs in Chmp5<sup>Ctsk</sup> mice in the revised manuscript. Specifically, we add the following paragraph (Lines 441-451):

      “Furthermore, it is unclear whether the effect of senolytic drugs in Chmp5<sup>Ctsk</sup> mice involves targeting osteoclasts other than osteogenic cells, as osteoclast senescence has not yet been evaluated. However, the efficacy of Q + D in targeting osteogenic cells, which is the focus of the current study, was confirmed in Chmp5<sup>Dmp1</sup> mice (Fig. 5C-E). Additionally, Q + D caused a higher cell apoptotic ratio in Chmp5<sup>Ctsk</sup> compared to wild-type periskeletal progenitors in ex vivo culture (Fig. 5A), demonstrating the effectiveness of Q + D in targeting osteogenic cells in the Chmp5<sup>Ctsk</sup> model. Furthermore, an alternative senolytic drug ABT-263 could also ameliorate periskeletal bone overgrowth in Chmp5<sup>Ctsk</sup> mice (Fig. 5F). Together, these results confirm that osteogenic cell senescence is responsible for the bone overgrowth in Chmp5<sup>Ctsk</sup> and Chmp5<sup>Dmp1</sup> mice and senolytic treatments are effective in alleviating these skeletal disorders.”

      (8) The role of VPS4A in cell senescence should be measured to support the conclusion that CHMP5 regulates osteogenesis by affecting cell senescence.

      We thank the reviewer for this suggestion. The current study mainly reports the function of CHMP5 in the regulation of skeletal progenitor cell senescence and osteogenesis. The roles of VPS4A in cell senescence and skeletal biology will be further explored in future studies. We have discussed this in the revised manuscript. Specifically, we add the following sentence (Lines 407-409):

      “The roles of VPS4A in regulating musculoskeletal biology and cell senescence should be further explored in future studies.”

      (9) Cell senescence with markers, such as p21 and H2AX, co-stained with GFP should be performed in the mouse models to indicate the effects of Chmp5 on cell senescence in vivo.

      Following the reviewer’s suggestion, we performed immunostaining of γH2AX and colocalization with GFP in Chmp5<sup>Ctsk</sup>;Rosa26<sup>mTmG/+</sup> and Chmp5<sup>Ctsk/+</sup>;Rosa26<sup>mTmG/+</sup> mice. The results showed that there are more γH2AX+;GFP+ cells in the periskeletal overgrowth in Chmp5<sup>Ctsk</sup>;Rosa26<sup>mTmG/+</sup> mice compared to the periosteum of Chmp5<sup>Ctsk/+</sup>;Rosa26<sup>mTmG/+</sup> control animals. Because the γH2AX staining is one of the critical results supporting the senescent phenotype of Chmp5-deficient periskeletal progenitors, we have added these results to Fig. 3E and moved Fig. 3F of the original manuscript to Fig. S3E due to the space limitation in Figure 3. In sum, these results further enrich the senescent manifestations of Chmp5-deficient periskeletal progenitors.

      (10) ATDC5 cells, as an osteochondroma cell line, are not a good cell model of periskeletal progenitors. Primary periskeletal progenitor cells may be a better choice.

      ATDC5 cells are typically used as a chondrocyte progenitor cell line. However, our previous study showed that ATDC5 cells could also be used as a reasonable cell model for periskeletal progenitors [12], which was mentioned in the manuscript (Lines 202-204). In addition, the results of ATDC5 cells were also verified in primary periskeletal progenitor cells in this study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Despite the robust experimental framework and intriguing findings, there are several areas that require further attention to enhance the manuscript's overall quality and clarity:

      (1) The manuscript could benefit from a more in-depth discussion of the tissue-specific roles of CHMP5, particularly in addressing why CHMP5 deficiency results in distinct outcomes in osteogenic cells as opposed to other cell types, such as osteoclasts. Expanding the discussion would greatly enhance the comprehensiveness and clarity of the manuscript.

      Based on the reviewer’s suggestion, we have expanded the discussion of the distinct roles of CHMP5 in different cell types. Specifically, we state (Lines 433-439):

      “Also, a previous study by Adoro et al. did not detect endolysosomal abnormalities in Chmp5-deficient developmental T cells [1]. Since both osteoclasts and T cells are of hematopoietic origin, and meanwhile osteogenic cells and MEFs, which show endolysosomal abnormalities after CHMP5 deficiency, are of mesenchymal origin, it turns out that the function of CHMP5 in regulating the endolysosomal pathway could be cell lineage-specific, which remains to be clarified in future studies.”

      (2) Given that Figures 1 and 2 suggest that the absence of Chmp5 (CHMP5<sup>Ctsk</sup> & CHMP5<sup>Dmp1</sup>) leads to disordered proliferation or mineralization of bone or osteoblasts, the manuscript should delve deeper into the potential links between these findings and aging-related processes, such as age-associated fibrosis. Providing clearer explanations and discussion on these connections would help present a more cohesive understanding of the results in the context of aging.

      We thank the reviewer for this favorable suggestion. A feature of aging is heterotopic ossification or mineralization in musculoskeletal soft tissues, including tendons and ligaments [7]. Notably, the abnormal periskeletal bone formation in Chmp5<sup>Ctsk</sup> mice in this study was mostly mapped to the insertion sites of tendons and ligaments on the bone (Fig. 1A and E), which is consistent with changes during aging and suggests that mechanical stress at these sites could be a contributor to periskeletal overgrowth. We have discussed these results in the revised manuscript. Specifically, we add the following paragraph (Lines 453-462):

      “Notably, aging is associated with decreased osteogenic capacity in marrow stromal cells, which is related to conditions with low bone mass, such as osteoporosis. Rather, aging is also accompanied by increased ossification or mineralization in musculoskeletal soft tissues, such as tendons and ligaments [7]. In particular, the abnormal periskeletal overgrowth in Chmp5<sup>Ctsk</sup> mice was predominantly mapped to the insertion sites of tendons and ligaments on the bone (Fig. 1A and E), which is consistent with changes during aging and suggests that mechanical stress at these sites could contribute to the aberrant bone growth. These results suggest that skeletal stem/progenitor cells at different sites of musculoskeletal tissues could demonstrate different, even opposite outcomes in osteogenesis, due to cell senescence.”

      (3) The manuscript would be improved by a more refined analysis in Figures 3 and 5, particularly in relation to the use of senolytic drugs. Furthermore, a detailed discussion of the specificity and potential off-target effects of quercetin and dasatinib treatments in Chmp5-deficient mice would strengthen the therapeutic claims of these drugs.

      In Figure 3, we have added additional experiments and results to strengthen the senescent phenotypes of Chmp5-deficient periskeletal progenitors, including significant enrichment of the SAUL_SEN_MAYO geneset (positively correlated with cell senescence) and the KAMMINGA_SENESCENCE geneset (negatively correlated with cell senescence) at the transcriptional level by GSEA analysis of RNA-seq data (Fig. S3F), and an increase of γH2AX+;GFP+ cells at the site of periskeletal overgrowth in Chmp5<sup>Ctsk</sup>;Rosa26<sup>mTmG/+</sup> mice compared to the periosteum of Chmp5<sup>Ctsk/+</sup>;Rosa26<sup>mTmG/+</sup> control mice (Fig. 3E). These results further enrich the senescent molecular manifestations of Chmp5-deficient periskeletal progenitors.

      In Figure 5, we used an alternative senolytic drug ABT-263 to treat Chmp5<sup>Ctsk</sup> mice, and this anti-senescence treatment could also alleviate periskeletal bone overgrowth in this mouse model (Fig. 5F). Furthermore, we have also discussed the potential off-target effects of senolytic drugs in Chmp5<sup>Ctsk</sup> mice. Specifically, we add the following paragraph (Lines 441-451):

      “Furthermore, it is unclear whether the effect of senolytic drugs in Chmp5<sup>Ctsk</sup> mice involves targeting osteoclasts other than osteogenic cells, as osteoclast senescence has not yet been evaluated. However, the efficacy of Q + D in targeting osteogenic cells, which is the focus of the current study, was confirmed in Chmp5<sup>Dmp1</sup> mice (Fig. 5C-E). Additionally, Q + D caused a higher cell apoptotic ratio in Chmp5<sup>Ctsk</sup> compared to wild-type periskeletal progenitors in ex vivo culture (Fig. 5A), demonstrating the effectiveness of Q + D in targeting osteogenic cells in the Chmp5<sup>Ctsk</sup> model. Furthermore, an alternative senolytic drug ABT-263 could also ameliorate periskeletal bone overgrowth in Chmp5<sup>Ctsk</sup> mice (Fig. 5F). Together, these results confirm that osteogenic cell senescence is responsible for the bone overgrowth in Chmp5<sup>Ctsk</sup> and Chmp5<sup>Dmp1</sup> mice and senolytic treatments are effective in alleviating these skeletal disorders.”

      (4) The manuscript could be further enhanced by providing more details into how CHMP5 specifically regulates VPS4A protein levels. Notably, this is a central aspect of the paper linking CHMP5 to endolysosomal dysfunction.

      We thank the reviewer for this important suggestion. One of the novel findings of this study is that CHMP5 regulates the protein level of VPS4A without affecting its RNA transcription. The mechanism of CHMP5 in the regulation of VPS4A protein will be reported in a separate study. However, we have discussed the potential mechanism in the manuscript (Lines 399-409). Specifically, we state:

      “However, the mechanism of CHMP5 in the regulation of the VPS4A protein has not yet been studied. Since CHMP5 can recruit the deubiquitinating enzyme USP15 to stabilize IκBα in osteoclasts by suppressing ubiquitination-mediated proteasomal degradation [13], it is also possible that CHMP5 stabilizes the VPS4A protein by recruiting deubiquitinating enzymes and regulating the ubiquitination of VPS4A, which needs to be clarified in future studies. Notably, mutations in the VPS4A gene in humans can cause multisystemic diseases, including musculoskeletal abnormalities [14] (OMIM: 619273), suggesting that normal expression and function of VPS4A are important for musculoskeletal physiology. The roles of VPS4A in regulating musculoskeletal biology and cell senescence should be further explored in future studies.”

      (5) The discussion section could be enriched by more thoroughly integrating the current findings with previous studies on CHMP5, particularly those exploring its role in osteoclast differentiation and NF-κB signaling.

      The comment is similar to comment #1 of this reviewer. We have expanded the discussion of the distinct functions of CHMP5 in osteoclasts and osteogenic cells (Lines 424-439). We also refer the reviewer to our response to comment #1.

      (6) Figure S4 D is incorrectly arranged and should be revised accordingly.

      Sorry for the confusion. We have added additional annotations to make the images clearer. Now it is Fig. S4E in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) Abstract A clinical perspective or at least an outline is desirable.

      The clinical importance of the findings of this study in understanding and treating musculoskeletal disorders of lysosomal storage diseases has been highlighted at the end of the abstract (Line 38).

      (2) Introduction Header missing.

      The protein name is BCL2, not Bcl2.

      These have been corrected in the revised manuscript (Lines 41, 66).

      (3) Results

      The mouse phenotype experiments are well done.

      Hmga1, Hmga2, Trp53, Ets1, and Txn1 are not typical senescence-associated genes. How about Cdkn2a and Cdkn1a? These could easily be highlighted in Figure 3B.

      Hmga1, Hmga2, Trp53, Ets1, and Txn1 are within the geneset of Reactome Cellular Senescence. Notably, only the protein levels of CDKN2A (p16) and CDKN1A (p21) showed significant changes (Fig. 3D) and the mRNA levels of Cdkn2a and Cdkn1a did not show significant changes according to RNA-seq data. We have added the result of Cdkn2a and Cdkn1a mRNA levels to Fig. S3D in the revised manuscript. Also, we add the following sentence in the text (Lines 193-195):

      “However, the mRNA levels of Cdkn2a (p16) and Cdkn1a (p21) did not show significant changes according to the RNA-seq analysis (Fig. S3D).”

      Figure 3C: Which gene set was used for SASP?

      The SASP geneset in Fig. 3C was from the Reactome database. We have clarified this in the figure legend of Fig. 3 in the revised manuscript (Line 1013).

      The symptom "joint stiffness/contracture" could also be due to skeletal abnormalities related to Chmp5<sup>Ctsk</sup>.

      Joint stiffness/contracture during aging is mainly the result of heterotopic ossification or mineralization in musculoskeletal soft tissues, including ligaments, tendons, joint capsules, and their insertion sites on the bone. Notably, the periskeletal bone overgrowth in Chmp5<sup>Ctsk</sup> mice was mainly mapped to the insertion sites of tendons, ligaments, and joint capsules on the bone, which is consistent with changes during aging. These results have been discussed in the revised manuscript (Lines 456-460).

      Overall, cellular senescence needs at least Cdkn2a and/or Cdkn1a and another marker, i.e. SenMayo or telomere-associated foci or senescence-associated distortion of satellites.

      We have run GSEA with the SenMayo geneset and the result is added in Fig. S3F in the revised manuscript. Also, we ran another geneset KAMMINGA_SENESCENCE which includes genes downregulated in cell senescence. Both genesets are significantly enriched in Chmp5-deficient periskeletal progenitors based on RNA-seq data (Fig. S3F).

      In addition, we also performed immunostaining for another senescence marker γH2AX and the results showed that there are more γH2AX+;GFP+ cells in periskeletal overgrowth in Chmp5<sup>Ctsk</sup>;Rosa26<sup>mTmG/+</sup> mice compared to the periosteum of Chmp5<sup>Ctsk/+</sup>;Rosa26<sup>mTmG/+</sup> control animals (Fig. 3E).

      Together, these results further support the senescent phenotypes of Chmp5-deficient periskeletal progenitors.

      For Figure 4A: What is the NES?

      The value of NES has been added in Fig. 4A.

      The existence of vesicles does not necessarily indicate more SASP. Author’s response:

      We agree with the reviewer that the secretion of extracellular vesicles is not directly correlated with the SASP. In this study, the increased secretory vesicles around Chmp5<sup>Ctsk</sup> periskeletal progenitors represent a secretory phenotype of Chmp5-deficient periskeletal progenitors and have paracrine effects in the abnormal bone growth in Chmp5 conditional knockout mice as shown in Figs. 4 and S4.

      The Chmp5-deficient cells COULD promote the proliferation and osteogenesis of other progenitors, but they might as well not. And if this is through the SASP, is completely unresolved.

      CD45<sup>-</sup>;CD31<sup>-</sup>;GFP<sup>-</sup> periskeletal progenitor cells from Chmp5<sup>Ctsk</sup>;Rosa26<sup>mTmG/+</sup> mice showed an increased capacity of proliferation and osteogenesis compared to the corresponding cells from control animals (Figs. 4F, G, and S4C-E). Also, the conditioned medium of Chmp5-deficient skeletal progenitors promoted the proliferation of ATDC5 cells (Fig. 4E). In addition, the coculture of Chmp5<sup>Ctsk</sup> and wild-type periskeletal progenitors could enhance the osteogenesis of wild-type cells (Fig. S4B). These results demonstrate the paracrine actions of Chmp5-deficient periskeletal progenitors in promoting aberrant bone growth in Chmp5<sup>Ctsk</sup> and Chmp5<sup>Dmp1</sup> mice. However, the factors that mediate the paracrine effects of Chmp5-deficient periskeletal progenitors remain to be clarified in future studies.

      This has been mentioned in the revised manuscript (Lines 263-265).

      Figure 5C: The time points are not labelled.

      The time point of 16 weeks was mentioned in the Method section and now it has been added in the legend of Fig. 5C (Line 1063).

      Figure 5B: Was the bone's overall thickness quantified?

      In Fig. 5B, bone morphology in Chmp5<sup>Ctsk</sup> mice is irregular and difficult to quantify. Therefore, we did not quantify the overall bone thickness in these animals. However, the thickness of the cortical bone was measured by micro-CT analysis in Chmp5<sup>Dmp1</sup> mice after treatment with Q + D (Fig. 5E). Also, we have added the image of the gross femur thickness of Chmp5<sup>Dmp1</sup> mice before and after treatment with Q + D in Fig. 5E.

      It needs to be demonstrated that the actual cell number was reduced after D+Q treatment.

      The Q + D treatment caused a higher cell apoptotic ratio in Chmp5<sup>Ctsk</sup> vs. wild-type skeletal progenitors in ex vivo culture (Fig. 5A), suggesting its effectiveness in targeting the senescent periskeletal progenitors.

      Figure 7A: What is the NES?

      The value of NES has been added in Fig. 7A.

      Reviewer #3 (Recommendations for the authors):

      (1) The WB analysis should be quantified in the Figure 3D.

      In Fig. 3D, the numbers above the lanes of p16 and p21 are the results of the quantification of the band intensity after normalization by β-Actin, which has been indicated in the Figure legend (Lines 1015-1017).

      (2) The osteoblast detection should be measured with antibody against osteocalcin.

      This comment did not specify what result the reviewer was referring to. However, most of the experiments in this study were performed in primary skeletal progenitor cells or cell lines. Osteoblasts were not specifically involved in the current study.

      (3) Co-culture of Chmp5-KO periskeletal progenitors with WT ones should be conducted to detect the migration and osteogenesis of WT cell in response to Chmp5-KO induced senescent cells. In addition, co-culture of WT periskeletal progenitors with senescent cells induced by H2O2, radiation, or from aged mice would provide more information.

      This comment is the same as comment #2 in the Public Reviews of this Reviewer. We already carried out the coculture experiment of Chmp5-deficient and wild-type periskeletal progenitors and the result was added in Fig. S4B. We refer the reviewer to our response to comment #2 in the Public Reviews for more details.

      (4) D+Q treatment mitigates musculoskeletal pathologies in Chmp5 conditional knockout mice. In the previously published paper (CHMP5 controls bone turnover rates by dampening NF-κB activity in osteoclasts), inhibition of osteoclastic bone resorption rescues the aberrant bone phenotype of the Chmp5 conditional knockout mice. Is the effect of D+Q on bone overgrowth because of the inhibition of bone resorption?

      This comment is the same as comment #7 in the Public Reviews of this Reviewer, where we have already addressed this question.

      (5) The role of VPS4A in cell senescence should be measured to support the conclusion that CHMP5 regulates osteogenesis through affecting cell senescence.

      This comment is the same as comment #8 in the Public Reviews of this Reviewer. We refer the reviewer to our response to that comment.

      (6) Cell senescence with the markers, such as p21 and H2AX, co-stained with GFP should be performed in the mouse models to indicate the effects of Chmp5 on cell senescence in vivo.

      This comment is the same as comment #9 in the Public Reviews of this Reviewer. We have performed immunostaining of γH2AX and colocalization with GFP in Chmp5<sup>Ctsk</sup>;Rosa26<sup>mTmG/+</sup> mice and Chmp5<sup>Ctsk/+</sup>;Rosa26<sup>mTmG/+</sup> mice. The results showed that there were more γH2AX+;GFP+ cells at the site of periskeletal overgrowth in Chmp5<sup>Ctsk</sup>;Rosa26<sup>mTmG/+</sup> mice compared to the periosteum of Chmp5<sup>Ctsk/+</sup>;Rosa26<sup>mTmG/+</sup> control mice (Fig. 3E). We also refer the reviewer to our response to comment #9 in Public Reviews.

      (7) ATDC5 cells, as an osteochondroma cell line, are not a good cell model of periskeletal progenitors. Primary periskeletal progenitor cells may be a better choice.

      This comment is the same as comment #10 in the Public Reviews of this Reviewer. Our previous study showed that ATDC5 cells could be used as a reasonable cell model for periskeletal progenitors [12]. Also, most of the results of ATDC5 cells in the current study were verified in primary periskeletal progenitors.

      References

      (1) Adoro S, Park KH, Bettigole SE, Lis R, Shin HR, Seo H, et al. Post-translational control of T cell development by the ESCRT protein CHMP5. Nat Immunol. 2017;18(7):780-90. doi: 10.1038/ni.3764. PubMed PMID: 28553951.

      (2) Kassem M, Bianco P. Skeletal stem cells in space and time. Cell. 2015;160(1-2):17-9. doi: 10.1016/j.cell.2014.12.034. PubMed PMID: 25594172.

      (3) Chan CKF, Gulati GS, Sinha R, Tompkins JV, Lopez M, Carter AC, et al. Identification of the Human Skeletal Stem Cell. Cell. 2018;175(1):43-56 e21. doi: 10.1016/j.cell.2018.07.029. PubMed PMID: 30241615.

      (4) Debnath S, Yallowitz AR, McCormick J, Lalani S, Zhang T, Xu R, et al. Discovery of a periosteal stem cell mediating intramembranous bone formation. Nature. 2018;562(7725):133-9. Epub 20180924. doi: 10.1038/s41586-018-0554-8. PubMed PMID: 30250253; PubMed Central PMCID: PMC6193396.

      (5) Mizuhashi K, Ono W, Matsushita Y, Sakagami N, Takahashi A, Saunders TL, et al. Resting zone of the growth plate houses a unique class of skeletal stem cells. Nature. 2018;563(7730):254-8. doi: 10.1038/s41586-018-0662-5. PubMed PMID: 30401834; PubMed Central PMCID: PMC6251707.

      (6) Zhang F, Wang Y, Zhao Y, Wang M, Zhou B, Zhou B, et al. NFATc1 marks articular cartilage progenitors and negatively determines articular chondrocyte differentiation. Elife. 2023;12. Epub 20230215. doi: 10.7554/eLife.81569. PubMed PMID: 36790146; PubMed Central PMCID: PMC10076019.

      (7) Dai GC, Wang H, Ming Z, Lu PP, Li YJ, Gao YC, et al. Heterotopic mineralization (ossification or calcification) in aged musculoskeletal soft tissues: A new candidate marker for aging. Ageing Res Rev. 2024;95:102215. Epub 20240205. doi: 10.1016/j.arr.2024.102215. PubMed PMID: 38325754.

      (8) Mohler ER, 3rd, Adam LP, McClelland P, Graham L, Hathaway DR. Detection of osteopontin in calcified human aortic valves. Arterioscler Thromb Vasc Biol. 1997;17(3):547-52. doi: 10.1161/01.atv.17.3.547. PubMed PMID: 9102175.

      (9) Mohler ER, 3rd, Gannon F, Reynolds C, Zimmerman R, Keane MG, Kaplan FS. Bone formation and inflammation in cardiac valves. Circulation. 2001;103(11):1522-8. doi: 10.1161/01.cir.103.11.1522. PubMed PMID: 11257079.

      (10) Paramos-de-Carvalho D, Jacinto A, Saude L. The right time for senescence. Elife. 2021;10. Epub 2021/11/11. doi: 10.7554/eLife.72449. PubMed PMID: 34756162; PubMed Central PMCID: PMC8580479.

      (11) Wiley CD, Campisi J. The metabolic roots of senescence: mechanisms and opportunities for intervention. Nat Metab. 2021;3(10):1290-301. Epub 2021/10/20. doi: 10.1038/s42255-021-00483-8. PubMed PMID: 34663974; PubMed Central PMCID: PMC8889622.

      (12) Ge X, Tsang K, He L, Garcia RA, Ermann J, Mizoguchi F, et al. NFAT restricts osteochondroma formation from entheseal progenitors. JCI Insight. 2016;1(4):e86254. doi: 10.1172/jci.insight.86254. PubMed PMID: 27158674; PubMed Central PMCID: PMC4855520.

      (13) Greenblatt MB, Park KH, Oh H, Kim JM, Shin DY, Lee JM, et al. CHMP5 controls bone turnover rates by dampening NF-kappaB activity in osteoclasts. J Exp Med. 2015;212(8):1283-301. Epub 20150720. doi: 10.1084/jem.20150407. PubMed PMID: 26195726; PubMed Central PMCID: PMC4516796.

      (14) Rodger C, Flex E, Allison RJ, Sanchis-Juan A, Hasenahuer MA, Cecchetti S, et al. De Novo VPS4A Mutations Cause Multisystem Disease with Abnormal Neurodevelopment. Am J Hum Genet. 2020;107(6):1129-48. Epub 20201112. doi: 10.1016/j.ajhg.2020.10.012. PubMed PMID: 33186545; PubMed Central PMCID: PMC7820634.

    1. Author response:

      eLife Assessment

      This manuscript introduces a useful protein-stability-based fitness model for simulating protein evolution and unifying non-neutral models of molecular evolution with phylogenetic models. The model is applied to four viral proteins that are of structural and functional importance. The justification of some hypotheses regarding fitness is incomplete, as well as the evidence for the model's predictive power, since it shows little improvement over neutral models in predicting protein evolution.

      We are grateful for the constructive comments that helped improve our study. Regarding the comment about justification of fitness, we will include in the revised manuscript additional information to support the relevance of modeling protein evolution accounting for protein folding stability. We agree that increasing the parameterization of the developed birth-death model is interesting, if it does not lead to overfitting. The model presented considers the fitness of protein variants to determine their reproductive success through the corresponding birth and death rates, varying among lineages, and it is biologically meaningful and technically correct (Harmon 2019). Following a suggestion of the first reviewer to allow variation of the global birth-death rate among lineages, we will additionally incorporate this aspect into the model and evaluate its performance with the data used for the evaluation of the models. The integration of structurally constrained substitution models of protein evolution, as Markov models, into the birth-death process was made following standard approaches of molecular evolution in population genetics (Yang 2006; Carvajal-Rodriguez 2010; Arenas 2012; Hoban, et al. 2012) and we will provide more information about it in the revised manuscript. Regarding the predictive power, our study showed good accuracy in predicting the real folding stability of forecasted protein variants. On the other hand, predicting the exact sequences proved to be more challenging, indicating a need for improved substitution models of molecular evolution in the field. Altogether, we believe our findings provide a significant contribution to the field, as accurately forecasting the folding stability of future real proteins is fundamental for predicting their protein function and enabling a variety of applications. Additionally, we implemented the models into a freely available computer framework, with detailed documentation and diverse practical examples.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Ferreiro et al. present a method to simulate protein sequence evolution under a birth-death model where sequence evolution is constrained by structural constraints on protein stability. The authors then use this model to explore the predictability of sequence evolution in several viral structural proteins. In principle, this work is of great interest to molecular evolution and phylodynamics, which have struggled to couple non-neutral models of sequence evolution to phylodynamic models like birth-death. Unfortunately, though, the model shows little improvement over neutral models in predicting protein evolution, and this ultimately appears to be due to fundamental conceptual problems with how fitness is modeled and linked to the phylodynamic birth-death model.

      We thank the reviewer for the positive comments about our work.

      Regarding predictive power, the study showed good accuracy in predicting the real folding stability of forecasted protein variants under a selection model, but not under a neutral model. However, predicting the exact sequences was more challenging. For example, amino acids with similar physicochemical properties can yield similar folding stability while differing in the specific sequence; more accurate substitution models of molecular evolution are therefore required in the field. We consider that forecasting the folding stability of future real proteins is an important advancement in forecasting protein evolution, given the essential role of folding stability in protein function and its variety of applications. Regarding the conceptual concerns related to fitness modeling, we clarify this issue in detail in our responses to the specific comments below.

      Major concerns:

      (1) Fitness model: All lineages have the same growth rate r = b-d because the authors assume b+d=1. But under a birth-death model, the growth r is equivalent to fitness, so this is essentially assuming all lineages have the same absolute fitness since increases in reproductive fitness (b) will simply trade off with decreases in survival (d). Thus, even if the SCS model constrains sequence evolution, the birth-death model does not really allow for non-neutral evolution such that mutations can feed back and alter the structure of the phylogeny.

      We thank the reviewer for this comment that aims to improve the realism of our model. In the model presented (but see later for another model derived from the proposal of the reviewer and that we are now implementing into the framework and applying to the data used for the evaluation of the models), the fitness predicted from a protein variant is used to obtain the corresponding birth rate of that variant. In this way, protein variants with high fitness have high birth rates leading to overall more birth events, while protein variants with low fitness have low birth rates resulting in overall more extinction events, which has biological meaning for the study system. The statement “All lineages have the same growth rate r = b-d” in our model is incorrect because, in our model, b and d can vary among lineages according to the fitness. For example, a lineage might have b=0.9, d=0.1, r=0.8, while another lineage could have b=0.6, d=0.4, r=0.2. Indeed, the statement “this is essentially assuming all lineages have the same absolute fitness” is incorrect. Clearly, assuming that all lineages have the same fitness would not make sense, in that situation the folding stability of the forecasted protein variants would be similar under any model, which is not the case as shown in the results. In our model, the fitness affects the reproductive success, where protein variants with a high fitness have higher birth rates leading to more birth events, while those with lower fitness have higher death rates leading to more extinction events. This parameterization is meaningful for protein evolution because the fitness of a protein variant can affect its survival (birth or extinction) without necessarily affecting its rate of evolution. While faster growth rate can sometimes be associated with higher fitness, a variant with high fitness does not necessarily accumulate substitutions at a faster rate. 
Regarding the phylogenetic structure, the model presented considers variable birth and death events across different lineages according to the fitness of the corresponding protein variants, and this alters the derived phylogeny (i.e., protein variants selected against can go extinct while others with high fitness can produce descendants). We are not sure about the meaning of the term “mutations can feed back” in the context of our system. Note that we use Markov models of evolution, which are well established in the field (despite their limitations), and substitutions are fixed mutations, which could still be reverted later if selected by the substitution model (Yang 2006). Altogether, we find that the presented birth-death model is technically correct and appropriate for modeling our biological system. Its integration with structurally constrained substitution (SCS) models of protein evolution, as Markov models, follows general approaches of molecular evolution in population genetics (Yang 2006; Carvajal-Rodriguez 2010; Arenas 2012; Hoban, et al. 2012). We will provide a more detailed description of the model in the revised manuscript.
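
      The numerical example given in this response can be made concrete with a minimal sketch. It assumes, for illustration only, that fitness is a value in [0, 1] used directly as the birth rate and that the death rate is its complement (b + d = 1). The function names are hypothetical and are not taken from the authors' framework.

```python
def rates_from_fitness(fitness):
    """Return (birth, death) under the illustrative constraint b + d = 1."""
    if not 0.0 <= fitness <= 1.0:
        raise ValueError("fitness must lie in [0, 1]")
    birth = fitness        # assumption: birth rate equals fitness
    death = 1.0 - fitness  # assumption: death rate is the complement
    return birth, death


def growth_rate(fitness):
    """Net growth rate r = b - d for a lineage with the given fitness."""
    birth, death = rates_from_fitness(fitness)
    return birth - death


# The two lineages used as an example in the response above.
for f in (0.9, 0.6):
    b, d = rates_from_fitness(f)
    print(f"b={b:.1f} d={d:.1f} r={growth_rate(f):.1f}")
```

      Under this assumed mapping the sum b + d is the same for every lineage, whereas the difference r = b - d = 2b - 1 changes with fitness.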

      Apart from these clarifications about the birth-death model used, we understand the point of the reviewer and following the suggestion we are now incorporating an additional birth-death model that accounts for variable global birth-death rate among lineages. Specifically, we are following the model proposed by Neher et al (2014), where the death rate is considered as 1 and the birth rate is modeled as 1 + fitness. In this model, the global birth-death rate varies among lineages. We are now implementing this model into the computer framework and applying it to the data used for the evaluation of the models. Preliminary results, which will be finally presented in the revised manuscript, indicate that this model yields similar predictive accuracy compared to the previous birth-death model. If this is confirmed, accounting for variability in the global birth-death rate does not appear to play a major role in the studied systems of protein evolution. We will present this additional birth-death model and its results in the revised manuscript.
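
      The alternative parameterization described above (death rate fixed at 1, birth rate equal to 1 + fitness) might be sketched as follows. The event-sampling step is a generic continuous-time scheme added for illustration; it is an assumption and not a description of how the framework is implemented, and all names are hypothetical.

```python
import random


def neher_rates(fitness):
    """Birth and death rates with death fixed at 1 and birth = 1 + fitness."""
    return 1.0 + fitness, 1.0


def next_event(fitness, rng):
    """Draw the waiting time and the type of the next event for one lineage."""
    birth, death = neher_rates(fitness)
    total = birth + death
    waiting_time = rng.expovariate(total)    # exponential with rate b + d
    is_birth = rng.random() < birth / total  # birth with probability b / (b + d)
    return waiting_time, "birth" if is_birth else "death"


rng = random.Random(1)  # fixed seed so the sketch is reproducible
for f in (0.0, 0.2, 0.8):
    b, d = neher_rates(f)
    event = next_event(f, rng)[1]
    print(f"fitness={f:.1f} total rate={b + d:.1f} growth={b - d:.1f} next={event}")
```

      Here both the total event rate b + d and the net growth rate b - d (which equals the fitness) differ among lineages, which is the variation in the global birth-death rate referred to in the response.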

      (2) Predictive performance: Similar performance in predicting amino acid frequencies is observed under both the SCS model and the neutral model. I suspect that this rather disappointing result owes to the fact that the absolute fitness of different viral variants could not actually change during the simulations (see comment #1).

      The study shows similar performance in predicting the sequences of the forecasted proteins under both the SCS model and the neutral model, but shows differences in predicting the folding stability of the forecasted proteins between these models. Indeed, as explained in the previous answer, the birth-death model accounts for variation in fitness among lineages, leading to differences among lineages in reproductive success. The new birth-death model that we are now implementing, which incorporates variation of the global birth-death rate among lineages, is producing similar preliminary results. In addition to these considerations, it is known that SCS models applied to phylogenetics (such as ancestral molecular reconstruction) can model protein evolution with high accuracy in terms of folding stability. However, inferring sequences (i.e., ancestral sequences) is considerably more challenging even for ancestral molecular reconstruction (Arenas, et al. 2017; Arenas and Bastolla 2020). The observed sequence diversity is much greater than the observed structural diversity (Illergard, et al. 2009; Pascual-Garcia, et al. 2010), and substitutions among amino acids with similar physicochemical properties can result in protein variants with similar folding stability but different specific amino acid sequences; further work is demanded in the field of substitution models of molecular evolution. We will expand the discussion of this aspect in the revised manuscript.

      (3) Model assessment: It would be interesting to know how much the predictions were informed by the structurally constrained sequence evolution model versus the birth-death model. To explore this, the authors could consider three different models: 1) neutral, 2) SCS, and 3) SCS + BD. Simulations under the SCS model could be performed by simulating molecular evolution along just one hypothetical lineage. Seeing if the SCS + BD model improves over the SCS model alone would be another way of testing whether mutations could actually impact the evolutionary dynamics of lineages in the phylogeny.

      In the present study, we compare the neutral model + birth-death (BD) with the SCS model + BD. A Markov substitution model with rate matrix Q is applied over an evolutionary time (i.e., branch length, t), which allows one to determine the probability of substitution events during that time period [P(t) = exp(Qt)]. This approach is traditionally used in phylogenetics to model the accumulation of substitutions over time. Therefore, to compare the neutral and SCS models, an evolutionary time is required, which in this case is provided by the birth-death process. Models 1) and 2) suggested by the reviewer cannot be compared without an underlying evolutionary history. However, comparisons in terms of likelihood, and other aspects, between models that ignore the protein structure and the implemented SCS models are already available in our previous studies based on coalescent simulations or given phylogenetic trees (Arenas, et al. 2013; Arenas, et al. 2015). There, SCS models produced proteins with more realistic folding stability than models that ignore evolutionary constraints from the protein structure, and those findings are consistent with the results from the present study where we explore the application of these models to forecasting protein evolution. We would like to emphasize that forecasting the folding stability of future real proteins is a significant and novel finding, because folding stability is fundamental to protein function and has diverse implications. While accurately forecasting the exact sequences would indeed be ideal, this remains a challenging task with current substitution models. In this regard, we will discuss in the revised manuscript the need to develop more accurate substitution models.
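
      The relation P(t) = exp(Qt) can be illustrated with a small, self-contained sketch. It uses a toy two-state rate matrix with made-up rates (protein models have 20 states) and a plain Taylor series with scaling and squaring, so it needs no external libraries. The function names and the numerical settings are assumptions chosen for illustration.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(q, t, terms=20, squarings=10):
    """Approximate P(t) = exp(Q t) by a Taylor series with scaling and squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes a^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)  # undo the scaling: exp(A)^(2^s)
    return result


# Toy two-state rate matrix; each row sums to zero.
alpha, beta = 0.3, 0.1
Q = [[-alpha, alpha], [beta, -beta]]
P = transition_matrix(Q, t=2.0)
for row in P:
    print(" ".join(f"{x:.4f}" for x in row))
```

      Each row of P(t) sums to one, and longer branch lengths t move the rows toward the stationary distribution of Q, which is why an evolutionary time is needed before substitution models can be compared.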

      (4) Background fitness effects: The model ignores background genetic variation in fitness. I think this is particularly important as the fitness effects of mutations in any one protein may be overshadowed by the fitness effects of mutations elsewhere in the genome. The model also ignores background changes in fitness due to the environment, but I acknowledge that might be beyond the scope of the current work.

      This comment made us realize that more information about the features of the implemented SCS models should be included in the manuscript. In particular, the implemented SCS models consider a negative design based on the observed residue contacts in nearly all proteins available in the Protein Data Bank (Arenas, et al. 2013; Arenas, et al. 2015). This data is provided as an input file and it can be updated to incorporate new structures (see the framework documentation and the practical examples). Therefore, the prediction of folding stability is a combination of positive design (direct analysis of the target protein) and negative design (consideration of background proteins to reduce biases), thus incorporating background molecular diversity. This important feature was not sufficiently described in the manuscript, and we will add more details in the revised version. Regarding the fitness caused by the environment, we agree with the reviewer. This is a challenge for any method aiming to forecast evolution, as future environmental shifts are inherently unpredictable and may impact the accuracy of the predictions. Although one might attempt to incorporate such effects into the model, doing so risks overparameterization, especially when the additional factors are uncertain or speculative. We will include a discussion in the revised manuscript about our perspective on the potential effects of environmental changes on forecasting evolution.

      (5) In contrast to the model explored here, recent work on multi-type birth-death processes has considered models where lineages have type-specific birth and/or death rates and therefore also type-specific growth rates and fitness (Stadler and Bonhoeffer, 2013; Kunhert et al., 2017; Barido-Sottani, 2023). Rasmussen & Stadler (eLife, 2019) even consider a multi-type birth-death model where the fitness effects of multiple mutations in a protein or viral genome collectively determine the overall fitness of a lineage. The key difference with this work presented here is that these models allow lineages to have different growth rates and fitness, so these models truly allow for non-neutral evolutionary dynamics. It would appear the authors might need to adopt a similar approach to successfully predict protein evolution.

      We agree with the reviewer that robust birth-death models have been developed applying statistics and, in many cases, the primary aim of those studies is the development and refinement of the model itself. Regarding the study by Rasmussen and Stadler 2019, it incorporates an external evaluation of mutation events where the used fitness is specific for the proteins investigated in that study, which may pose challenges for users interested in analyzing other proteins. In contrast, our study takes a different approach. We implement a fitness function that can be predicted and evaluated for any type of protein (Goldstein 2013), making it broadly applicable. In addition, we provide a freely available and well-documented computational framework to facilitate its use. The primary aim of our study is not the development of novel or complex birth-death models. Rather, we aim to explore the integration of a standard birth-death model with structurally constrained substitution models for the purpose of predicting protein evolution. In the context of protein evolution, substitution models are a critical factor (Liberles, et al. 2012; Wilke 2012; Bordner and Mittelmann 2013; Echave, et al. 2016; Arenas, et al. 2017; Echave and Wilke 2017), and their combination with a birth-death model constitutes a first approximation upon which next studies can build to better understand this biological system. We will include these considerations in the revised manuscript.
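
      One common choice in biophysical models of protein evolution is to take fitness as the fraction of molecules that are folded at equilibrium in a two-state model. The sketch below assumes that form; the exact function used by the framework is not stated in this response, so the formula, the constant, and the function name should be read as illustrative assumptions.

```python
import math

KT_KCAL_PER_MOL = 0.593  # approximate value of kT near 298 K, in kcal/mol


def fraction_folded(delta_g, kt=KT_KCAL_PER_MOL):
    """Two-state fraction of folded molecules; delta_g < 0 means a stable fold."""
    return 1.0 / (1.0 + math.exp(delta_g / kt))


# Hypothetical folding free energies (kcal/mol) for three variants.
for dg in (-5.0, -1.0, 1.0):
    print(f"deltaG={dg:+.1f} fitness={fraction_folded(dg):.3f}")
```

      A mapping of this kind depends only on the predicted folding free energy, so it can be evaluated for any protein with a usable structure rather than being tied to one experimentally characterized system.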

      Reviewer #2 (Public review):

      Summary:

      In this study, "Forecasting protein evolution by integrating birth-death population models with structurally constrained substitution models", David Ferreiro and co-authors present a forward-in-time evolutionary simulation framework that integrates a birth-death population model with a fitness function based on protein folding stability. By incorporating structurally constrained substitution models and estimating fitness from ΔG values using homology-modeled structures, the authors aim to capture biophysically realistic evolutionary dynamics. The approach is implemented in a new version of their open-source software, ProteinEvolver2, and is applied to four viral proteins from HIV-1 and SARS-CoV-2.

      Overall, the study presents a compelling rationale for using folding stability as a constraint in evolutionary simulations and offers a novel framework and software to explore such dynamics. While the results are promising, particularly for predicting biophysical properties, the current analysis provides only partial evidence for true evolutionary forecasting, especially at the sequence level. The work offers a meaningful conceptual advance and a useful simulation tool, and sets the stage for more extensive validation in future studies.

      We also thank this reviewer for the positive comments on our study. Regarding the predictive power, our results showed good accuracy in predicting the folding stability of the forecasted protein variants. However, predicting the specific sequences of these variants is more challenging. For example, substitutions among amino acids with similar physicochemical properties can result in different sequences with similar folding stability. We believe that these findings are realistic and interesting, as they indicate that while forecasting folding stability is feasible, forecasting the specific sequence evolution is more complex than one could anticipate.

      Strengths:

      The results demonstrate that fitness constraints based on protein stability can prevent the emergence of unrealistic, destabilized variants - a limitation of traditional, neutral substitution models. In particular, the predicted folding stabilities of simulated protein variants closely match those observed in real variants, suggesting that the model captures relevant biophysical constraints.

      We agree with the reviewer and appreciate the consideration that forecasting the folding stability of future real proteins is a relevant finding. For instance, folding stability is fundamental for protein function and affects several other molecular properties.

      Weaknesses:

      The predictive scope of the method remains limited. While the model effectively preserves folding stability, its ability to forecast specific sequence content is not well supported.

      It is known that structurally constrained substitution (SCS) models applied to phylogenetics (such as ancestral molecular reconstruction) can model protein evolution with high accuracy in terms of folding stability, while inferring sequences (i.e., ancestral sequences) remains considerably more challenging (Arenas, et al. 2017; Arenas and Bastolla 2020). The observed sequence diversity is much higher than the observed structural diversity (Illergard, et al. 2009; Pascual-Garcia, et al. 2010), and substitutions between amino acids with similar physicochemical properties can result in protein variants with similar folding stability but with different specific amino acid composition. We will expand the discussion of this aspect in the manuscript.

      Only one dataset (HIV-1 MA) is evaluated for sequence-level divergence using KL divergence; this analysis is absent for the other proteins. The authors use a consensus Omicron sequence as a representative endpoint for SARS-CoV-2, which overlooks the rich longitudinal sequence data available from GISAID. The use of just one consensus from a single time point is not fully justified, given the extensive temporal and geographical sampling available. Extending the analysis to include multiple timepoints, particularly for SARS-CoV-2, would strengthen the predictive claims. Similarly, applying the model to other well-sampled viral proteins, such as those from influenza or RSV, would broaden its relevance and test its generalizability.

      The evaluation of forecasting evolution using real datasets is complex due to several conceptual and practical aspects. In contrast to traditional phylogenetic reconstruction of past evolutionary events and ancestral sequences, forecasting evolution often begins with a variant that is evolved forward in time and requires a rough fitness landscape to select among possible future variants (Lässig, et al. 2017). Another concern for validating the method is the need to know the initial variant that gives rise to the corresponding forecasted variants, which is not always known. Thus, we investigated systems where the initial variant, or a close approximation, is known, such as scenarios of in vitro monitored evolution. In the case of SARS-CoV-2, the Wuhan variant is commonly used as the starting variant of the pandemic. Next, since forecasting evolution is highly dependent on the model of evolution used, unexpected external factors can dramatically affect the predictions. For this reason, systems with minimal external influences provide a more controlled context for evaluating forecasting evolution. For instance, scenarios of in vitro monitored virus evolution avoid some external factors such as the host immune response. Another important aspect is the availability of data at two (i.e., present and future) or more time points along the evolutionary trajectory, with sufficient genetic divergence between them to identify clear evolutionary signatures. Additionally, using consensus sequences can help mitigate effects from unfixed mutations, which should not be modeled by a substitution model of evolution. Altogether, not all datasets are appropriate to properly evaluate forecasting evolution. We will include these considerations in the revised manuscript.

      Sequence comparisons based on the KL divergence require, at the studied time point, an observed distribution of amino acid frequencies among sites and an estimated distribution of amino acid frequencies among sites. In the study datasets, this is only the case for the HIV-1 MA dataset, which belongs to a previous study from one of us and collaborators where we obtained at least 20 independent sequences at each sampling point (Arenas, et al. 2016). We will provide additional information on this aspect in the manuscript.
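
      The data requirement described above can be seen in a minimal sketch of a per-site KL divergence. The alignment columns are hypothetical, and the pseudocount smoothing and function names are assumptions introduced for illustration; they are not taken from the study.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_frequencies(column, pseudocount=1.0):
    """Amino acid frequencies at one alignment column, smoothed with a pseudocount."""
    counts = Counter(column)
    total = sum(counts[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def kl_divergence(observed, predicted):
    """KL(observed || predicted) in nats; both arguments map residue -> frequency."""
    return sum(p * math.log(p / predicted[aa])
               for aa, p in observed.items() if p > 0)


# Hypothetical columns: 20 observed and 20 forecasted residues at one site.
observed_column = "L" * 14 + "I" * 5 + "V"
predicted_column = "L" * 10 + "I" * 6 + "M" * 4
kl = kl_divergence(site_frequencies(observed_column),
                   site_frequencies(predicted_column))
print(f"KL divergence at this site: {kl:.3f} nats")
```

      With a single consensus sequence per time point, each column would contain one residue and the observed distribution would be degenerate, which is why this comparison needs several independent sequences at the sampling point.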

      Regarding the Omicron dataset, we used 384 curated sequences of the Omicron variant of concern to construct the study dataset, and we believe that it is a representative sample. The sequence used for the initial time point was the Wuhan variant (Wu, et al. 2020), which is commonly assumed to be the origin of the pandemic in SARS-CoV-2 studies. As previously indicated, the use of consensus sequences is convenient to avoid variants with unfixed mutations. Regarding extending the analysis to other time points (other variants of concern), we respectfully disagree, because Omicron is the variant of concern with the highest genetic distance to the Wuhan variant, and a high genetic distance is required to properly evaluate the prediction method. We noted that earlier variants of concern show a small number of fixed mutations in the study proteins, despite the availability of large numbers of sequences in databases such as GISAID.

      Additionally, we investigated the evolutionary trajectories of HIV-1 protease (PR) in 12 intra-host viral populations.

      Next, following the proposal of the reviewer, we will incorporate the analysis of an additional viral dataset (probably influenza following the suggestion of the reviewer) to further assess the generalizability of the method. Still, as previously indicated, not all datasets are suitable for a proper evaluation of forecasting evolution. Factors such as the shape of the fitness landscape and the amount of genetic variation over time can influence the accuracy of predictions. We will present the results of the analysis of the new data in the revised manuscript.

      It would also be informative to include a retrospective analysis of the evolution of protein stability along known historical trajectories. This would allow the authors to assess whether folding stability is indeed preserved in real-world evolution, as assumed in their model.

      Our present study is not focused on investigating the evolution of the folding stability over time, although it provides this information indirectly at the studied time points. Instead, the present study shows that the folding stability of the forecasted protein variants is similar to the folding stability of the corresponding real protein variants for diverse viral proteins, which is an important evaluation of the method. The folding stability can indeed vary over time in both real and modeled evolutionary scenarios, and our present study is not in conflict with this. Although it is not the aim of our present study, some previous phylogenetic-based studies have reported temporal fluctuations in folding stability for diverse data (Arenas, et al. 2017; Olabode, et al. 2017; Arenas and Bastolla 2020; Ferreiro, et al. 2022).

      Finally, a discussion on the impact of structural templates - and whether the fixed template remains valid across divergent sequences - would be valuable. Addressing the possibility of structural remodeling or template switching during evolution would improve confidence in the model's applicability to more divergent evolutionary scenarios.

      This is an important point. For the datasets that required homology modeling (in several cases it was not necessary because the sequence was present in a protein structure of the PDB), the structural templates were selected using SWISS-MODEL, and we applied the best-fitting template. We will include additional details about the parameters of the homology modeling in the revised version. Indeed, our method assumes that the protein structure is maintained over the studied evolutionary time, which can be generally reasonable for short timescales where the structure is conserved (Illergard, et al. 2009; Pascual-Garcia, et al. 2010). Over longer evolutionary timescales, structural changes may occur, and in such cases, modeling the evolution of the protein structure would be necessary. To our knowledge, modeling the evolution of the protein structure remains a challenging task that requires substantial methodological developments. Recent advances in artificial intelligence, particularly in protein structure prediction from sequence, may offer promising tools for addressing this challenge. However, we believe that evaluating such approaches in the context of structural evolution would be difficult, especially given the limited availability of real data with known evolutionary trajectories involving structural change. In any case, this is probably an important direction for future research. We will include this discussion in the revised manuscript.

      Cited references

      Arenas M. 2012. Simulation of Molecular Data under Diverse Evolutionary Scenarios. PLoS Comput Biol 8:e1002495.

      Arenas M, Bastolla U. 2020. ProtASR2: Ancestral reconstruction of protein sequences accounting for folding stability. Methods Ecol Evol 11:248-257.

      Arenas M, Dos Santos HG, Posada D, Bastolla U. 2013. Protein evolution along phylogenetic histories under structurally constrained substitution models. Bioinformatics 29:3020-3028.

      Arenas M, Lorenzo-Redondo R, Lopez-Galindez C. 2016. Influence of mutation and recombination on HIV-1 in vitro fitness recovery. Molecular Phylogenetics and Evolution 94:264-270.

      Arenas M, Sanchez-Cobos A, Bastolla U. 2015. Maximum likelihood phylogenetic inference with selection on protein folding stability. Molecular Biology and Evolution 32:2195-2207.

      Arenas M, Weber CC, Liberles DA, Bastolla U. 2017. ProtASR: An Evolutionary Framework for Ancestral Protein Reconstruction with Selection on Folding Stability. Systematic Biology 66:1054-1064.

      Bordner AJ, Mittelmann HD. 2013. A new formulation of protein evolutionary models that account for structural constraints. Molecular Biology and Evolution 31:736-749.

      Carvajal-Rodriguez A. 2010. Simulation of genes and genomes forward in time. Current Genomics 11:58-61.

      Echave J, Spielman SJ, Wilke CO. 2016. Causes of evolutionary rate variation among protein sites. Nature Reviews Genetics 17:109-121.

      Echave J, Wilke CO. 2017. Biophysical Models of Protein Evolution: Understanding the Patterns of Evolutionary Sequence Divergence. Annu Rev Biophys 46:85-103.

      Ferreiro D, Khalil R, Gallego MJ, Osorio NS, Arenas M. 2022. The evolution of the HIV-1 protease folding stability. Virus Evol 8:veac115.

      Goldstein RA. 2013. Population Size Dependence of Fitness Effect Distribution and Substitution Rate Probed by Biophysical Model of Protein Thermostability. Genome Biol Evol 5:1584-1593.

      Harmon LJ. 2019. Introduction to birth-death models. In. Phylogenetic Comparative Methods. p. https://lukejharmon.github.io/pcm/chapter10_birthdeath/.

      Hoban S, Bertorelle G, Gaggiotti OE. 2012. Computer simulations: tools for population and evolutionary genetics. Nature Reviews Genetics 13:110-122.

      Illergard K, Ardell DH, Elofsson A. 2009. Structure is three to ten times more conserved than sequence--a study of structural response in protein cores. Proteins 77:499-508.

      Lässig M, Mustonen V, Walczak AM. 2017. Predicting evolution. Nature Ecology & Evolution 1:0077.

      Liberles DA, Teichmann SA, Bahar I, Bastolla U, Bloom J, Bornberg-Bauer E, Colwell LJ, de Koning AP, Dokholyan NV, Echave J, et al. 2012. The interface of protein structure, protein biophysics, and molecular evolution. Protein Science 21:769-785.

      Neher RA, Russell CA, Shraiman BI. 2014. Predicting evolution from the shape of genealogical trees. Elife 3.

      Olabode AS, Kandathil SM, Lovell SC, Robertson DL. 2017. Adaptive HIV-1 evolutionary trajectories are constrained by protein stability. Virus Evol 3:vex019.

      Pascual-Garcia A, Abia D, Mendez R, Nido GS, Bastolla U. 2010. Quantifying the evolutionary divergence of protein structures: the role of function change and function conservation. Proteins 78:181-196.

      Wilke CO. 2012. Bringing molecules back into molecular evolution. PLoS Comput Biol 8:e1002572.

      Wu F, Zhao S, Yu B, Chen YM, Wang W, Song ZG, Hu Y, Tao ZW, Tian JH, Pei YY, et al. 2020. A new coronavirus associated with human respiratory disease in China. Nature 579:265-269.

      Yang Z. 2006. Computational Molecular Evolution. Oxford, England: Oxford University Press.

    1. Author response:

      Reviewer #1 (Public review):

      (1) The broader significance of the findings needs to be better articulated. While the authors emphasize that comparing adaptive traits in sympatry and allopatry provides insights into selective processes shaping reproductive isolation and coexistence, it is unclear what key conceptual or theoretical questions are being addressed. Are these patterns expected under certain evolutionary scenarios? Have they been empirically demonstrated in other systems? The authors should explicitly state the overarching research question, incorporate some predictions, and better contextualize their findings within the existing literature. If the results challenge or support previous work, that should be highlighted to strengthen the study's importance in a broader context.

      We thank the reviewer for their valuable feedback. We understand that the framing of the results and the discussion did not sufficiently highlight the broader significance of our findings. In the revised version of the manuscript, we will explicitly state the theoretical questions addressed and our hypotheses in the introduction, and better compare our results to pre-existing examples from the literature.

      (2) The motivation for studying visual signals and mate choice in allopatric populations (i.e., at the intraspecific level) is not well articulated, leaving their role in the broader narrative unclear. In particular, the rationale behind experiments 1, 2, and 3 is not well defined, as the authors have not made a strong case for the need for these intraspecific comparisons in the introduction. This issue is further compounded by the authors' primary focus on signal evolution in sympatry throughout both the results and the discussion. For instance, the divergence of iridescence in allopatry is a potentially interesting result. But the authors have not discussed its implications.

      Overall, given that the primary conclusions are based on results and analyses in sympatry, the role of allopatric populations in shaping these conclusions needs to be better integrated and justified.

      Without a stronger link between the comparative framework and the study's key takeaways, the use of allopatric populations feels somewhat peripheral rather than central to the study's aim.

      Since the primary conclusions remain valid even without the allopatric comparisons, their inclusion requires a clearer rationale.

      We recognize that the current manuscript places more emphasis on the sympatric Morpho population, and that the analysis and the discussion of the results regarding the allopatric Morpho population were underdeveloped. In the revised version, we plan to address this by (1) developing the rationale behind the male choice experiments performed on the allopatric population. We will argue that intraspecific comparison helps identify the traits involved in mate preference within species (iridescent color and/or wing pattern) and that those results can be compared to the interspecific mate choice results to identify the traits involved in species recognition. To explain the relevance of the comparison with the allopatric population, we will also (2) strengthen expectations on the effect of species interactions on the evolution of traits and mate recognition in sympatric populations vs. allopatric populations.

      (3) While the authors demonstrate that iridescence is indistinguishable to predators in sympatry, they overstate the role of predation in driving convergence. The present study does not experimentally demonstrate that iridescence in this species has a confusion effect or contributes to evasive mimicry. Alternatively, convergence could result from other selective forces, such as signal efficacy due to environmental conditions, rather than being solely driven by predation.

      We acknowledge that this study neither demonstrates that iridescence contributes to evasive mimicry nor that predation is the driver of the convergence in iridescence. We will tone down the interpretation of the results in the discussion and state that predation is not the only selective pressure that could have promoted a convergent evolution of iridescence in sympatric species, although this observation is consistent with the evasive mimicry hypothesis.

      Reviewer #2 (Public review):

      My only major comment concerns the authors' favoured explanation for aposematism (or evasive mimicry) for convergence among species, which is based upon the you-can't-catch-me hypothesis first presented by Young 1971. Although there is supporting work showing that iridescent-like stimuli are more difficult to precisely localize by a range of viewers, most of the evidence as applied to the Morpho system is circumstantial, and I'm not certain that there is widespread acceptance of this hypothesis. Given that the present study deals with closely-related (sub)species, one alternative explanation - a "null" hypothesis of sorts - is for a lack of divergence (from a common starting point) as opposed to evolutionary convergence per se. In other words, two subspecies are likely to retain ancestral character states unless there is selection that causes them to diverge. I feel that the manuscript would benefit from a discussion of this alternative, if not others. Signalling to predators could very well be involved in constraining the extent of convergence, but this seems a little premature to state as an up-front conclusion of this work. There is also the result of a *dorsal* wing manipulation by Vieira-Silva et al. 2024 (https://doi.org/10.1111/eth.13517), which seems difficult to reconcile in light of this explanation. Whereas this paper is cited by the authors, a more nuanced discussion of their experimental results would seem appropriate here.

      We thank the reviewer for their constructive comments on our manuscript. We appreciate the reviewer’s concern regarding the way iridescence convergence between sympatric species is discussed in our manuscript, which aligns with similar concerns raised by Reviewer 1. We will improve the discussion on the different evolutionary forces that could have favored this convergent iridescent signal in sympatry to bring more nuance to the discussion.

      Reviewer #3 (Public review):

      First, when using allopatric and sympatric (sub)species pairs to test evolutionary hypotheses, replication is important. Ideally, multiple allopatric and sympatric (sub)species pairs are compared to avoid outlier (sub)species or pairs that lead to biased conclusions. Unfortunately, the current study compares 1 allopatric and 1 sympatric (sub)species pair, hence having poor (no) replication on the level of allopatric and sympatric (sub)species pairs.

      We would like to thank the reviewer for their constructive feedback. We agree that replication is important to test evolutionary hypotheses and that our study lacks replication for allopatric and sympatric Morpho populations. Ideally, one would require several allopatric and sympatric replicates pointing respectively toward divergence and convergence of Morpho iridescence to conclude on the effect of species interaction in trait evolution. Our study is a first attempt at answering this question, covering only a few Morpho populations but proposing a broad assessment of iridescence and mate preference for those populations. We will make sure to mention this limitation more clearly in the revised version of our manuscript.

      Second, chemical profiles were only measured for sympatric species and not for allopatric (sub)species, which limits the interpretation of this data. The allopatric (sub)species could have been measured as non-coexistence "control". If coexistence and convergence in wing colouration drives the evolution of alternative mate recognition signals, such alternative signals should not evolve/diverge for allopatric (sub)species where wing colouration is still a reliable mate recognition cue. More importantly, no details are provided on the quantification of butterfly chemical profiles, which is essential to understand such data. It is unclear how the chemical profiles were quantified and what data (concentrations, ratios, proportions) were used to perform NMDS and generate Figure 5 and the associated statistical tests.

      We recognize that having the chemical profiles of the genitalia of the Morpho from the allopatric population would have made a stronger case arguing in favor of reinforcement acting on the divergence of the chemical compounds found on the genitalia of the sympatric Morpho species. Due to limited access to the biological material needed by the time of the chromatography, we could not test for lower divergence in the chemical profiles of allopatric Morpho butterflies. We will mention this limitation in the results, and clarify the protocol used to extract the chemical profiles, by mentioning the use of concentration data to generate Figure 5 and the associated statistical tests.

      Third, throughout the discussion, the authors mention that their results support natural selection by predators on iridescent wing colouration, without measuring natural selection by predators or any other measure related to predation. It is unclear by what predators any of the butterfly species are predated on at this point.

      In the next version of the manuscript, we will mention previous predation experiments performed on Morpho and other butterflies showing evidence that birds can be predators of those species. Those observations led us to test for the putative effect of predation on the evolution of their color pattern, without directly testing predation rates. We will make sure this information is transparent in the revised manuscript.

      To continue on the interpretation of the data related to selection on specific traits by specific selection agents: This study did not measure any form of selection or any selection agent. Hence, it is not known if iridescent wing colouration is actually under selection by predators and/or mates, if maybe other selection agents are involved or if these traits converge due to genetic correlations with other traits under selection. For example, Iridescent colouration in ground beetles has functions as antipredator defence but also thermo- and water regulation. None of these issues are recognized or discussed.

      We acknowledge that the lack of discussion on alternative evolutionary forces involved in the evolution of iridescence has been highlighted by all reviewers. We will discuss how environmental factors, genetic factors or correlation with other traits might explain the convergent signal of iridescence found in sympatric Morpho species, rather than focusing only on the putative effect of predation.

      Finally, some of the results are weakly supported by statistics or questionable methodology. Most notably, the perception of the iridescence coloration of allopatric subspecies by bird visual systems. Although for females, means and errors (not indicated what exactly, SD, SE or CI) are clearly above the 1 JND line, for males, means are only slightly above this line and errors or CIs clearly overlap with the 1 JND line. Since there is no additional statistical support, higher means but overlap of SD, SE or CI with the baseline provides weak statistical support for differences.

      We thank the reviewer for raising interpretation issues concerning the chromatic distances of allopatric Morpho species measured with a bird vision model. We will make sure to bring nuance to the interpretation of this graph, and clearly mention in the figure’s legend that the error bars represent the confidence intervals obtained after performing a bootstrap analysis.

      Regarding the assortative mating experiment, the results are clearly driven by M. bristowi. For M. theodorus, females mate equally often with conspecifics (6 times) as with M. bristowi (5 times). For males, the ratio is slightly better (6 vs 3), but with such low numbers, I doubt this is statistically testable. Overall low mating for M. bristowi could indicate suboptimal experimental conditions, and hence results should be interpreted with care.

      Regarding the wing manipulation experiment, M. theodorus does not show a preference when dummies with non-modified wings are presented and prefers non-modified dummies over modified dummies. This is acknowledged by the authors but not further discussed. Certainly, some control treatment for wing modification could have been added.

      We recognize that the tetrad experiment results are mainly driven by M. bristowi’s behavior. This experiment would have benefited from more replicates. We will mention that the conclusions we draw for this experiment are mainly driven by male M. bristowi behavior, and that it is more difficult to test for assortative or disassortative mating in M. theodorus, adding more nuance to our interpretation. We will also make sure to discuss further the effect of wing modification in the discussion.
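      The reviewer's doubt about the M. theodorus mating counts (females: 6 conspecific vs 5 heterospecific matings; males: 6 vs 3) can be illustrated with a minimal sketch of an exact two-sided binomial test. This is not the analysis used in the manuscript: the null expectation of 0.5 (random mating between one conspecific and one heterospecific partner) and the function name `exact_binomial_two_sided` are assumptions made for illustration only.

```python
from math import comb


def exact_binomial_two_sided(k, n):
    """Two-sided exact binomial p-value for k successes in n trials.

    Assumes a null probability of 0.5 (hypothetical random-mating expectation).
    """
    # Under p = 0.5 every outcome j has probability comb(n, j) / 2**n, so the
    # outcomes "at least as extreme" as k are those with comb(n, j) <= comb(n, k).
    observed = comb(n, k)
    extreme = sum(comb(n, j) for j in range(n + 1) if comb(n, j) <= observed)
    return extreme / 2 ** n


# Counts quoted in the review for M. theodorus (conspecific vs heterospecific).
p_females = exact_binomial_two_sided(6, 6 + 5)
p_males = exact_binomial_two_sided(6, 6 + 3)
print(f"females: p = {p_females:.3f}; males: p = {p_males:.3f}")
```

      Under these assumptions neither count comes close to conventional significance thresholds, which is consistent with the reviewer's caution about the small sample sizes.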

      Overall, the fact that certain measurements only provide evidence for 1 of the 2 (sub)species (assortative mating, wing manipulation) or one sex of one of the species (bird visual systems) means overall interpretation and overgeneralization of the results to both allopatric or sympatric species should be done with care, and such nuances should ideally be discussed.

      The aim of the authors, "to investigate the antagonistic effects of selective pressures generated by mate recognition and shared predation" has not been achieved, and the conclusions regarding this aim are not supported by the results. Nevertheless, the iridescence colour measurements are solid, and some of the behavioural experiments and chemical profile measurements seem to yield interesting results. The study would benefit from less overinterpretation of the results in the framework of predation and more careful consideration of methodological difficulties, statistical insecurities, and nuances in the results.

      Overall, we would like to thank all reviewers for their thorough assessment of our work. We understand that the imbalance between mate choice data, visual model data and chemical data gives us only a partial assessment of species recognition in Morpho butterflies, thus requiring more precision in the interpretation and the discussion of our results. We will implement all the comments made by the reviewers in the next version of our manuscript.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this work, the authors have developed SPLASH+, a micro-assembly and biological interpretation framework that expands on their previously published reference-free statistical approach (SPLASH) for sequencing data analysis.

      Thank you for this thorough overview of our work.

      Strengths:

      (1) The methodology developed by the authors seems like a promising approach to overcome many of the challenges posed by reference-based single-cell RNA-seq analysis methods.

      Thank you for your positive comment on the potential of our approach to address the limitations of reference-based methods for scRNA-Seq analysis.

      (2) The analysis of the RNU6 repetitive small nuclear RNA provides a very compelling example of a type of transcript that is very challenging to analyze with standard reference-based methods (e.g., most reads from this gene fail to align with STAR, if I understood the result correctly).

      We thank the reviewer for their positive comment. We agree that the variation in RNU6 detected by SPLASH+ underscores the potential of our reference-free method to make discoveries in cases where reference-based approaches fall short.

      Weaknesses:

      (1) The manuscript presents a number of case studies from very diverse domains of single-cell RNA-seq analysis. As a result, the manuscript has been challenging to review, because it requires domain expertise in centromere biology, RNA splicing, RNA editing, V(D)J transcript diversity, and repeat polymorphisms.

      We appreciate the reviewer’s effort in thoroughly evaluating this manuscript, especially given the broad range of biological domains discussed. Our main goal in presenting a wide range of applications was to highlight the key strength of the SPLASH+ framework: its ability to unify diverse biological discoveries within a single method that operates directly on sequencing reads.

      (2) Although the paper focuses on SmartSeq2 full-length single-cell RNA-seq data analysis, the vast majority of single-cell RNA-seq data that is currently being generated comes from droplet-based methods (e.g., 10x Genomics) that sequence only the 3' or 5' ends of transcripts. As a result, it is unclear if SPLASH+ is also applicable to these types of data.

      We thank the reviewer for this comment. Due to the specific data format of barcoded single-cell sequencing platforms such as 10x Genomics, extending the SPLASH framework to support 10x analysis required engineering a specialized preprocessing tool. We have addressed this in a recent work, which is now available as a preprint (https://doi.org/10.1101/2024.12.24.630263).

      (3) The criteria used for the selection of the 10 'core genes' have not been sufficiently justified.

      We chose these genes as SPLASH+ detected regulated splicing for them in nearly all tissues (18 out of 19) analyzed in our study (i.e., identifying anchors classified as splicing anchors in those tissues). Our subsequent analysis showed that all these genes are involved in either splicing regulation or histone modification. We will further clarify this selection criterion in the revision.

      (4) It is currently unclear how the splicing diversity discovered in this paper relates to the concept of noisy splicing (i.e., there are likely many low-frequency transcripts and splice junctions that are unlikely to have a significant functional impact beyond triggering nonsense-mediated decay).

      In our analysis, to ensure sufficient read coverage, we considered significant anchors supported by more than 50 reads and detected in over 10 cells. Additionally, our downstream analyses (including splicing analysis) are based on assembled sequences (compactors) generated through our micro-assembly step. This process effectively acts as a denoising step by filtering out sequences likely caused by sequencing errors or with very low read support. However, we agree that the detected splice variants have not been fully functionally characterized, and further functional experiments may be needed.
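      The coverage filter described above can be sketched as follows. This is an illustrative sketch only, not the SPLASH+ implementation: the input layout (tuples of anchor, cell identifier, read count) and the function name `filter_anchors` are hypothetical, and the statistical significance step performed by SPLASH itself is not reproduced here.

```python
from collections import defaultdict

MIN_READS = 50  # anchors must be supported by more than 50 reads
MIN_CELLS = 10  # anchors must be detected in more than 10 cells


def filter_anchors(counts):
    """Return anchors passing both coverage thresholds.

    counts: iterable of (anchor, cell_id, n_reads) tuples (hypothetical layout).
    """
    reads_per_anchor = defaultdict(int)
    cells_per_anchor = defaultdict(set)
    for anchor, cell_id, n_reads in counts:
        if n_reads > 0:  # a cell only counts as detecting the anchor if it has reads
            reads_per_anchor[anchor] += n_reads
            cells_per_anchor[anchor].add(cell_id)
    return sorted(
        anchor
        for anchor, total in reads_per_anchor.items()
        if total > MIN_READS and len(cells_per_anchor[anchor]) > MIN_CELLS
    )
```

      Both thresholds are strict inequalities, matching the wording "more than 50 reads" and "over 10 cells".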

      (5) The paper presents only a very superficial discussion of the potential weaknesses of the SPLASH+ method.

      We discussed two potential limitations of SPLASH+ in the Conclusions section: (1) it is not suitable for differential gene expression analysis, and (2) although we provide a framework for interpreting and analyzing SPLASH results, further work is still needed to improve the annotation of calls lacking BLAST matches. We will expand the discussion of these limitations in the revision.

      (6) The cursory mention of metatranscriptome in the conclusion of the paper is confusing, as it might suggest the presence of microbial cells in sterile human tissues (which has recently been discredited in cancer, see e.g. https://www.science.org/content/article/journal-retracts-influential-cancer-microbiome-paper).

      We will remove the mention of metatranscriptome in the revised manuscript.

      Reviewer #2 (Public review):

      The authors extend their SPLASH framework with single-cell RNA-seq in mind, in two ways. First, they introduce "compactors", which are possible paths branching out from an anchor. Second, they introduce a workflow to classify compactors according to the type of biological sequence variation represented (splicing, SNV, etc). They focus on simulated data for fusion detection, and then focus on analyzing the Tabula sapiens Smart-seq2 data, showing extensive results on alternative splicing analysis, VDJ, and repeat elements.

      This is strong work with an impressive array of biological investigations and results for a methods paper. I have various concerns about terminology and comparisons, as follows (in a somewhat arbitrary order, apologies).

      Thank you for this thorough overview of our work and your positive comment on the strength of our work.

      (1) The discussion of the weaknesses of the consensus sequence approach of SPLASH is an odd way to motivate SPLASH+ in my opinion, in that SPLASH is not yet so widely used, so the baseline for SPLASH+ is really standard alignment-based approaches. It is fine to mention consensus sequence issues briefly, but it felt belabored.

      We thank the reviewer and agree that the primary comparison for SPLASH+ is with reference-based methods. However, since SPLASH+ builds upon SPLASH, we also aimed to highlight the limitations of the consensus step in the original SPLASH and how SPLASH+ addresses them. To keep the main focus of the paper on the comparison with reference-based methods and on biological investigations, the discussion of the consensus step was provided in a Supplementary Figure. We will shorten this discussion in the revision.

      (2) Regarding compactors reducing alignment cost: the comparison should really be between compactor construction and alignment vs read alignment (and maybe vs modern contig construction algorithms and alignment).

      Since the SPLASH framework is fundamentally reference-free and does not require read alignment, we compared the number of sequence alignments for compactors to the total read alignments required by a reference-based method to show that while compactors are aligned to the reference, the number of alignments needed is still orders of magnitude less than a reference-based approach requiring alignment of all the reads.

      (3) The language around "compactors" is a bit confusing, where the authors sometimes refer to the tree of possibilities from an anchor as a "compactor", and sometimes a compactor is a single branch. Presumably, ideally, compactors should be DAGs, not trees, i.e., they can connect back together. Perhaps the authors could comment on whether this matters/would be a valuable extension.

      We thank the reviewer for their comment. We refer to each generated assembled sequence as “a compactor”, and we attempted to make this clear in the paper. We will review the text further to ensure this definition is clear in the revised version.

      (4) The main oddness of the splicing analysis to me is not using cell-type/state in any way in the statistical testing. This need not be discrete cell types: psiX, for example, tested whether exonic PSI was variable with reference to a continuous gene expression embedding. Intuitively, such transcriptome-wide signal should be valuable for a) improving power and b) distinguishing cell-type intrinsic/"noisy" from cell-type specific splicing variation. A straightforward way of doing this would be pseudobulking cell types. Possibly a more sophisticated hierarchical model could be constructed also.

      We appreciate the reviewer’s concern regarding SPLASH+ not using cell type metadata. SPLASH, which performs the core statistical inference in SPLASH+, is an unsupervised tool specifically designed to make biological discoveries without relying on metadata (such as cell type annotations in scRNA-Seq). This is particularly useful in scRNA-seq, where cell type labels could be missing, imprecise, or may miss important within-cell-type variation. As shown in the paper, even without using metadata, SPLASH+ performed better than both SpliZ and Leafcutter (two metadata-dependent tools) in terms of achieving higher concordance and identifying more differentially spliced genes. Regarding pseudobulking, as has been shown in the SpliZ paper (https://doi.org/10.1038/s41592-022-01400-x), pseudobulking requires multiple pseudobulked replicates per cell type for reliable inference, which is often not feasible in scRNA-seq settings, making such methods statistically suboptimal for single-cell studies. We will add a discussion on pseudobulking in the revision.

      (5) A secondary weakness is that some informative reads will not be used, for example, unspliced reads aligning to alternative exons. This relates to the broader weakness of SPLASH that it is blind to changes in coverage that are not linked to a specific anchor (which should be acknowledged somewhere, maybe in the Discussion). In the deeply sequenced SS2 data, this is likely not an issue, but might be more limiting in sparser data. A related issue is that coverage change indicative of, e.g., alternative TSS or TES (that do not also include a change in splice junction use) will not be detected. In fairness, all these weaknesses are shared by LeafCutter. It would be valuable to have a comparison to a more "traditional" splicing analysis approach (pick your favorite of rMATS, MISO, SUPPA).

      We thank the reviewer for their comment. As noted in the Conclusion, the SPLASH framework is not designed for differential gene expression analysis, which relies on quantifying read coverage. Rather, it focuses on detecting differential sequence diversity arising from mechanisms like alternative splicing or RNA editing. We will clarify this limitation further in the revised Conclusion. 

      Regarding splicing evaluation, we have performed extensive comparisons with two widely used and recent methods, SpliZ and Leafcutter, for both bulk and single-cell splicing analysis. While we appreciate the reviewer’s suggestion to include an additional method, given the current length of the paper and the fact that Leafcutter has previously been shown to outperform rMATS, MAJIQ, and Cufflinks2 (https://www.nature.com/articles/s41588-017-0004-9), we believe the current comparisons provide sufficient support for the evaluation of the splicing detection by SPLASH+.

      (6) "We should note that there is no difference between gene fusions and other RNA variants (e.g., RNA splicing) from a sequence assembly viewpoint". Maybe this is true in an abstract sense, but I don't think it is in reality. AS can produce hundreds of isoforms from the same gene, and be variable across individual cells. Gene fusions are generally less numerous/varied and will be shared across clonal populations, so the complexity is lower. That simplicity is balanced against the challenge that any genes could, in principle, fuse.

      We selected the fusion benchmarking dataset solely to evaluate how well compactors reconstruct sequences. Since our goal was to assess the accuracy of reconstructed compactor sequences, we needed a benchmarking dataset with ground truth sequences, which this dataset provides. We explained our reason for selecting the fusion dataset in the text, but we will clarify it further in the revision.

      (7) For the fusion detection assessment, SPLASH+ is given the correct anchor for detection. This feels like cheating since this information wouldn't usually be available. Can the authors motivate this? Are the other methods given comparable information? Also, TPM>100 seems like a very high expression threshold for the assessment.

      We agree with the reviewer that the fusion benchmarking dataset should not be used to assess the entire SPLASH+ framework. In fact, we did not use this dataset to evaluate SPLASH+; it was used exclusively to evaluate the performance of compactors as a standalone module. Specifically, we tested how well compactors can reconstruct fusion sequences when provided with seed sequences corresponding to fusion junctions. This aligns with our expectation from compactors in SPLASH+, that they should correctly reconstruct the sequence context for the detected anchors. As noted in our previous response, since our goal was to assess the accuracy of reconstructed compactor sequences, we required a benchmarking dataset with ground truth sequences, which this dataset provides. We will clarify this further in the revision.

      We appreciate the reviewer’s concern that a TPM of 100 is high. In Figure 1C, we presented the full TPM distribution for fusions missed or detected by compactors. The 100 threshold was an arbitrary benchmark to illustrate the clear difference in TPM profiles between these two sets of fusions. We will clarify this point in the revised manuscript.

      (8) Why are only 3'UTRs considered and not 5'? Is this because the analysis is asymmetric, i.e., only considering upstream anchors and downstream variation? If so, that seems like a limitation: how much additional variation would you find if including the other direction?

      We thank the reviewer for their comment. SPLASH+ can, in principle, detect variation in 5’ UTR regions, as demonstrated by the variations observed in the 5’ UTRs of the genes ANPC16 and ARPC2. If sequence variation exists in the 5′ UTR, SPLASH+ can still detect it by identifying an anchor upstream of the variable region, as it directly parses sequencing reads to find anchors with downstream sequence diversity. Even when the variation occurs near the 5′ end of the 5′ UTR, SPLASH+ can still capture this diversity if the user selects a shorter anchor length.

      (9) I don't find the theoretical results very meaningful. Assuming independent reads (equivalently binomial counts) has been repeatedly shown to be a poor assumption in sequencing data, likely due to various biases, including PCR. This has motivated the use of overdispersed distributions such as the negative Binomial and beta binomial. The theory would be valuable if it could say something at a specified level of overdispersion. If not, the caveat of assuming no overdispersion should be clearly stated.

      We appreciate the reviewer’s comment. We will clarify this in the revised paper.
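      The reviewer's point about overdispersion can be made concrete with a standard textbook comparison, given here only as an illustration and not taken from the manuscript. The symbols are assumptions introduced for this sketch: $n$ reads at an anchor, target probability $p$, beta-binomial shape parameters $\alpha, \beta$, and intra-class correlation $\rho$.

```latex
% Independent reads (binomial counts):
X \sim \mathrm{Binomial}(n, p)
  \quad\Longrightarrow\quad
  \operatorname{Var}[X] = n\,p\,(1-p)

% Overdispersed reads (beta-binomial counts), with
% p = \alpha / (\alpha + \beta) and \rho = 1 / (\alpha + \beta + 1):
X \sim \mathrm{BetaBinomial}(n, \alpha, \beta)
  \quad\Longrightarrow\quad
  \operatorname{Var}[X] = n\,p\,(1-p)\,\bigl[\,1 + (n-1)\,\rho\,\bigr]
```

      Under this parameterization the binomial case is recovered as $\rho \to 0$, so a result stated "at a specified level of overdispersion" would correspond to a bound that holds for a given $\rho > 0$.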

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript entitled "Phosphodiesterase 1A Physically Interacts with YTHDF2 and Reinforces the Progression of Non-Small Cell Lung Cancer" explores the role of PDE1A in promoting NSCLC progression by binding to the m6A reader YTHDF2 and regulating the mRNA stability of several novel target genes, consequently activating the STAT3 pathway and leading to metastasis and drug resistance.

      Strengths:

      The study addresses a novel mechanism involving PDE1A and YTHDF2 interaction in NSCLC, contributing to our understanding of cancer progression.

      Reviewer #2 (Public review):

      Summary

      This revised manuscript investigates the role and the mechanism by which PDE1 impacts NSCLC progression. They provide evidence to demonstrate that PDE1 binds to m6A reader YTHDF2, in turn, regulating STAT3 signaling pathway through its interaction, promoting metastasis and angiogenesis.

      Strength:

      The study uncovers a novel PDE1A/YTHDF2/SOCS2/STAT3 pathway in NSCLC progression and the findings provide a potential treatment strategy for NSCLC patients with metastasis.

      Weakness:

      In the Discussion, the revised version states that "the role of YTHDF2 in PDE1A-driven tumor metastasis should be elucidated in future studies". However, given that the physical interaction of PDE1A and YTHDF2 plays a critical role in PDE1A-mediated NSCLC metastasis, showing whether YTHDF2 mimics the effect of PDE1A in metastasis would strengthen the manuscript.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) In Figure 1A, the y-axis should be "IOD/Area" instead of "IDO/Area".

      Figure 1A was revised as suggested.

      (2) Figure 3A legend for (F) and (G) was switched.

      Figure 3A legend was revised as suggested “(F-G) The mRNA (F) and protein (G) levels of indicated genes were determined in P3 and P0 NSCLC cells.”.

      (3) The statistical analysis should be performed for Figure 3H.

      Figure 3H was revised as suggested.

      (4) Figure 4F, Y-axis has a typo for "vessels" and statistical analysis should be performed on this data.

      Figure 4F was revised as suggested.

      (5) Figure 6 E, typo for "migrated" on the y-axis.

      Figure 6E was revised as suggested.

      (6) Figure 7 C, typos for "expression" on y-axis in both figures need to be fixed.

      Figure 7C was revised as suggested.

      (7) P-values for Figure 7B need to be stated.

      Figure 7B was revised as suggested.

      (8) m6A should be consistent throughout the manuscript.

      m6A has been made consistent throughout the manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      IKK is the key signaling node for inflammatory signaling. Despite the availability of molecular structures, how the kinase achieves its specificity remains unclear. This paper describes a dynamic sequence of events in which autophosphorylation of a tyrosine near the active site facilitates phosphorylation of the serine on the substrate via a phospho-transfer reaction. The proposed mechanism is conceptually novel in several ways, suggesting that the kinase has dual specificity (tyrosine and serine) and that it mediates a phospho-transfer reaction. While bacteria contain phosphorylation-transfer enzymes, this is unheard of for mammalian kinases. However, what the functional significance of this enzymatic activity might be remains unaddressed.

      The revised manuscript adequately addresses all the points I suggested in the review of the first submission.

      Response: The authors thank the reviewer for their valuable comments and constructive criticisms for the betterment of the manuscript. We also thank them for appreciating our work. We agree with the reviewer that the functional significance of this particular enzymatic activity of IKK2 is yet to be fully realized.

      Reviewer #2 (Public review):

      The authors investigate the phosphotransfer capacity of Ser/Thr kinase IkB kinase (IKK), a mediator of cellular inflammation signaling. Canonically, IKK activity is promoted by activation loop phosphorylation at Ser177/Ser181. Active IKK can then unleash NF-kB signaling by phosphorylating repressor IkBα at residues Ser32/Ser36. Noting the reports of other IKK phosphorylation sites, the authors explore the extent of autophosphorylation.

      Semi-phosphorylated IKK purified from Sf9 cells exhibits the capacity for further autophosphorylation. Anti-phosphotyrosine immunoblotting indicated unexpected tyrosine phosphorylation. Contaminating kinase activity was tested by generating a kinase-dead K44M variant, supporting the notion that the unexpected phosphorylation was IKK-dependent. In addition, the observed phosphotyrosine signal required phosphorylated IKK activation loop serines.

      Two candidate IKK tyrosines were examined as the source of the phosphotyrosine immunoblotting signal. Activation loop residues Tyr169 and Tyr188 were each rendered non-phosphorylatable by mutation to Phe. The Tyr variants decreased both autophosphorylation and phosphotransfer to IkBα. Likewise, Y169F and Y188F IKK2 variants immunoprecipitated from TNFa-stimulated cells also exhibited reduced activity in vitro.

      The authors further focus on Tyr169 phosphorylation, proposing a role as a phospho-sink capable of phosphotransfer to IkBα substrate. This model is reminiscent of the bacterial two-component signaling phosphotransfer from phosphohistidine to aspartate. Efforts are made to phosphorylate IKK2 and remove ATP to assess the capacity for phosphotransfer. Phosphorylation of IkBα is observed after ATP removal, although there are ambiguous requirements for ADP.

      Strengths:

      Ultimately, the authors draw together the lines of evidence for IKK2 phosphotyrosine and ATP-independent phosphotransfer to develop a novel model for IKK2-mediated phosphorylation of IkBα. The model suggests that IKK activation loop Ser phosphorylation primes the kinase for tyrosine autophosphorylation. With the assumption that IKK retains the bound ADP, the phosphotyrosine is conformationally available to relay the phosphate to IkBα substrate. The authors are clearly aware of the high burden of evidence required for this unusual proposed mechanism. Indeed, many possible artifacts (e.g., contaminating kinases or ATP) are anticipated and control experiments are included to address many of these concerns. The analysis hinges on the fidelity of pan-specific phosphotyrosine antibodies, and the authors have probed with two different anti-phosphotyrosine antibody clones. Taken together, the observations are thought-provoking, and I look forward to seeing this model tested in a cellular system.

      Weaknesses:

      Multiple phosphorylated tyrosines in IKK2 were apparently identified by mass spectrometric analyses. LC-MS/MS spectra are presented, but fragments supporting phospho-Y188 and Y325 are difficult to distinguish from noise. It is common to find non-physiological post-translational modifications in over-expressed proteins from recombinant sources. Are these IKK2 phosphotyrosines evident by MS in IKK2 immunoprecipitated from TNFa-stimulated cells? Identifying IKK2 phosphotyrosine sites from cells would be especially helpful in supporting the proposed model.

      We thank the reviewer for their detailed comments and constructive criticisms, which helped enrich the manuscript. We also thank them for pointing out the critical points in the model. We agree with the reviewer that testing this model in a cellular system is required to bolster this concept. However, an appropriate cellular assay system to investigate and monitor this mode of phosphotransfer is still elusive. We agree with the reviewer’s concerns about the identification of Y188 and Y325 as potential phosphosites. They have been omitted from the current version and the relevant changes have been incorporated. Tyrosine phosphorylation of IKK2 in cells has been reported previously. Although we did not analyze IKK2 from TNF-a treated cells in this study, a different study of the phospho-status of cellular IKK2 indicated tyrosine phosphorylation (Meyer et al 2013).

      Reviewer #3 (Public review):

      Summary:

      The authors investigate the kinase activity of IKK2, a crucial regulator of inflammatory cell signaling. They describe a novel tyrosine kinase activity of this well-studied enzyme and a highly unusual phosphotransfer from phosphorylated IKK2 onto substrate proteins in the absence of ATP as a substrate.

      Strengths:

      The authors provide an extensive biochemical characterization of the processes using recombinant protein, western blotting, autoradiography, and protein engineering, and now also provide MS data.

      Weaknesses:

      The identity and purity of the proteins used have improved in the revised work. Since the findings are so unexpected and potentially of wide-reaching interest, this is important. Similarly, specific detection of phospho-Ser/Thr vs phospho-Tyr relies largely on antibodies, which can have varying degrees of specificity. Using multiple antibodies and MS improves the quality of the data.

      We thank the reviewer for their concise comments and constructive criticisms, which helped improve the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Generally, the paper is well written, but the first 4 figures are slow going and could be condensed to show the key points, so that the reader gets to Figures 6 and 7, which contain the "meat" of the paper.

      Specific points:

      Several figures should be quantified and experimental reproducibility is not always clear.

      I understand that Figure 3 shows that K44M abolishes both S32/36 phosphorylation and tyrosine phosphorylation, but not PEST region phosphorylation. This suggests that autophosphorylation is reflective of its known specific biological role in signal transduction. But I do not understand why "these results strongly suggest that IKK2-autophosphorylation is critical for its substrate specificity". That statement would be supported by a mutant that no longer autophosphorylates and, as a result, shows a loss of substrate specificity, i.e. phosphorylates non-specific residues more strongly. Is that the case? Maybe Darwech et al 2010 or Meyer et al 2013 showed this? Later figures seem to address this point, so maybe this conclusion should be stated later in the paper.

      Page 10: DFG+1 is mentioned without proper introduction. Does the Chen et al 2014 paper inform the authors' interest in Y169 phosphorylation, or is it just an additional interesting finding? Does this publication belong in the Introduction or the Discussion?

      To understand the significance of Figure 4D, we need a WT IKK2 control: or is there prior literature to cite?

      This is relevant for the conclusion that Y169 phosphorylation is particularly important for S32 phosphorylation.

      The cold ATP quenching experiment is nice for testing the model that Y169 functions as a phospho sink that allows for a transfer reaction. However, there is only a single timepoint and condition, which does not allow for a quantitative analysis. Furthermore, a positive control would make this experiment more compelling, and the Y169F mutant should show that cold ATP quenching reduces the phosphorylation of IkBα.

      Note after revision: I thank the authors for addressing these points. The manuscript is thereby improved.

      We thank the reviewer for appreciating our efforts in addressing their concerns.

      Reviewer #2 (Recommendations for the authors):

      In the revisions, the authors provide LC-MS/MS spectra for putative phospho-Y325 and phospho-Y188. The details are hard to see at the scale provided, but the fragment ions for pY188 and pY325 peptides are unconvincing. Phospho-Y169, on the other hand, is much more credible. In addition, the revision rebuttal clarifies that Y188 would be packed into a catalytically important core, and Y188F is likely to disrupt the fold. Taken together, it seems doubtful that Y188 is subject to any significant autophosphorylation, and presenting the Y188F data (and discussion) seems like a distraction.

      We agree with the reviewer’s concerns on the identification of Y188 and Y325 as potential phosphosites. They have been omitted in the current version and relevant sections in the manuscript text and figures have been edited.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for the careful review of our manuscript. Overall, they were positive about our use of cutting-edge methods to identify six inversions segregating in Lake Malawi. Their distribution in ~100 Lake Malawi species demonstrated that they were differentially segregating in different ecogroups/habitats and could potentially play a role in local adaptation, speciation, and sex determination. Reviewers were positive about our finding that the chromosome 10 inversion was associated with sex determination in a deep benthic species and about its potential role in regulating traits under sexual selection. They agree that this work is an important starting point in understanding the role of these inversions in the amazing phenotypic diversity found in the Lake Malawi cichlid flock.

      There were two main criticisms that were made which we summarize:

      (1) Lack of clarity. It was noted that the writing could be improved to make many technical points clearer. Additionally, certain discussion topics were not included that should be.

      We will rewrite the text and add additional figures and tables to address the issues that were brought up in a point-by-point response. We will improve or include (1) the nomenclature used to describe the inversions in different lineages, (2) descriptions of the various genomic approaches, (3) a figure documenting the samples and technologies used for each ecogroup, and (4) integration of LR sequences to identify inversion breakpoints at the finest resolution possible.

      (2) We overstate the role that selection plays in the spread of these inversions and neglect other evolutionary processes that could be responsible for their spread.

      We agree with the overarching point. We did not show that selection is involved in the spread of these inversions, and other forces can be at play. Additionally, there were concerns with our model that the inversions introgressed from a Diplotaxodon ancestor into benthic ancestors, and that incomplete lineage sorting or balancing selection (via sex determination) could be at play. Overall, we agree with the reviewers, with the following caveats. 1. Our analysis of the genetic distance between Diplotaxodons and benthic species in the inverted regions is more consistent with their spread through introgression than with incomplete lineage sorting or balancing selection. 2. Further, the role of these inversions is likely different in different species. For example, the inversions on 10 and 11 play a role in sex determination in some species but not others, and the potential pressures acting on the inverted and non-inverted haplotypes will be very different. These are very interesting and important questions both for understanding the adaptive radiations in Lake Malawi and in general, and we are actively studying crosses to understand the role of these inversions in phenotypic variation between two species. We will modify the text to make all of these points clearer.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Using high-quality genomic data (long-reads, optical maps, short-reads) and advanced bioinformatic analysis, the authors aimed to document chromosomal rearrangements across a recent radiation (Lake Malawi Cichlids). Working on 11 species, they achieved a high-resolution inversion detection and then investigated how inversions are distributed within populations (using a complementary dataset of short-reads), associated with sex, and shared or fixed among lineages. The history and ancestry of the inversions is also explored.

      On one hand, I am very enthusiastic about the global finding (many inversions well-characterized in a highly diverse group!) and impressed by the amount of work put into this study. On the other hand, I have struggled so much to read the manuscript that I am unsure about how much the data supports some claims. I'm afraid most readers may feel the same and really need a deep reorganisation of the text, figures, and tables. I reckon this is difficult given the complexity brought by different inversions/different species/different datasets but it is highly needed to make this study accessible.

      The methods of comparing optical maps, and looking at inversions at macro-evolutionary scales can be useful for the community. For cichlids, it is a first assessment that will allow further tests about the role of inversions in speciation and ecological specialisation. However, the current version of the manuscript is hardly accessible to non-specialists and the methods are not fully reproducible.

      Strengths:

      (1) Evidence for the presence of inversion is well-supported by optical mapping (very nice analysis and figure!).

      (2) The link between sex determination and inversion in chr 10 in one species is very clearly demonstrated by the proportion in each sex and additional crosses. This section is also the easiest to read in the manuscript and I recommend trying to rewrite other result sections in the same way.

      (3) A new high-quality reference genome is provided for Metriaclima zebra (and possibly other assemblies? - unclear).

      (4) The sample size is great (31 individuals with optical maps if I understand well?).

      (5) Ancestry at those inversions is explored with outgroups.

      (6) Polymorphism for all inversions is quantified using a complementary dataset.

      Weaknesses:

      (1) Lack of clarity in the paper: As it currently reads, it is very hard to follow the different species, ecotypes, samples, inversions, etc. It would be useful to provide a phylogeny explicitly positioning the samples used for assembly and the habitat preference. Then the text would benefit from being organised either by variant or by subgroups rather than by successive steps of analysis.

      We have extensively rewritten the paper to improve the clarity. With respect to this point, we moved Figure 6 to Figure 1, which places the phylogeny of Lake Malawi cichlids at the beginning of the paper. We incorporated information about samples/technologies by ecogroup into this figure to help the reader gain an overview of the technologies involved. We added information about habitat for each ecogroup as well. While we considered a change to the text organization suggested here, we thought it was clearer to keep the original headings.

      (2) Lack of information for reproducibility: I couldn't find clearly the filters and parameters used for the different genomic analyses for example. This is just one example and I think the methods need to be re-worked to be reproducible. Including the codes inside the methods makes it hard to follow, so why not put the scripts in an indexed repository?

      We now provide a link to a GitHub repository (https://github.com/ptmcgrat/CichlidSRSequencing/tree/Kumar_eLife) containing the scripts used for the major analyses in the paper. Because our data is behind a secure Dropbox account, readers will not be able to run the analyses. However, they can see the exact programs, filters, and parameters used for the manuscript, which are embedded within each script.

      (3) Further confirmation of inversions and their breakpoints would be valuable. I don't understand why the long-reads (that were available and used for genome assembly) were not also used for SV detection and breakpoint refinement.

      We did use long reads to confirm the presence of the inversions by creating five new genome assemblies from the PacBio HiFi reads: two additional Metriaclima zebra samples and three Aulonocara samples. Alignment of these five genomes to the MZ_GT3 reference is shown in Figures S2 – S7. These genome assemblies were also used to identify the breakpoints of the inversions. However, because of the extensive amount of repetitive DNA at the breakpoints (which is known to be important for the formation of large inversions), our ability to resolve the breakpoints was limited.

      (4) Lack of statistical testing for the hypothesis of introgression: Although cichlids are known for high levels of hybridization, inversions can also remain balanced for a long time. What could allow us to differentiate introgression from incomplete lineage sorting?

      The coalescent time of the inverted regions between Diplotaxodons and benthics should allow us to distinguish these two mechanisms. Our finding that the genetic distance, which is related to coalescent time, is smaller within the inversions than across the whole genome is supportive of introgression. However, we did not perform any simulations or statistical tests. We make it clearer in the text that incomplete lineage sorting remains a possible mechanism for the distribution of inversions within these ecogroups.
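
      The comparison described above can be illustrated with a minimal sketch. It assumes aligned, equal-length haplotype sequences for two groups and computes the mean pairwise proportion of differing sites between the groups, separately for an inverted region and for the genomic background. The function names, toy sequences, and region coordinates are hypothetical and are not taken from the analysis pipeline. A smaller between-group distance inside the inversion than in the background is the pattern described as supportive of introgression, although this comparison is not a formal statistical test.

```python
from itertools import product


def pairwise_distance(seq_a, seq_b):
    """Proportion of sites that differ between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


def mean_between_group_distance(group_1, group_2, start, end):
    """Mean pairwise distance between two groups over sites [start, end)."""
    distances = [
        pairwise_distance(s1[start:end], s2[start:end])
        for s1, s2 in product(group_1, group_2)
    ]
    return sum(distances) / len(distances)


# Toy data (hypothetical): sites 0-9 stand in for the genomic background,
# sites 10-19 stand in for an inverted region.
diplotaxodon = ["AAAAAAAAAACCCCCCCCCC", "AAAAAAAAATCCCCCCCCCC"]
benthic = ["GGGGAAAAAACCCCCCCCCC", "GGGGAAAAAACCCCCCCCCG"]

background = mean_between_group_distance(diplotaxodon, benthic, 0, 10)
inversion = mean_between_group_distance(diplotaxodon, benthic, 10, 20)
```

      In this toy example the between-group distance is lower inside the stand-in inversion than in the background. Real data would additionally require simulations or a formal test to exclude incomplete lineage sorting.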

      (5) The sample size is unclear: possibly 31 for Bionano, 297 for short-reads, how many for long-reads or assemblies? How is this sample size split across species? This would deserve a table.

      We have included this information in the new Figure 1.

      (6) Short read combines several datasets but batch effect is not tested.

      We do not test for batch effects. However, we note that all of the datasets were analyzed with the same pipeline starting from alignment, so batch effects would be restricted to aspects of the reads themselves. Additionally, samples from the different datasets clustered as expected by lineage and inferred inversion, so batch effects are unlikely to have affected the analysis for these purposes.

      (7) It is unclear how ancestry is determined because the synteny with outgroups is not shown.

      Ancestry analysis was determined using the genome alignments of two outgroups from outside of Lake Malawi. This is shown in Figure S8.

      (8) The level of polymorphism for the different inversions is difficult to interpret because it is unclear whether replicates are different species within an eco-group or different individuals from the same species. How could it be that homozygous references are so spread across the PCA? I guess the species-specific polymorphism is stronger than the ancestral order but in such a case, wouldn't it be worth re-doing the PCA on a subset?

      The genomic PCA plots reflect the evolutionary histories that are observed in the whole genome phylogenies. Because the distribution of the inverted alleles violates the species tree, they form separate clusters on the PCA plots that can be used to genotype specific species. We have also performed this analysis on benthics (utaka/shallow benthics/deep benthics) and the distribution matches the expectation.

      Reviewer #2 (Public review):

      Summary:

      Chromosomal inversions have been predicted to play a role in adaptive evolution and speciation because of their ability to "lock" together adaptive alleles in genomic regions of low recombination. In this study, the authors use a combination of cutting-edge genomic methods, including BioNano and PacBio HiFi sequencing, to identify six large chromosomal inversions segregating in over 100 species of Lake Malawi cichlids, a classic example of adaptive radiation and rapid speciation. By examining the frequencies of these inversions present in species from six different lineages, the authors show that there is an association between the presence of specific inversions with specific lineages/habitats. Using a combination of phylogenetic analyses and sequencing data, they demonstrate that three of the inversions have been introduced to one lineage via hybridization. Finally, genotyping of wild individuals as well as laboratory crosses suggests that three inversions are associated with XY sex determination systems in a subset of species. The data add to a growing number of systems in which inversions have been associated with adaptation to divergent environments. However, like most of the other recent studies in the field, this study does not go beyond describing the presence of the inversions to demonstrate that the inversions are under sexual or natural selection or that they contribute to adaptation or speciation in this system.

      Strengths:

      All analyses are very well done, and the conclusions about the presence of the six inversions in Lake Malawi cichlids, the frequencies of the inversions in different species, and the presence of three inversions in the benthic lineages due to hybridization are well-supported. Genotyping of 48 individuals resulting from laboratory crosses provides strong support that the chromosome 10 inversion is associated with a sex-determination locus.

      Weaknesses:

      The evidence supporting a role for the chromosome 11 inversion and the chromosome 9 inversion in sex determination is based on relatively few individuals and therefore remains suggestive. The authors are mostly cautious in their interpretations of the data. However, there are a few places where they state that the inversions are favored by selection, but they provide no evidence that this is the case and there is no consideration of alternative hypotheses (i.e. that the inversions might have been fixed via drift).

      We have removed mention of chromosome 9’s potential role in sex determination from the paper. While our analysis of sex association with chromosome 11 was limited compared to our analysis of chromosome 10, it was still statistically significant, and we believe it should be left in the paper. The role of 11 (and 9 and 10) in sex determination was also demonstrated using an independent dataset by Blumer et al (https://doi.org/10.1101/2024.07.28.605452).
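
      As an illustration of how a sex association can reach significance with few individuals, the following minimal sketch computes a two-sided Fisher's exact test on a 2x2 table of sex by inversion genotype. The response does not state which test was used, so Fisher's exact test is assumed here purely for illustration, and the counts are hypothetical rather than the study's data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row_1, row_2 = a + b, c + d
    col_1 = a + c
    total = row_1 + row_2

    def table_probability(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(row_1, x) * comb(row_2, col_1 - x) / comb(total, col_1)

    observed = table_probability(a)
    lowest = max(0, col_1 - row_2)
    highest = min(row_1, col_1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(
        p
        for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Hypothetical counts: rows are males and females,
# columns are inversion heterozygotes and non-inverted homozygotes.
p_value = fisher_exact_two_sided(10, 0, 0, 10)
```

      With these made-up counts, a perfect association among only 20 individuals already gives a very small p-value, which is why a limited sample can still support a statistically significant association.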

      We agree that we did not properly consider alternative hypotheses in the original submission and have substantially rewritten the Discussion to consider various alternatives.

      Reviewer #3 (Public review):

      This is a very interesting paper bringing truly fascinating insight into the genomic processes underlying the famous adaptive radiation seen in cichlid fishes from Lake Malawi. The authors use structural and sequence information from species belonging to distinct ecotypic categories, representing subclades of the radiation, to document structural variation across the evolutionary tree, infer introgression of inversions among branches of the clade, and even suggest that certain rearrangements constitute new sex-determining loci. The insight is intriguing and is likely to make a substantial contribution to the field and to seed new hypotheses about the ecological processes and adaptive traits involved in this radiation.

      I think the paper could be clarified in its prose, and that the discussion could be more informative regarding the putative roles of the inversions in adaptation to each ecotypic niche. Identifying key, large inversions shared in various ways across the different taxa is really a great step forward. However, the population genomics analysis requires further work to describe and decipher in a more systematic way the evolutionary forces at play and their consequences on the various inversions identified.

      The model of evolution involving multiple inversions putatively linking together co-adapted "cassettes" could be better spelled out since it is not entirely clear how the existing theory on the recruitment of inversions in local adaptation (e.g. Kirkpatrick and Barton) operates on multiple unlinked inversions. How such loci correspond to distinct suites of integrated traits, or not, is not very easy to envision in the current state of the manuscript.

      This is a very interesting point, and we agree that it creates complications for a simple model of local adaptation. We imagine, though, that the actual evolutionary history was much more complicated than a single Rhamphochromis-type species separating from a single Diplotaxodon-type species, and that it could have occurred sequentially, involving multiple species that are now extinct. A better understanding of the role each of these inversions plays in phenotypic diversity could potentially help us determine whether different inversions carry variation that could be linked to distinct habitat differences. We have added a line to the discussion.

      The role of one inversion in sex determination is apparent and truly intriguing. However, the implication of such locus on ecological adaptation is somewhat puzzling. Also, whether sex determination loci can flow across species via introgression seems quite important as a route to chromosomal sex determination, so this could be discussed further.

      Another very interesting point. If the inversions are involved in ecological adaptation (an important caveat), then potentially the inverted and non-inverted haplotypes play dual roles in the Aulonocara animals with the inverted haplotype carrying adaptive alleles to deep water and the non-inverted haplotype carrying alleles resolving sexual conflict. We have broadened our discussion about their function at the origin including non-adaptive roles.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Overall, the paper is well-written and clear. I do have a few suggestions for changes that would help the reader:

      (1) Figure 1: the figure legend could be expanded here to help the reader; what are the blue and yellow lines? Why are there two lines for the GT3a assembly? And, I had to somehow read the legend a few times to understand that the top line is the UMD2a reference assembly, and the next line is the new Bionano map.

      Fixed in what is now Figure 2.

      (2) Paragraph starting on line 133: you use the word "test" to refer to the Bionano analyses; it is not clear whether anything is being tested. Perhaps "analyse the maps" or just "map" would be more clear? Or more explanation?

      The text has been modified to address this point.

      (3) L145-146: perhaps change "a single inversion" and "a double inversion" to "single inversions" and "double inversions".

      The text has been modified to address this point.

      (4) L157: suppression of recombination in inversion heterozygotes is "textbook" material and perhaps does not need a reference. Or, you could reference an empirical paper that demonstrates this point. Though I love the Kirkpatrick and Barton paper, it certainly is not the correct reference for this point.

      The Kirkpatrick reference was incorrectly included here. The correct reference is an empirical demonstration (Conte) that regions of suppressed recombination have been observed at the locations of the inversions. We have also moved this reference further up in the sentence to a more appropriate position.

      (5) L173: how do you know this is an assembly error and not polymorphism?

      The text has been modified to address this point.

      (6) L277(?): "currently growing in the lab" is probably unnecessary.

      The text has been modified to address this point.

      (7) L298: "the inversion on 10 acts as an XY sex determiner": the inversion itself is not the sex determination gene; rather, it is linked. I think it would be more precise, here and throughout the paper, to say that these inversions likely harbor the sex determination locus (for example, the wording on lines 369-370 is misleading).

      We agree with the larger point that the inversion might not be causal for sex determination. However, it could still be causal through positional effects. We have modified the text to make it clear that it could also carry the causal locus (or loci).

      (8) Figure 6: overall, this figure is very helpful! However, it contains several problematic statements. In no case do you have evidence that these inversions are "favored by selection"; such statements should be deleted. Also, in point 3, you state that inversions 9, 11, and 20 are transferred to benthic lineages, and then that these inversions are involved in sex determination. But, your data suggests that it is chromosomes 9, 10, and 11 that are linked to sex determination.

      This figure is now Figure 1. We have removed these problematic statements.

      (9) L356-360: I would move the references that are currently at the end of the sentence to line 357 after the statement about the previous work on hybridization. Otherwise, it reads as if these previous papers demonstrated what you have demonstrated in your work.

      The text has been modified to address this point.

      (10) Overall, the discussion focuses completely on adaptive explanations for your results, and I would like to see at least an acknowledgement that drift could also be involved unless you have additional data to support adaptive explanations.

      We have rewritten the text to account for the possibility of drift (lines 404 and 405).

      Reviewer #3 (Recommendations for the authors):

      The paper utilizes heterogeneous datasets coming from different sources, and it is not always clear which specimens were used to generate structural information (bionano) or sequence information. A diagram summarizing the sequence data, methodologies, and research questions would be beneficial for the reader to navigate in this paper.

      Much of this information has been added to what is now Figure 1. All of this data is also found in Table S2.

      The authors performed genome alignments to analyze and homologize inversion, but this process is not clearly described. For the PCA, SNP information likely involves mapping onto a common reference genome. However, it is not clear how this was achieved given the different species and varying divergence times involved.

      We now include a link to the GitHub repository that contains the commands that were run. Because the overall level of sequence divergence between cichlid species is quite low (2*10^-3, Malinsky et al), mapping different species onto a common reference is commonly performed in Lake Malawi cichlids.

      The introgression scenario is very intriguing but its role in local adaptation of the ecogroup types is not easy to understand. I understand this is still an outstanding question, but it is unclear how the directionality of introgressions was estimated. This can be substantiated using tree topology analysis, comparative estimates of sequence divergence, and accumulation of DNA insertions. The diagram does not clearly indicate which ones are polymorphic. In some cases, polymorphic inversions could result from the coexistence of native and introgressed haplotypes.

      We agree that this analysis would be interesting but is beyond the scope of this paper.

      The alternative model of introgression proposed in the cited preprint is interesting and deserves a formal analysis here. The authors consider it unclear what would drive "back" introgressions of non-inverted haplotypes, but this would depend on the selection regimes acting on the inversions themselves, which can include forms of balancing selection and a role for recessive lethals (heterozygote advantage). For instance, a standard haplotype could be favored if it shelters deleterious mutations carried by an inversion. Testing the introgression history over a wider range of branches and directions would provide further insights.

      We agree that this analysis would be interesting but is beyond the scope of this paper.

      The prose in the paper is occasionally muddled and somewhat unclear. Referring to chromosomes solely by their numbers (e.g., "inversion on 11") complicates readability.

      This is the standard way to refer to chromosomes in cichlids and we believe while it complicates readability, any other method would be inconsistent with other papers. Changes to nomenclature might improve the readability of this paper, but would make it more difficult to compare results for these chromosomes from other papers with what we have found.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1:

      This paper seeks to address the question of how quantitative trait variation and expression variation are related. scRNAseq represents an appealing approach to eQTL mapping as it is possible to simultaneously genotype individual cells and measure expression in the same cell. As eQTL mapping requires large sample sizes to identify statistical relationships, the use of scRNAseq is likely to dramatically increase the statistical power of such studies. However, there are several technical challenges associated with scRNAseq and the authors' study is focused on addressing those challenges. Most of the points raised by my review of the initial version have been addressed. However, one point remains and one additional point should be considered. In this version the authors have introduced the use of data imputation using a published algorithm, DISCERN. This has greatly increased the variation explained by their model as presented in figure 3. However, it is possible that the explained variance is now an overestimation as a result of using the imputed expression data. I think that it would be appropriate to present figure 3 using the sparse data presented in the initial version of the paper and the newly presented imputed data so that the reader can draw their own conclusions about the interpretation.

      We thank the reviewer for pointing this out and have decided to present the results obtained from the sparse data in the main Figure 3 to avoid any overestimation. We also performed the variance partitioning at different sample sizes and used an optimized implementation of the GREML method to handle large sample sizes instead of having to use a bootstrap estimate. As for the benefits of denoising the expression data, we illustrate them in supplementary Figure S6 so that readers can draw their own conclusions about this imputation method. The imputation generally increases the contribution of the expression-genotype interaction and decreases the residuals of the model by up to 8%.

      Reviewer #1:

      Given that the authors overcame many technical and analytical challenges in the course of this research, the study would be greatly strengthened through analysis of at least one, and ideally several, more conditions which would expand the conclusions that could be drawn from the study and demonstrate the power of using scRNAseq to efficiently quantify expression in different environments.

      Our aim was to illustrate the benefit of one-pot scRNA-seq for eQTL mapping and the association of transcriptomic variation to trait variation. We think we have reached this goal with the current study. We understand that performing another scRNA-seq experiment in a new environment would help expand/validate our conclusions, but we think this would be a better fit for a future study. 

      Reviewer #2:

      The authors now say the main take-home for their work is (1) they have established methods for linkage mapping with scRNA-seq and that these (2) "can help gain insights about the genotype-phenotype map at a broader scale." My opinion in this revision is much the same as it was in the first round: I agree that they have met the first goal, and the second theme has been so well explored by other literature that I'm not convinced the authors' results meet the bar for novelty and impact. To my mind, success for this manuscript would be to support the claim that the scRNA-seq approach helps "reveal hidden components of the yeast genotype-to-phenotype map." I'm not sure the authors have achieved this. I agree that the new Figure 3 is a nice addition: a result that apparently hasn't been reported elsewhere (30% of growth trait variation can't be explained by expression). The caveats are that this is a negative result that needs to be interpreted with caution; and that it would be useful for the authors to clarify whether the ability to do this calculation is a product of the scRNA-seq method per se or whether they could have used any bulk eQTL study for it. Beside this, I regret to say that I still find that the results in the revision recapitulate what the bulk eQTL literature has already found, especially for the authors' focal yeast cross: heritability, expression hotspots, the role of cis- and trans-acting variation, etc.

      We agree with the reviewer that this study does not reveal new modes of transcription regulation or phenomena that were not highlighted or hypothesized in the literature. To avoid confusion, we refrained from using the word “reveal” for such cases. However, we provide convincing evidence that one-pot scRNA-seq helps refine our understanding of the genotype-phenotype map in two ways. First, the larger scalability of this approach allowed us to find a median number of eQTL per gene that is ~4 times higher than the largest bulk-eQTL mapping in the same genetic background. For 60% of these genes, i.e. the ones with higher expression heritability in our dataset, the ability to explain their transcriptomic variation from SNPs increased by ~16% on average, which is substantial. This gain in power can thus improve our understanding of the gene network by highlighting new downstream effects of mutations or transcriptome variation. Second, by performing one-pot eQTL as opposed to large-scale bulk eQTL, thousands of transcriptomes can be collected simultaneously without having to use batching strategies. This enables the association between phenotype, genotype and expression variation, which we show in figure 3 through variance partitioning. While it is possible that the growth trait variation not being fully explained by expression could be an artifact of scRNA-seq, we do not believe this is the case because most transcriptional variation is explained by genotype (~76%).

      Furthermore, we show that by having to control expression for growth, by missing some hotspots of regulation and by missing multiple eQTL for each gene, previous bulk-eQTL analysis could not replicate the significant association between eQTL hotspots and QTL hotspots, which this study highlights. Thus, we agree in general that many of the insights about transcriptional regulation have been obtained through ‘brute-force’, bulk RNA-seq, which fundamentally can reach tens of thousands of transcriptomes as well, but we believe the one-pot scRNA-seq approach is much easier and more expedient once genotyping of single cells and other challenges regarding denoising and low coverage have been solved (which we believe we did). There is indeed another reviewed preprint [Boocock et al, eLife] that has used similar approaches as our study since the publication of our manuscript (in October 2023).

      Likewise, when in the first round of review I recommended that the authors repeat their analyses on previous bulk RNA-seq data from Albert et al., my point was to lead the authors to a means to provide rigorous, compelling justification for the scRNA-seq approach. The response to reviewers and the text (starting on line 413) says the comparison in its current form doesn't serve this purpose because Albert et al. studied fewer segregants. Wouldn't down-sampling the current data set allow a fair comparison? Again, to my mind what the current manuscript needs is concrete evidence that the scRNA-seq method per se affords truly better insights relative to what has come before.

      We agree that down-sampling the current dataset would allow for a fair comparison. Thus, we illustrate the results of the variance partitioning at different sample sizes. While the total variance explained is similar, the contribution of the genotype-expression interaction increases with sample size, highlighting the increased confidence in the associations between expression and genotype that contribute to trait variation. We also showed that many important low-effect-size eQTLs are missed at a sample size of 1000 compared to a sample size of 4000. Indeed, by increasing the scale of eQTL mapping ~4-fold, about 60% of genes have increased heritability, and this increase is due to eQTLs that cumulatively explain more than 15% of transcript level variation.
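
      To make the idea of partitioning trait variance at different sample sizes concrete, the following is a minimal sketch on synthetic data. It is not the GREML procedure used in the study: it swaps in a simple commonality analysis based on nested ordinary least-squares fits, and every name, dimension and effect size in it is a hypothetical choice made for illustration only.

```python
import numpy as np

def r_squared(X, y):
    """Fraction of variance in y explained by an OLS fit on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

def partition_variance(G, E, y):
    """Split trait variance into genotype-only, expression-only, shared and residual parts."""
    r2_g = r_squared(G, y)
    r2_e = r_squared(E, y)
    r2_full = r_squared(np.column_stack([G, E]), y)
    return {
        "genotype_only": r2_full - r2_e,    # explained by G beyond E
        "expression_only": r2_full - r2_g,  # explained by E beyond G
        "shared": r2_g + r2_e - r2_full,    # overlap (can be negative under suppression)
        "residual": 1.0 - r2_full,
    }

# Hypothetical synthetic cross: 4000 haploid segregants, 20 markers, 10 transcripts.
rng = np.random.default_rng(0)
n, n_snps, n_genes = 4000, 20, 10
G = rng.integers(0, 2, size=(n, n_snps)).astype(float)
E = G @ rng.normal(0, 0.5, (n_snps, n_genes)) + rng.normal(0, 1, (n, n_genes))
y = G @ rng.normal(0, 0.3, n_snps) + E @ rng.normal(0, 0.3, n_genes) + rng.normal(0, 1, n)

# Down-sample the same dataset to compare partitions across sample sizes.
results = {}
for size in (1000, 2000, 4000):
    idx = rng.choice(n, size=size, replace=False)
    results[size] = partition_variance(G[idx], E[idx], y[idx])

for size, parts in results.items():
    print(size, {k: round(v, 3) for k, v in parts.items()})
```

      By construction the four components sum to one at every sample size, so any change across sizes reflects how the explained variance is redistributed rather than a change in the total.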

      I also recommend that the authors take care to improve the main text for readability and professionalism. It would benefit from further structural revision throughout (especially in the figure captions) to allow high-impact conclusions to be highlighted and low-impact material to be eliminated. Figure 4 and the results text sections from line 319 onward could be edited for concision or perhaps moved to supplementary if they obscure the authors' case for the scRNA-seq approach. The text could also benefit from copy editing (e.g. three clauses starting with "while" in the paragraph starting on line 456; "od ratio" on line 415). I appreciate the authors' work on the discussion, including posing big picture questions for the field (lines 426-429), but I don't see how they have anything to do with the current scRNA-seq method.

      We thank the reviewer for their suggestions for improving the readability of the text. We edited some of the figure captions and result section titles to better highlight the main results. However, we do not think that the last result section obscures our findings but rather supports the fact that scRNA-seq refines our understanding of the GPM. Indeed, we discovered many new eQTLs that are related to both expression and trait variation, highlighting the potential for understanding the downstream effects of mutations on the gene network and on trait variation through multiple trans-regulation paths.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This work provides a new Python toolkit for combining generative modeling of neural dynamics and inversion methods to infer likely model parameters that explain empirical neuroimaging data. The authors provided tests to show the toolkit's broad applicability and accuracy; hence, it will be very useful for people interested in using computational approaches to better understand the brain.

      Strengths:

      The work's primary strength is the tool's integrative nature, which seamlessly combines forward modelling with backward inference. This is important as available tools in the literature can only do one and not the other, which limits their accessibility to neuroscientists with limited computational expertise. Another strength of the paper is the demonstration of how the tool can be applied to a broad range of computational models popularly used in the field to interrogate diverse neuroimaging data, ensuring that the methodology is not optimal to only one model. Moreover, through extensive in-silico testing, the work provided evidence that the tool can accurately infer ground-truth parameters, which is important to ensure results from future hypothesis testing are meaningful.

      We are happy to hear the positive feedback on our effort to provide an open-source and widely accessible tool for both fast forward simulations and flexible model inversion, applicable across popular models of large-scale brain dynamics.

      Weaknesses:

      Although the tool itself is the main strength of the work, the paper lacked a thorough analysis of issues concerning robustness and benchmarking relative to existing tools.

      The first issue is the robustness to the choice of features to be included in the objective function. This choice significantly affects the training and changes the results, as the authors even acknowledged themselves multiple times (e.g., Page 17 last sentence of first paragraph or Page 19 first sentence of second paragraph). This brings the question of whether the accurate results found in the various demonstrations are due to the biased selection of features (possibly from priors on what worked in previous works). The robustness of the neural estimator and the inference method to noise was also not demonstrated. This is important as most neuroimaging measurements are inherently noisy to various degrees.

      The second issue is on benchmarking. Because the tool developed is, in principle, only a combination of existing tools specific to modeling or Bayesian inference, the work failed to provide a more compelling demonstration of its added value. This could have been demonstrated through appropriate benchmarking relative to existing methodologies, specifically in terms of accuracy and computational efficiency.

      We fully agree with the reviewer that the VBI estimation heavily depends on the choice of data features, and this is the core of the inference procedure, not its weakness. We have demonstrated different scenarios showing how the informativeness of features (commonly used in the literature) results in varying uncertainty quantification. For instance, using summary statistics of functional connectivity (FC) and functional connectivity dynamics (FCD) matrices to estimate the global coupling parameter leads to fast convergence; however, it is not sufficient to accurately estimate the whole-brain heterogeneous excitability parameter, which requires features such as statistical moments of time series. VBI provides a taxonomy of data features that users can employ to test their hypotheses. It is important to note that one major advantage of VBI is its ability to perform estimation using a battery of data features, rather than relying on a limited set (such as only FC or FCD) as is often the case in the literature. In the revised version, we will elaborate further by presenting additional scenarios to demonstrate the robustness of the estimation. We will also evaluate the robustness of the neural density estimators to (dynamical/additive) noise.
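
      As an illustration of the kind of data features discussed here, the sketch below computes summary statistics of FC, FCD and time-series moments with NumPy. It is not the VBI feature-extraction API: the function names, the window length, the step size and the particular set of statistics are assumptions chosen for the example.

```python
import numpy as np

def functional_connectivity(ts):
    """Pearson correlation between regions; ts has shape (n_regions, n_timepoints)."""
    return np.corrcoef(ts)

def fcd_matrix(ts, window, step):
    """Correlation between the upper triangles of sliding-window FC matrices."""
    n_regions, n_t = ts.shape
    iu = np.triu_indices(n_regions, k=1)
    fc_vectors = [
        np.corrcoef(ts[:, start:start + window])[iu]
        for start in range(0, n_t - window + 1, step)
    ]
    return np.corrcoef(np.array(fc_vectors))

def summary_features(ts, window=50, step=10):
    """Low-dimensional feature vector: FC and FCD statistics plus time-series moments."""
    fc = functional_connectivity(ts)
    fcd = fcd_matrix(ts, window, step)
    fc_vals = fc[np.triu_indices_from(fc, k=1)]
    fcd_vals = fcd[np.triu_indices_from(fcd, k=1)]
    return np.array([
        fc_vals.mean(), fc_vals.std(),    # static connectivity
        fcd_vals.mean(), fcd_vals.var(),  # fluctuations of connectivity over time
        ts.mean(), ts.var(),              # first two moments of the signals
    ])

# Hypothetical input: 6 regions, 300 time points of white noise.
rng = np.random.default_rng(0)
ts = rng.normal(size=(6, 300))
features = summary_features(ts)
print(features.shape)
```

      Concatenating several feature families in this way is one simple way to combine features in a single vector instead of relying on FC or FCD alone.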

      More importantly, relative to benchmarking, we would like to draw attention to a key point regarding existing tools and methods. The literature often uses optimization for fitting whole-brain network models, and its limitations for reliable causal hypothesis testing have been pointed out in the Introduction/Discussion. As also noted by the reviewer under strengths, and to the best of our knowledge, there are no existing tools other than VBI that can scale and generalize to operate across whole-brain models for Bayesian model inversion. Previously, we developed Hamiltonian Monte Carlo (HMC) sampling for the Epileptor model in epilepsy (Hashemi et al., 2020, Jha et al., 2022). This phenomenological model is very well-behaved in terms of numerical integration, gradient calculation, and dynamical system properties (Jirsa et al., 2014). However, this does not directly generalize to other models, particularly the Montbrió model for resting-state, which exhibits bistability with noise driving transitions between states. As shown in Baldy et al., 2024, even at the level of a single neural mass model (i.e., one brain region), gradient-based HMC failed to capture such switching behaviour, particularly when only one state variable (membrane potential) was observed while the other (firing rate) was missing. Our attempts to use other methods (e.g., the second-derivative-based Laplace approximation used in Dynamic Causal Modeling) also failed, due to divergence in gradient calculation. Nevertheless, reparameterization techniques (Baldy et al., 2024) and hybrid algorithms (Gabrié et al., 2022) could offer improvements, although this remains an open problem for these classes of computational models.

      In sum, for oscillatory systems, it has been shown previously that the SBI approach used in VBI substantially outperforms both gradient-based and gradient-free alternative methods (Gonçalves et al., 2020, Hashemi et al., 2023, Baldy et al., 2024). Importantly, for bistable systems with switching dynamics, gradient-based methods fail to converge, while gradient-free methods do not scale to the whole-brain level (Hashemi et al., 2020). Hence, the generalizability of VBI relies on the fact that neither the model nor the data features need to be differentiable. We will clarify this point in the revised version. Moreover, we will provide better explanations for some terms mentioned by the reviewer in Recommendations.

      Hashemi, M., Vattikonda, A. N., Sip, V., Guye, M., Bartolomei, F., Woodman, M. M., & Jirsa, V. K. (2020). The Bayesian Virtual Epileptic Patient: A probabilistic framework designed to infer the spatial map of epileptogenicity in a personalized large-scale brain model of epilepsy spread. NeuroImage, 217, 116839.

      Jha, J., Hashemi, M., Vattikonda, A. N., Wang, H., & Jirsa, V. (2022). Fully Bayesian estimation of virtual brain parameters with self-tuning Hamiltonian Monte Carlo. Machine Learning: Science and Technology, 3(3), 035016.

      Jirsa, V. K., Stacey, W. C., Quilichini, P. P., Ivanov, A. I., & Bernard, C. (2014). On the nature of seizure dynamics. Brain, 137(8), 2210-2230.

      Baldy, N., Breyton, M., Woodman, M. M., Jirsa, V. K., & Hashemi, M. (2024). Inference on the macroscopic dynamics of spiking neurons. Neural Computation, 36(10), 2030-2072.

      Baldy, N., Woodman, M., Jirsa, V., & Hashemi, M. (2024). Dynamic Causal Modeling in Probabilistic Programming Languages. bioRxiv, 2024-11.

      Gabrié, M., Rotskoff, G. M., & Vanden-Eijnden, E. (2022). Adaptive Monte Carlo augmented with normalizing flows. Proceedings of the National Academy of Sciences, 119(10), e2109420119.

      Gonçalves, P. J., Lueckmann, J. M., Deistler, M., Nonnenmacher, M., Öcal, K., Bassetto, G., ... & Macke, J. H. (2020). Training deep neural density estimators to identify mechanistic models of neural dynamics. eLife, 9, e56261.

      Hashemi, M., Vattikonda, A. N., Jha, J., Sip, V., Woodman, M. M., Bartolomei, F., & Jirsa, V. K. (2023). Amortized Bayesian inference on generative dynamical network models of epilepsy using deep neural density estimators. Neural Networks, 163, 178-194.

      Reviewer #2 (Public review):

      Summary:

      Whole-brain network modeling is a common type of dynamical systems-based method to create individualized models of brain activity incorporating subject-specific structural connectome inferred from diffusion imaging data. This type of model has often been used to infer biophysical parameters of the individual brain that cannot be directly measured using neuroimaging but may be relevant to specific cognitive functions or diseases. Here, Ziaeemehr et al introduce a new toolkit, named "Virtual Brain Inference" (VBI), offering a new computational approach for estimating these parameters using Bayesian inference powered by artificial neural networks. The basic idea is to use simulated data, given known parameters, to train artificial neural networks to solve the inverse problem, namely, to infer the posterior distribution over the parameter space given data-derived features. The authors have demonstrated the utility of the toolkit using simulated data from several commonly used whole-brain network models in case studies.
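
      The basic idea summarized above (simulate with known parameters, train an estimator on the simulations, then infer the posterior for new data without further simulation) can be sketched as follows. This is a toy illustration and not the VBI implementation: the one-parameter simulator is hypothetical, and a linear-Gaussian conditional density fitted by least squares stands in for the neural density estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_features(theta, rng, n_timepoints=50):
    """Toy simulator: a noisy time series around theta, reduced to two summary features."""
    ts = theta + rng.normal(0.0, 1.0, size=n_timepoints)
    return np.array([ts.mean(), ts.std()])

# 1) Draw parameters from a uniform prior and simulate the corresponding features.
n_sims = 5000
theta_train = rng.uniform(0.0, 5.0, size=n_sims)
x_train = np.array([simulate_features(t, rng) for t in theta_train])

# 2) Fit the stand-in estimator q(theta | x) = Normal(b + w . x, sigma^2).
design = np.column_stack([np.ones(n_sims), x_train])
coef, *_ = np.linalg.lstsq(design, theta_train, rcond=None)
sigma = np.std(theta_train - design @ coef)

def posterior(x_obs):
    """Amortized inference: approximate posterior mean and sd for new features."""
    mean = float(np.concatenate([[1.0], x_obs]) @ coef)
    return mean, float(sigma)

# 3) Apply the trained estimator to "observed" features from a known ground truth.
theta_true = 2.0
x_obs = simulate_features(theta_true, rng)
post_mean, post_sd = posterior(x_obs)
print(post_mean, post_sd)
```

      Step 3 involves no new simulations, which is the amortization property mentioned in strength (4); the posterior width also gives the uncertainty estimate mentioned in strength (2).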

      Strengths:

      (1) Model inversion is an important problem in whole-brain network modeling. The toolkit presents a significant methodological step up from common practices, with the potential to broadly impact how the community infers model parameters.

      (2) Notably, the method allows the estimation of the posterior distribution of parameters instead of a point estimation, which provides information about the uncertainty of the estimation, which is generally lacking in existing methods.

      (3) The case studies were able to demonstrate the detection of degeneracy in the parameters, which is important. Degeneracy is quite common in this type of model. If not handled mindfully, it may lead to spurious or stable parameter estimation. Thus, the toolkit can potentially be used to improve feature selection or to simply indicate the uncertainty.

      (4) In principle, the posterior distribution can be directly computed given new data without doing any additional simulation, which could improve the efficiency of parameter inference on the artificial neural network if well-trained.

      We thank the reviewer for the careful consideration of important aspects of the VBI tool, such as uncertainty quantification, degeneracy detection, parallelization, and amortization strategy.

      Weaknesses:

      (1) While the posterior estimator was trained with a large quantity of simulated data, the testing/validation is only demonstrated with a single case study (one point in parameter space) per model. This is not sufficient to demonstrate the method's accuracy and reliability, but only its feasibility. Demonstrating the accuracy and reliability of the posterior estimation in large test sets would inspire more confidence.

      (2) The authors have only demonstrated validation of the method using simulated data, but not features derived from actual EEG/MEG or fMRI data. So, it is unclear if the posterior estimator, when applied to real data, would produce results as sensible as using simulated data. Human data can often look quite different from the simulated data, which may be considered out of distribution. Thus, the authors should consider using simulated test data with out-of-distribution parameters to validate the method and using real human data to demonstrate, e.g., the reliability of the method across sessions.

      (3) The z-scores used to measure prediction error are generally between 1-3, which seems quite large to me. It would give readers a better sense of the utility of the method if comparisons to simpler methods, such as k-nearest neighbor methods, are provided in terms of accuracy.

      (4) A lot of simulations are required to train the posterior estimator, which seems much more than existing approaches. Inferring from Figure S1, at the required order of magnitudes of the number of simulations, the simulation time could range from days to years, depending on the hardware. Although once the estimator is well-trained, the parameter inverse given new data will be very fast, it is not clear to me how often such use cases would be encountered. Because the estimator is trained based on an individual connectome, it can only be used to do parameter inversion for the same subject. Typically, we only have one session of resting state data from each participant, while longitudinal resting state data where we can assume the structural connectome remains constant, is rare. Thus, the cost-efficiency and practical utility of training such a posterior estimator remains unclear.

      We agree with the reviewer that it is necessary to show results on larger synthetic test sets, and we will elaborate further by presenting additional scenarios to demonstrate the robustness of the estimation. However, there are some points raised by the reviewer that we need to clarify.

      The validation on empirical data was beyond the scope of this study, as it relates to model validation rather than the inversion algorithms. This is also because we aimed to avoid repetition, given that we have previously demonstrated model validation on empirical data using these techniques, for invasive sEEG (Hashemi et al., 2023), MEG (Sorrentino et al., 2024), EEG (Angiolelli et al., 2025) and fMRI (Lavanga et al., 2023, Rabuffo et al., 2025). Note that if the features of the observed data are not included during training, VBI ignores them, as it requires an invertible mapping function between parameters and data features.

      We have used z-scores and posterior shrinkage to measure prediction performance, as these are Bayesian metrics that take into account the variance of both prior and posterior rather than only the mean value or thresholding for ranking of the prediction used in k-NN or confusion matrix methods. This helps avoid biased accuracy estimation, for instance, if the mean posterior is close to the true value but there is no posterior shrinkage. Although shrinkage is bounded between 0 and 1, we agree that z-scores have no upper bound for such diagnostics.
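
      For readers unfamiliar with these two diagnostics, the sketch below computes them from samples, using the common definitions (posterior z-score as the distance between posterior mean and true value in units of posterior standard deviation, and shrinkage as one minus the ratio of posterior to prior variance). The sample sizes and distributions are hypothetical and only serve to contrast a well-identified posterior with an uninformative one.

```python
import numpy as np

def posterior_z_score(posterior_samples, true_value):
    """Distance between posterior mean and truth, in posterior standard deviations."""
    return abs(np.mean(posterior_samples) - true_value) / np.std(posterior_samples)

def posterior_shrinkage(prior_samples, posterior_samples):
    """One minus the ratio of posterior variance to prior variance."""
    return 1.0 - np.var(posterior_samples) / np.var(prior_samples)

rng = np.random.default_rng(0)
true_value = 1.1
prior = rng.normal(0.0, 2.0, size=10_000)

# Posterior concentrated near the truth: small z-score, shrinkage close to 1.
sharp = rng.normal(1.0, 0.2, size=10_000)
# Posterior centered on the truth but as wide as the prior: small z-score, shrinkage near 0.
vague = rng.normal(1.1, 2.0, size=10_000)

for name, post in (("sharp", sharp), ("vague", vague)):
    print(name, posterior_z_score(post, true_value), posterior_shrinkage(prior, post))
```

      The second case is the situation described above: the posterior mean is close to the true value, yet the data have not reduced the uncertainty, which a mean-only accuracy measure would not detect.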

      Finally, the number of required simulations depends on the dimensionality of the parameter space and the informativeness of the data features. For instance, estimating a single global scaling parameter requires around 100 simulations, whereas estimating whole-brain heterogeneous parameters requires substantially more simulations. Nevertheless, we have provided fast simulations, and one key advantage of VBI is that simulations can be run in parallel (unlike MCMC sampling, which is more limited in this regard). Hence, with commonly accessible CPUs/GPUs, the fast simulations and parallelization capabilities of the VBI tool allow us to run on the order of 1 million simulations within 2–3 days on desktops, or in less than half a day on supercomputers at cohort level, rather than over several years! It has been previously shown that the SBI method used in VBI provides an order-of-magnitude faster inversion than HMC for whole-brain epilepsy spread (Hashemi et al., 2023). Moreover, after training, the amortized strategy is critical for enabling hypothesis testing within seconds to minutes. We agree that longitudinal resting-state data under the assumption of a constant structural connectome is rare; however, this strategy is essential in brain diseases such as epilepsy, where experimental hypothesis testing is prohibitive.
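
      The effect of parallelization on the simulation budget can be illustrated with simple arithmetic. The sketch below assumes perfectly parallel, independent simulations with no scheduling overhead; the per-simulation time of 2 seconds and the 8 workers are hypothetical values chosen for the example, not measurements from the toolkit.

```python
def wall_clock_days(n_simulations, seconds_per_simulation, n_workers):
    """Idealized wall-clock time in days for independent simulations run in parallel."""
    total_seconds = n_simulations * seconds_per_simulation / n_workers
    return total_seconds / 86_400  # seconds per day

# Hypothetical desktop: 8 workers, 2 s per simulation, 1 million simulations.
desktop_days = wall_clock_days(1_000_000, 2.0, 8)
# The same budget run serially on a single worker.
serial_days = wall_clock_days(1_000_000, 2.0, 1)

print(f"parallel: {desktop_days:.1f} days, serial: {serial_days:.1f} days")
```

      Under these assumptions the cost scales inversely with the number of workers, an option that is more limited for a sequential sampler such as MCMC.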

      We will clarify these points and better explain some terms mentioned by the reviewer in the revised manuscript.

      Hashemi, M., Vattikonda, A. N., Jha, J., Sip, V., Woodman, M. M., Bartolomei, F., & Jirsa, V. K. (2023). Amortized Bayesian inference on generative dynamical network models of epilepsy using deep neural density estimators. Neural Networks, 163, 178-194.

      Sorrentino, P., Pathak, A., Ziaeemehr, A., Lopez, E. T., Cipriano, L., Romano, A., ... & Hashemi, M. (2024). The virtual multiple sclerosis patient. Iscience, 27(7).

      Angiolelli, M., Depannemaecker, D., Agouram, H., Regis, J., Carron, R., Woodman, M., ... & Sorrentino, P. (2025). The virtual parkinsonian patient. npj Systems Biology and Applications, 11(1), 40.

      Lavanga, M., Stumme, J., Yalcinkaya, B. H., Fousek, J., Jockwitz, C., Sheheitli, H., ... & Jirsa, V. (2023). The virtual aging brain: Causal inference supports interhemispheric dedifferentiation in healthy aging. NeuroImage, 283, 120403.

      Rabuffo, G., Lokossou, H. A., Li, Z., Ziaee-Mehr, A., Hashemi, M., Quilichini, P. P., ... & Bernard, C. (2025). Mapping global brain reconfigurations following local targeted manipulations. Proceedings of the National Academy of Sciences, 122(16), e2405706122.

    1. Author response:

      We thank all three reviewers for providing excellent suggestions that we feel will enhance the clarity and impact of our manuscript. When we submit the revised manuscript, we plan to respond to each comment and provide additional data and discussion points as requested. Below, we include an outline of the main points that we intend to address.

      (1) Reviewers 1 and 2 both suggested investigating degenerative changes in Purkinje cells that are more resistant to age-related loss. We will look for hallmarks of neurodegeneration, such as shrunken dendrites and axonal swellings, in two areas: surviving Purkinje cells adjacent to stripes of cell loss, and the Purkinje cells in aged mice without Purkinje cell loss.

      (2) We agree with Reviewer 2’s point that our manuscript would benefit from discussion of the differences in vulnerability between individual mice. Therefore, we will elaborate upon possible reasons why some aged mice are more resistant to age-related Purkinje cell loss than others.

      (3) We will take Reviewer 3’s suggestion to perform zebrin II co-staining in our GFP reporter mice, given our findings that calbindin staining can be unreliable in this context.

      (4) We appreciate Reviewer 3’s comment that quantification would support the observations made in our study. To provide quantitative evidence for our categorization of mice with striped and non-striped Purkinje cell loss, we will measure the gaps (or lack thereof) between Purkinje cell bodies in the anterior zone.

      (5) We will also incorporate several minor but important changes suggested by all three reviewers.

      Thank you to the reviewers and editors for taking the time and effort to review our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study presents compelling evidence for a novel treatment approach in a challenging patient population with MSS/pMMR mCRC, where traditional immunotherapy has often fallen short. The combination of SBRT and tislelizumab not only yielded a high disease control rate but also indicated significant improvements in the tumor's immune landscape. The safety profile appears favorable, which is crucial for patients who have already undergone multiple lines of therapy.

      Strengths:

      The results underscore the potential of leveraging radiation therapy to enhance the effectiveness of immunotherapy, especially in tumor environments previously deemed hostile to immune interventions. Future research should focus on larger cohorts to validate these findings and explore the underlying mechanisms of immune modulation post-treatment.

      Weaknesses:

      I believe the author's work is commendable and should be considered with some minor modifications:

      (1) While the author categorized patients based on the type of RAS mutation and the location of colorectal cancer metastasis, the article does not adequately address how these classifications influence treatment outcomes. Such as whether KRAS or NRAS mutations, as well as the type of metastatic lesions, affect the sensitivity to gamma-ray treatment and lead to varying responses.

      Thank you very much for your question. In the revised manuscript, we added an analysis of the impact of RAS mutation types and different metastatic sites on patient prognosis. Unfortunately, due to the limited number of samples, we were unable to obtain satisfactory results. The relevant results are shown in the supplementary figure.

      (2) In Figure 2, clarification is needed on how the author differentiated between on-target and off-target lesions. I observed that some images depicted both lesion types at the same level, which could lead to confusion.

      We sincerely apologize for any oversight in our previous submission. To clarify, during the process of radiotherapy planning, we pre-select target lesions at the CT image level, and subsequently define the planning treatment volume (PTV) by marking these pre-selected areas with the 50% isodose lines. In our efficacy evaluation, we distinguish between the target lesions inside the PTV and any lesions outside the target area. In response to your valuable feedback, we have now added the isodose lines for the target lesions to the supplementary figure for greater clarity.

      (3) The author performed only a basic difference analysis. A more comprehensive analysis, including calculations of markers related to treatment efficacy, could offer additional insights for clinical practice.

      To identify potential markers associated with treatment efficacy, we attempted to establish a Cox proportional hazards model and conducted both univariate and multivariate Cox regression analyses. Unfortunately, due to the constraints of sample size and sequencing depth, the analyses did not yield statistically significant results, and we were unable to identify markers that could clearly predict treatment outcomes.

      (4) The transcriptome sequencing analysis provides insights into how stereotactic radiotherapy sensitizes immunotherapy; however, it currently relies on a simple pre- and post-treatment group comparison. It would be beneficial to include additional subgroups to explore more nuanced findings.

      We acknowledge the limitations in the depth of our analysis. In addition to performing differential analysis between the responder group (PR) and the non-responder group (Non-PR), we also conducted differential gene expression analysis on samples before and after treatment. The results revealed a consistent increase in the expression of NOS2 in both groups following Gamma Knife combined with immunotherapy, suggesting that this gene may serve as a potential prognostic factor influencing treatment outcomes. However, given the limited number of studies exploring the role of NOS2 in this context, we recognize that further research is necessary to better understand its involvement and to substantiate its potential as a predictive marker.

      (5) The author briefly discusses the effects of changes in tumor fibrosis and angiogenesis on treatment outcomes. Further experiments may be necessary to validate these findings and investigate the underlying mechanisms of immune regulation following treatment.

      We sincerely appreciate your thoughtful feedback on our results. In response, we conducted additional experiments, including immunohistochemical analysis of patient samples before and after combined treatment. The results demonstrated a reduction in the expression of CD31, a marker of tumor angiogenesis, following the combined treatment. This finding further supports our hypothesis that Gamma Knife treatment, in combination with immunotherapy, may effectively inhibit tumor angiogenesis, contributing to an improved therapeutic outcome.

      Reviewer #2 (Public review):

      Summary:

      This Phase II clinical trial investigates the combination of Gamma Knife Stereotactic Body Radiation Therapy (SBRT) with Tislelizumab for the treatment of metastatic colorectal cancer (mCRC) in patients with proficient mismatch repair (pMMR). The study addresses a critical clinical challenge in the management of pMMR CRC, focusing on the selection of appropriate candidates. The results suggest that the combination of Gamma Knife SBRT and Tislelizumab provides a safe and potent treatment option for patients with pMMR/MSS/MSI-L mCRC who have become refractory to first- and second-line chemotherapy. The study design is rigorous, and the outcomes are promising.

      Advantage:

      The trial design was meticulously structured, and appropriate statistical methods were employed to rigorously analyze the results. Bioinformatics approaches were utilized to further elucidate alterations in the patient's tumor microenvironment and to explore the underlying factors contributing to the observed differences in treatment efficacy. The conclusions drawn from this trial offer valuable insights for managing advanced colorectal cancer in patients who have not responded to first- and second-line therapies.

      Weakness:

      (1) Clarity and Structure of the Abstract
      - Results Section: The results section should contain important data; I suggest that some key sequencing data be shown to enhance understanding.

      Thank you for your insightful question. In response, we have revised the content of the article and restructured the abstract to enhance its scientific clarity and make it more accessible to readers.

      (2) As the authors used the NanoString assay for transcriptome analysis, more detail should be provided, such as the version of R and the bioinformatics analysis methods.

      We have also addressed the missing details in our research methodology. The revised manuscript now includes a complete description of the research methods, along with the specific software and versions used.

      (3) It is interesting that PD-L1 expression increased in the included patients after Gamma Knife Stereotactic Body Radiation Therapy (SBRT) treatment. How can this be explained?

      Thank you for your thought-provoking question. PD-L1 plays a crucial role in tumor cell immune evasion, and anti-PD-1/PD-L1 inhibitors have emerged as effective immune checkpoint inhibitors, widely used in cancer therapy. In our clinical trial, we observed an increase in PD-L1 expression in some patients following combined treatment. Existing literature suggests that activation of various carcinogenic and stress response pathways, along with post-translational modifications of PD-L1 (such as phosphorylation, glycosylation, acetylation, ubiquitination, and palmitoylation), can influence its expression[1]. We hypothesize that the increase in PD-L1 expression may be attributed to the activation of specific signaling pathways induced by the radiation from Gamma Knife treatment, as well as the enhanced tumor stress in response to the treatment. However, the precise mechanisms underlying this observation require further experimental investigation. A deeper understanding of these processes could potentially optimize our clinical treatment strategies.

      (4) It would be helpful to include a brief discussion of the limitations of the study, such as sample size constraints and their impact on the generalizability of the results. This will give readers a more comprehensive understanding of the findings.

      Thank you for highlighting the limitations of the article. In response, we have added a detailed discussion of the constraints arising from the limited number of experimental samples and insufficient sequencing depth. This addition aims to provide readers with a clearer understanding of the study's limitations and the context of our research findings.

      (5) Language Accuracy: There are a few instances where wording could be more professional or precise.

      Regarding the language, we apologize that the wording of the technical content in the article was not sufficiently careful and accurate, as English is not our native language. We have checked the article again and revised the wording and grammar so that you and other readers can grasp our research more accurately.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The research presented in this article is commendable; however, I would like to propose several revisions for consideration:

      Consideration of Concomitant Medications: It is imperative to ascertain whether enrolled patients utilized additional pharmacological agents alongside the trial regimen. Such concurrent drug use could potentially influence the final outcomes. A concise discussion of this aspect is warranted within the manuscript.

      Clinical Characterization of Response Groups: An examination of the clinical characteristics distinguishing the effective and non-responsive cohorts within the trial is essential. This inquiry merits further exploration, as it may elucidate factors influencing treatment efficacy.

      Tumor Microenvironment Analysis: The authors highlight the implications of tumor fibrosis and angiogenesis on therapeutic response. Identification of specific biomarkers associated with these phenotypes is crucial. I recommend undertaking straightforward testing and validation to substantiate these observations.

      Thank you very much for your valuable suggestions, many of which have been incorporated into the revised manuscript. Regarding the consideration of concurrent medication, we would like to clarify that all patients included in the study were advanced CRC patients who had progressed during first- or second-line treatments. As such, targeted therapy or chemotherapy was used concurrently in the trial. Previous studies have not indicated that different targeted therapies influence the efficacy of Gamma Knife treatment, though some chemotherapy agents may vary in their side effects. However, we believe these differences do not significantly impact the final outcomes. Given that existing chemotherapy regimens do not substantially affect patient prognosis, we considered the combined drug treatment regimen to be an irrelevant variable in our analysis.

      Additionally, we have carefully examined the clinical characteristics of patients across different groups. We have also included an analysis of the impact of various mutation types and metastatic sites in the revised manuscript. Furthermore, we plan to perform CD31 staining on lesions from both the responder and non-responder groups before and after Gamma Knife treatment to assess the role of angiogenesis in treatment response.

      Reviewer #2 (Recommendations for the authors):

      The abstract should be revised for greater clarity and include key results that substantiate the conclusions. The discussion section needs to more thoroughly address the limitations of the clinical trial, providing readers with a deeper understanding of the trial's findings and implications. Additionally, the methods section should be more rigorous and detailed, offering sufficient information to enhance the transparency and robustness of the experimental design.

      Thank you for your constructive suggestions regarding the shortcomings in our manuscript. In response, we have thoroughly reviewed the article and addressed the missing content, including revisions to the abstract, results, discussion, and methods sections. Additionally, we have refined the grammar and wording throughout the manuscript to enhance its professionalism and ensure it aligns with the standards expected for publication.

      [1] Yamaguchi H, Hsu JM, Yang WH, et al. Mechanisms regulating PD-L1 expression in cancers and associated opportunities for novel small-molecule therapeutics. Nature Reviews Clinical Oncology, 2022, 19(5): 287-305.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The Authors investigated the anatomical features of the excitatory synaptic boutons in layer 1 of the human temporal neocortex. They examined the size of the synapse, the macular or the perforated appearance and the size of the synaptic active zone, the number and volume of the mitochondria, the number of the synaptic and the dense core vesicles, also differentiating between the readily releasable, the recycling and the resting pool of synaptic vesicles. The coverage of the synapse by astrocytic processes was also assessed, and all the above parameters were compared to other layers of the human temporal neocortex. The Authors conclude that the subcellular morphology of the layer 1 synapses is suitable for the functions of the neocortical layer, i.e. the synaptic integration within the cortical column. The low glial coverage of the synapses might allow the glutamate spillover from the synapses enhancing synaptic crosstalk within this cortical layer.

      Strengths:

      The strengths of this paper are the abundant and very precious data about the fine structure of the human neocortical layer 1. Quantitative electron microscopy data (especially that derived from the human brain) are very valuable, since this is a highly time- and energy consuming work. The techniques used to obtain the data, as well as the analyses and the statistics performed by the Authors are all solid, strengthen this manuscript, and mainly support the conclusions drawn in the discussion.

      Comments on latest version:

      The corrected version of the article titled “Ultrastructural sublaminar specific diversity of excitatory synaptic boutons in layer 1 of the adult human temporal lobe neocortex” has been improved thanks to the comments and suggestions of the reviewers. The Authors implemented several of my comments and suggestions. However, many of them were not addressed. It is understandable that the Authors did not start a whole new series of experiments investigating inhibitory synapses (as this was a misunderstanding affecting two of the three reviewers). But the English text is still very hard to understand and has many mistakes, although I suggested an extensive review of the use of English. Furthermore, my suggestions about avoiding many abbreviations in the abstract, analysing and discussing the perforated synapses in more detail, the figure presentation (Figure 3), and including data about the astrocytic coverage in the Results section were not implemented. My questions about the number of docked vesicles and p10 vesicles, as well as about the different categories of the vesicle pools, have not been answered either. Many other minor comments and suggestions were answered, corrected and implemented, but I think the manuscript could have been improved further if the Authors had taken into account all of the reviewers' suggestions, not only some of them. I still have several main and minor concerns, including a few new ones that I did not notice earlier but still consider important.

      We would like to thank the reviewer for the comments.

      - We worked on the English again and tried to improve the language.

      - We avoided using too many abbreviations in the Abstract and reduced them to a minimum.

      - We included a small paragraph about non-perforated vs. perforated active zones in both the Results and Discussion sections. However, since the majority of active zones in all cortical layers of the human TLN were of the macular type, we concluded that it is not relevant to describe their function in more detail.

      - In Figure 3 A-C we added contour lines to the boutons to make their outlines more visible.

      - We completed the data about the astrocytic coverage in the Results section (see also below).

      - Concerning the vesicle pools please see below.

      Main concerns:

      (1) Epileptic patients:

      As all patients were epileptic, it is not correct to state in the abstract that non-epileptic tissue was investigated. Even if the seizure onset zone was not in the region investigated, seizures usually invade the temporal lobe in TLE. If you can prove that no spiking activity occurred in the sample you investigated and the seizures did not invade that region, then you can write that it is presumably non-epileptic. I would suggest writing “L1 of the human temporal lobe neocortical biopsy tissue”. See also Methods lines 608-612. Write “non-epileptic” or “non-affected” only if you verified it with ECoG. If this was the case, please write a few sentences about it in the Methods.

      We rephrased Material and Methods concerning this point and added that patients were monitored with EEG, MRI and multielectrode recordings. In addition, we stated that the epileptic focus was always far away from the neocortical tissue samples. Furthermore, we added a small paragraph that functional studies using the same methodology have shown that neocortical access tissue samples taken from epilepsy surgery do not differ in electrophysiological properties and synaptic physiology when compared with acute slice preparations in experimental animals and we quoted the relevant papers.

      We hope that the reviewer is now convinced that our tissue samples can be regarded as non-affected.

      (2) About the inhibitory/excitatory synapses.

      “Since our focus was on excitatory synaptic boutons as already stated in the title we have not analyzed inhibitory SBs.” Now I do understand that only excitatory synapses were investigated. Although it was written in the title, I did not realize this, since all over the manuscript the Authors were writing about synapses, distinguishing between inhibitory and excitatory synapses in the text, showing numerous excitatory and inhibitory synapses in Figure 2, and discussing inhibitory interneurons in the Discussion as well. Maybe this was the reason why two of the three reviewers (including myself) thought you investigated both types of synapses but did not differentiate between them. So, please, emphasize in the Abstract (line 40), Introduction (e.g. lines 92-97) and the Discussion (line 369) that only excitatory synaptic boutons were investigated.

      As this paper investigated only excitatory synaptic boutons, I think it is irrelevant to write such a long section in the Discussion about inhibitory interneurons and their functions in L1 of the human temporal lobe neocortex. The same applies to the schematic drawing of the possible wiring of L1 (Figure 7). As neither inhibitory interneurons nor the connections among the different excitatory cells were examined, only the morphology of single synaptic boutons without any reference to their origin, I think this figure does not illustrate the work done in this paper. This could be a figure of a review paper about the human L1, but it is inappropriate in this study.

      We followed the reviewer’s suggestion and pointed out explicitly that we only investigated excitatory synaptic boutons. We also changed the Discussion and focused more on circuitry in L1 and the role of CR-cells.

      (3) Perforated synapses

      “The findings of the Geinisman group suggesting that perforated synapses are more efficient than non-perforated ones is nowadays very controversially discussed” I did not ask the Authors to say that perforated synapses are more efficient. However, based on the literature (e.g. Harris et al, 1992; Carlin and Siekevitz, 1982; Nieto-Sampedro et al., 1982) the presence of perforated synapses is indeed a good sign of synapse division/formation, which in turn might be coupled to synaptic plasticity (Geinisman et al, 1993), increased synaptic activity (Vrensen and Cardozo, 1981), LTP (Geinisman et al, 1991, Harris et al, 2003), pathological axonal sprouting (Frotscher et al, 2006), etc. I think it is worth mentioning this at least in the Discussion.

      We agree with the reviewer and added a small paragraph in the Results section about the two types of AZs in L1 of the human TLN. We pointed out that there are both types, macular non-perforated and perforated AZs, but the majority in all layers were of the non-perforated type. In the Discussion we added some papers pointing out the role of perforated synapses.

      (4) Question about the vesicle pools

      Results, Line 271: It is still not understandable why the RRP was defined as ≤10 nm and ≤20 nm. Why did you use two categories? One would be sufficient (for example ≤20 nm). Or were the vesicles between 10 and 20 nm considered to be part of the RRP? In this case there is a typo; it should be ≥10 nm and ≤20 nm.

      The Authors' answer to my question was: We decided that also those very close within 10 and 20 nm away from the PreAZ, which is less than a SV diameter may also contribute to the RRP since it was shown that SVs are quite mobile.

      This does not clarify why you used two categories. Furthermore, I (like Referee #2) did not receive an answer to my question of how you could have 3x as many docked vesicles as vesicles ≤10 nm. The category ≤10 nm should also contain the docked vesicles. If this is not the case, please clarify what your categories were.

      We thank the reviewer for pointing out that mentioning two distance criteria (p10 and p20) to define one physiological entity (RRP) is somewhat confusing and we acknowledge that the initial response to the reviewers falls short of explaining this choice. This is indeed only understandable in the context of the original paper by Sätzler et al. 2002, where these criteria were first introduced. We therefore referenced this publication more prominently in the paragraph in question.

      To explain this, we would first like to clarify the definition of the two RRP classification criteria used (p10 and p20), which has caused some confusion amongst the reviewers as to which vesicles were included or not:

      - p10 criterion: p ≤ 10 nm (SVs have a minimum distance less than or equal to 10 nm from the PreAZ), including ‘docked’ vesicles which have a distance of zero or less (p0)

      - p20 criterion: p ≤ 20 nm (SVs have a minimum distance less than or equal to 20 nm from the PreAZ), including vesicles of the p10 criterion.
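
      The nesting of the three criteria can be illustrated with a minimal Python sketch. The function name and the example distances are hypothetical and are not data from the study; the sketch only shows that, within a single set of measurements, the counts must satisfy p0 ≤ p10 ≤ p20 because each criterion includes the previous one.

```python
from typing import Dict, Iterable


def count_by_criterion(distances_nm: Iterable[float]) -> Dict[str, int]:
    """Count vesicles under the nested p0 / p10 / p20 distance criteria.

    distances_nm: minimum distance of each vesicle from the PreAZ, in nm.
    """
    d = list(distances_nm)
    return {
        "p0": sum(1 for x in d if x <= 0.0),    # 'docked': distance of zero or less
        "p10": sum(1 for x in d if x <= 10.0),  # includes the p0 vesicles
        "p20": sum(1 for x in d if x <= 20.0),  # includes the p10 vesicles
    }


# Hypothetical distances for one bouton (illustration only, not study data).
example_distances_nm = [0.0, 0.0, 4.5, 9.8, 12.0, 19.9, 35.0, 120.0]
counts = count_by_criterion(example_distances_nm)
print(counts)
```

      A count of docked vesicles that exceeds the p10 count can therefore only arise when the two numbers come from different samples or methods.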

      As mentioned, these criteria were first introduced in Sätzler et al. 2002, looking at the Calyx of Held synapse. In that paper, we tried to establish a morphological correlate to existing physiological measurements, which included the RRP. As there is no known marker that would allow us to identify anatomically the vesicles that contribute to the RRP, we looked at existing physiological experiments such as Schneggenburger et al. 1999; Wu and Borst 1999; Sun and Wu 2001 and compared their total numbers to our measurements. As the number of docked vesicles (p0, see above) was on the lower side of these physiological estimates, we also looked at vesicles close to the AZ, which we think could be recruited within a short time (≤ 10 msec). Comparing with existing literature, we found that at p20 we get pool sizes comparable to midrange estimates of reported RRP sizes. In order to account for the variability of the observed physiological pool sizes, we reported all three measurements (p0, p10, p20) not only in the original Calyx of Held study, but in all subsequent studies of different CNS synapses by our group since then.

      As it remains uncertain whether such a correlate indeed exists, we followed the suggestion to rephrase RRP and RP to putative RRP and putative RP (see also Rollenhagen et al. 2007). We thank both reviewers for pointing out this omission.

      Concerning the difference between ‘docked’ vesicles and vesicles within the p10 perimeter criterion: first of all, the reviewer is right in saying that the category p10 (≤10 nm) should also contain the docked vesicles (see above). The finding of 3x as many ‘docked’ vesicles in our TEM tomography as in the p10 distance analysis could be partly explained, on the one hand, by a very high variability between patients (as expressed by the high SD, Table 1) and, on the other hand, by a high intraindividual variability between synaptic boutons. In both sublayers, there is a huge difference in the number of vesicles within the p10 criterion of individual synaptic boutons, ranging from 0 to ~40 with a mean value of ~1 to ~4 (calculated per patient), the upper level being close to the values calculated with TEM tomography for the ‘docked’ vesicles.

      (5) Astrocytic coverage

      In Fig. 6, data are presented on the astrocytic coverage derived from L1 and L4. In my previous review I asked to include this in the text of the Results as well, but I still do not see it. The Results also do not state how many samples from which layer were investigated in this analysis. Only percentages are given, and only for L1 (but how many patients, L1a and/or L1b and/or L4 is not provided). In contrast, Figure 6 and Supplementary Table 2 (patient table) contain the information that this analysis has been made in L4 as well. Please, include this information in the text as well (around lines 348-360).

      In our previous revised version, we had included the values shown in Fig. 6 for both L1 and L4 in the Results section (L4: lines 352 – 355: ‘The findings in L1…’). However, we agree with the reviewer and have now also added the number of patients and synapses investigated (now lines 359 – 365).

      About how to determine glial elements. I cannot agree with the Authors that glial elements can be determined with high certainty based only on the anatomical features of the profiles seen in the EM. “With 25 years of experience in (serial) EM work" I would say, that glial elements can be very similar to spine necks and axonal profiles.

      All in all, if similar methods were used to determine the glial coverage in the different layers of the human neocortex, then they can be compared (I guess this is the case). However, I would say in the text that proper determination would need immunostaining and a new analysis. This only gives an estimation with the possibility of a certain degree of error.

      We do not entirely agree with the reviewer on this point. As stated in the text, there are structural criteria to identify astrocytic elements (see citations quoted). These gold standard criteria are also commonly used by other well-known groups (DeFelipe and co-workers, Francisco Clasca and co-workers, the late Michael Frotscher and co-workers, etc.). However, in a past paper about astrocytic coverage of synaptic complexes in L5 of the human TLN, immunohistochemistry against glutamine synthetase, a key enzyme in astrocytes, was carried out to describe the coverage. This experiment supports our findings in the other cortical layers of the human TLN. As the reviewer might know, immunohistochemistry always leads to a reduction in ultrastructural preservation, so we decided not to use immunohistochemistry for the further publications on the other cortical layers. We added a short note on this in the Material and Methods section.

      (6) Large interindividual differences in the synapse density should be discussed in the Discussion.

      As suggested by the reviewer, we have included a sentence in the Discussion stating that interindividual differences can be related to differences in age, gender, or the methodology used, as suggested by DeFelipe and co-workers (1999).

      Reviewer #2 (Public review):

      Summary:

      The study of Rollenhagen et al examines the ultrastructural features of Layer 1 of human temporal cortex. The tissue was derived from drug-resistant epileptic patients undergoing surgery, was selected at a distance from the epilepsy focus, and as such was considered to be non-epileptic. The analysis included 4 patients of different age, sex, medication and onset of epilepsy. The MS is a follow-on study to 3 previous publications from the same authors on different layers of the temporal cortex:

      Layer 4 - Yakoubi et al 2019 eLife

      Layer 5 - Yakoubi et al 2019 Cerebral Cortex,

      Layer 6 - Schmuhl-Giesen et al 2022 Cerebral Cortex

      They find that the L1 synaptic boutons mainly have a single active zone and a very large pool of synaptic vesicles, and are mostly devoid of astrocytic coverage.

      Strengths:

      The MS is well written and easy to read. The Results section gives a detailed set of figures showing many morphological parameters of synaptic boutons and surrounding glial elements. The authors provide comparative data of all the layers examined by them so far in the Discussion. Given that anatomical data in human brain are still very limited, the current MS has substantial relevance.

      The work appears to be generally well done, and the EM and EM tomography images are of very good quality. The analysis is clear and precise.

      Weaknesses:

      The authors made all the corrections required, answered most of my concerns, included additional data sets, and clarified statements where needed.

      My remaining points are:

      Synaptic vesicle diameter (which has been established to be ~40 nm independent of species) can be measured properly only with EM tomography, as it makes it possible to find the largest diameter of every given vesicle. Measuring it in 50 nm thick sections results in underestimation (as here, where the values are ~25 nm), because the measured diameter will be smaller than the true diameter if the vesicle is not cut through the middle (which is the least probable scenario). The authors have the EM tomography data set for measuring the vesicle diameter properly.

      We thank the reviewer for the helpful comments. We followed the recommendation to measure the vesicle diameter using our TEM tomography tilt series, but came to similar results concerning this synaptic parameter. As stated in our Material and Methods section, we only counted (measured) clear ring-like structures according to a paper by Abercrombie (1963). Since our results are similar for both methods, we do believe that our measurements are correct. Even random single measurements on the original 3D tilt-series yielded comparable results (Lübke and co-workers, personal observation). Furthermore, our results are within the ranges, although with high variability, also described by other groups (see discussion lines 436 - 449). We therefore hope that the reviewer will now accept our measurements.
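
      The geometric argument in the reviewer's comment can be illustrated with a short Python sketch. It rests on a strong idealisation that is an assumption of this sketch, not of the study: a spherical vesicle is cut by an infinitely thin plane at a uniformly random offset from its centre. Under that assumption the mean profile diameter is π/4 times the true diameter, i.e. about 31.4 nm for a 40 nm vesicle. The sketch ignores the finite 50 nm section thickness and the authors' criterion of measuring only clear ring-like profiles, so it shows only the direction of the bias, not its size in the actual data. The function names are hypothetical.

```python
import math
import random


def mean_profile_diameter(true_diameter_nm: float) -> float:
    """Analytic mean diameter of the circular profile of a sphere cut by a
    plane at a uniformly random offset from its centre (idealised thin cut)."""
    return math.pi / 4.0 * true_diameter_nm


def simulated_mean_profile_diameter(
    true_diameter_nm: float, n: int = 200_000, seed: int = 0
) -> float:
    """Monte Carlo check of the analytic value under the same idealisation."""
    rng = random.Random(seed)
    r = true_diameter_nm / 2.0
    total = 0.0
    for _ in range(n):
        z = rng.uniform(0.0, r)                  # offset of the cut from the centre
        total += 2.0 * math.sqrt(r * r - z * z)  # diameter of the resulting profile
    return total / n


analytic = mean_profile_diameter(40.0)
simulated = simulated_mean_profile_diameter(40.0)
print(f"analytic: {analytic:.2f} nm, simulated: {simulated:.2f} nm")
```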

      It is a bit misleading to call vesicle populations at certain arbitrary distances from the presynaptic active zone the readily releasable pool, recycling pool and resting pool, as these are functional categories and cannot be translated directly to vesicles at certain distances. It is even debated whether the morphologically docked vesicles are the ones that are readily releasable, as further molecular steps, such as proper priming, are also a prerequisite for release.

      It would help to call these pools as "putative" correlates of the morphological categories.

      We followed the suggestion by the reviewer and renamed our vesicle pools as putative RRP, putative RP and putative resting pools.

      Reviewer #3 (Public review):

      Summary:

      Rollenhagen et al. offer a detailed description of layer 1 of the human neocortex. They use electron microscopy to assess the morphological parameters of presynaptic terminals, active zones, vesicle density/distribution, mitochondrial morphology and astrocytic coverage. The data is collected from tissue from four patients undergoing epilepsy surgery. As the epileptic focus was localized in all patients to the hippocampus, the tissue examined in this manuscript is considered non-epileptic (access) tissue.

      Strengths:

      The quality of the electron microscopic images is very high, and the data is analyzed carefully. Data from human tissue is always precious and the authors here provide a detailed analysis using adequate approaches, and the data is clearly presented.

      Weaknesses:

      The text connects functional and morphological characteristics in a very direct way. For example, connecting plasticity to any measurement the authors present would be rather difficult without additional functional experiments. Assigning vesicles to various pools based on their location is also more complex than the manuscript suggests. The text should better reflect the limitations of the conclusions that can be drawn from the authors' data.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Astrocytic coverage

      In Fig. 6, data are presented on the astrocytic coverage derived from L1 and L4. In my previous review I asked to include this in the text of the Results as well, but I still do not see it. The Results also do not state how many samples from which layer were investigated in this analysis. Only percentages are given, and only for L1 (but how many patients, L1a and/or L1b and/or L4 is not provided). In contrast, Figure 6 and Supplementary Table 2 (patient table) contain the information that this analysis has been made in L4 as well. Please, include this information in the text as well (around lines 348-360).

      See above.

      About how to determine glial elements. I cannot agree with the Authors that glial elements can be determined with high certainty based only on the anatomical features of the profiles seen in the EM. “With 25 years of experience in (serial) EM work" I would say, that glial elements can be very similar to spine necks and axonal profiles. Please, see the photos below, out of the 16 circled profiles (2nd picture, very similar to each other) only 3 belong to an astroglial cell (last picture, purple profiles-purple cell), 10 are spines/spine necks/small caliber dendrites of pyramidal cells, 3 are axonal profiles (last but one picture, blue profiles, marked with arrows on the right side). If you follow in your serial sections those elements which you think are glial processes and indeed they are attached to a confidently identifiable glial cell, I agree, it is a glial process. But identifying small, almost empty profiles without any specific staining, from one single EM section, as glial process is very uncertain. Please, check the database of the Allen Institute made from the V1 visual cortex of a mouse. It is a large series of EM sections where they reconstructed thousands of neurons, astroglial and microglial cells. It is possible to double click on the EM picture on a profile and it will show the cell to which that profile belongs. https://portal.brain-map.org/connectivity/ultrastructural-connectomics Pictures included here: https://elife-rp.msubmit.net/eliferp_files/2024/11/25/00132644/02/132644_2_attach_21_29456_convrt.pdf

      All in all, if similar methods were used to determine the glial coverage in the different layers of the human neocortex, then they can be compared (I guess this is the case). However, I would say in the text that proper determination would need immunostaining and a new analysis. This only gives an estimation with the possibility of a certain degree of error.

      As stated above, we carried out glutamine synthetase immunohistochemistry in L5 of the human TLN and came to the same results. However, we added a sentence on this in the chapter on astrocytic coverage in the Material and Methods section. Additionally, we modified this chapter according to the reviewer’s suggestion.

      Minor comments

      Introduction: Last sentence is not understandable (lines 101-103), please rephrase. (contribute to understand or contribute in understanding or contribute to the understanding of..., but definitely not contribute to understanding). The authors should check and review extensively for improvements to the use of English, or use a program such as Grammarly.

      Results: Grammar (line 107): L1 in the adult mammalian neocortex represents a relatively...

      Line 173: “Some SBs in both sublaminae were seen to establish either two or three SBs on the same spine, spines 173 of other origin or dendritic shafts." - Some SBs established two or three SBs? I would write Some SBs established two or three synapses on...

      Line 243: “The synaptic cleft size were slightly, but non-significantly different"

      Line 260: “DCVs play an important role in endo- and exocytosis, the build-up of PreAZs by releasing Piccolo and Bassoon (Schoch and Gundelfinger 2006; Murkherjee et al. 2010)," - please, correct this.

      We have made the corrections suggested by the reviewer.

      Line 374: No point at the end of the last phrase.

      Discussion:

      Lines 400-404: “The majority of SBs in L1 of the human TLN had a single at most three AZs that could be of the non perforated macular or perforated type comparable with results for other layers in the human TLN but by ~1.5-fold larger than in rodent and non-human primates." - What is comparable with the other layers, but different from animals? Please rephrase this sentence, it is not understandable. I already mentioned this sentence in my previous review, but nothing happened.

      Lines 435-437: “Remarkably, the total pool sizes in the human TLN were significantly larger by more than 6-fold (~550 SVs/AZ), and ~4.7-fold (~750 SVs/AZ;) than those in L4 and L5 (Yakoubi et al. 2019a, b; see also Rollenhagen et al. 2018) in rats." Please rethink what you wished to say and compare to the sentence meaning. I think you wanted to compare human TLN L1 pool size to L4 and L5 in the human TLN (Yakoubi 2019a and b) and to rat (Rollenhagen 2018). Instead, you compared all layers of the human TLN to L4 and L5 in rats (with partly wrong references). Please rephrase this.

      Lines 483-484: “Astrocytes serve as both a physical barrier to glutamate diffusion and as mediate neurotransmitter uptake via transporters".

      This sentence is grammatically incorrect, please rephrase.

      We corrected the sentences as suggested by the reviewer.

      Methods:

      In the text, there are only 4 patients (lines 603-604), but in the supplementary table there are 9 patients (5 new included for L4 astrocytic coverage). Please, correct it in the text.

      Lines 608-609: “neocortical access tissue samples were resected to control the seizures for histological inspection by neuropathologists." - What is the meaning of this? Please, rephrase.

      We thank the reviewer for the comment and have added the five patients used for L4 to the Material and Methods section, as well as to the Results section.

      The reviewer is right, and we rephrased and corrected the sentence concerning the inspection by neuropathologists.

      Figures

      Figures 5B: The legend says “SB (sb) synapsing on a stubby spine (sp) with a prominent spine apparatus (framed area) and a thick dendritic segment (de) in L1b" - In my opinion this is not one synaptic bouton, but two. Clearly visible membranes separate them, close to the spine.

      Supplemental Table 2 (patient table). If there is no information about Hu_04 patient's epilepsy, please write N/A (=non available) instead of - (which means it does not exist).

      The reviewer is right, and we corrected the figure and the legend, as well as the table accordingly.

      Reviewer #2 (Recommendations for the authors):

      The authors addressed almost all of my concerns; only this one remained:

      If there is, however, relevant literature on "methods based on EM tomography" and "stereological methods to estimate both types of error" (over- and underestimates) that we are missing out on, we would appreciate the reviewer providing us with the corresponding references so that we can include such calculations in our paper.

      There is a very detailed new study on calculating correction for TEM 2D 3D, Rothman et al 2023 PLOS One. That addresses most of these issues.

      We thank the reviewer for drawing our attention to the publication by Rothman et al. 2023, which is a very detailed and comprehensive study looking at accurately estimating distributions of 3D size and densities of particles from 2D measurements using – amongst others – ET and TEM images as well as synaptic vesicles for validating their method. However, we do not see how this would be relevant to the reported mean diameters and their corresponding variances. And even if we had reported on vesicle size/diameter distributions (referred to as G(d) in Rothman et al. 2023), the authors themselves state that “… the results from our ET and TEM image analysis highlight the difficulty in computing a complete G(d) of MFT vesicles due to their small size…

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Weaknesses:

      It would be helpful for the authors to highlight why their technique (large scale analysis of one emm type) can yield more information than a typical GWAS analysis of invasive vs. non-invasive strains. Are SNPs easier to identify using a large-scale core genome? Is it more likely evolutionarily to find mutations in non-coding regions as opposed to the core genome and accessory genes, and this is what this technique allows? Did the analysis yield unexpected genes or new genes that had not been previously identified in other GWAS analyses? These points may need to be made more apparent in the results and deserve some thought in the discussion section.

      We thank the reviewer for pointing out the importance of this study. By focusing on bacteria within a single emm type, false positives caused by confounding lineage effects can be minimized, which contributes to greater accuracy of the pan-GWAS. We have added relevant text describing the strong points of our pan-GWAS approach to the Results and Discussion sections, as shown below:

      “The present pan-GWAS of bacteria within a single emm type minimized lineage effects, thus reducing false positives.” (lines 204–205)

      “The present study focused on emm89 S. pyogenes, known to cause increasing rates of invasive infections worldwide, and also assessed differences between emm89 strains causing invasive and non-invasive infections. By focusing on bacteria within closed phylogenies, false positives caused by confounding lineage effects were minimized, thus contributing to a higher level of accuracy of the pan-GWAS.” (lines 420–424)

      In addition, we would like to comment more regarding the reviewer’s question, "Is it more likely evolutionarily to find mutations in non-coding regions as opposed to the core genome and accessory genes, and this is what this technique allows?". Mutations are generally considered to be more frequent in non-coding than coding regions. However, the actual mutation frequencies in both types of regions were not assessed in this study. Nevertheless, exploring non-coding regions using the k-mer method is of considerable importance, as variations significantly associated with infectious phenotypes may contribute to alterations in gene expression and other regulatory mechanisms.

      The Alpha-fold data does not demonstrate why the mutations the authors identified could contribute to the invasive phenotype. It would be helpful to show an overlay of the predicted structures containing the different SNPs to demonstrate the potential structural differences that can occur due to the SNP. This would make the data more convincing that the SNP has a potential impact on the function of the protein. Similarly, the authors discuss modification of the hydrophobicity of the side chain in the ferrichrome transporter (lines 317-318) due to a SNP, but this is not immediately obvious in the figure (Fig. 5).

      As the reviewer suggested, we have replaced Figure 5E from the previous version with a figure illustrating the molecular surface in proximity to the mutation. We speculated that the mutation may induce a small indentation on the surface, and thus attenuate the stability of the hydrophobic bond between FhuB and FhuD by invasion of solvent into the indentation. Additionally, images showing the wild-type and mutated models have been separated for better visibility, rather than presented as the overlay of the predicted models suggested by the reviewer. Relevant text in the Results section and the legend of Figure 5E have been revised accordingly, as shown below:

      “The mutation was predicted to induce formation of a small indentation on the molecular surface, thus increasing the surface area accessible to the solvent, and is considered to potentially affect the stability of the hydrophobic bond between FhuB and FhuD, and thus ferrichrome transport (Figure 5E).” (lines 360–363)

      “The 73rd valine in FhuB, shown in magenta, was substituted with alanine. The molecular surface is illustrated with a wireframe and that of the predicted indentation is shown with an arrowhead.” (lines 1162–1164)

      Reviewer #1 (Recommendations for the author):

      The figure legend for Fig. 3C needs to be explained so that it is similarly laid out as in Fig. 2C. Fig. 2C should indicate that the magenta color represents the invasive phenotype.

      Based on this helpful suggestion, more detailed information about the magenta color representing the invasive phenotype has been added to the legends of Fig. 2C and 3C, with relevant text also included in the revised legends, as shown below:

      “Colored bars above indicate countries and phenotypes, and magenta bars represent invasive phenotypes. Using the Roary program, gene names starting with “Group_” were automatically assigned. Position indicates the location of each SNP/indel on the core gene alignment. The full results are shown in Table S6.” (lines 1116–1120)

      “Colored bars above indicate countries and phenotypes, and magenta bars represent invasive phenotypes. Using the Roary program, gene names starting with “Group_” were automatically assigned. The full results are shown in Table S8.” (lines 1130–1133)

      The wording and organization of results in the k-mer section started to get confusing around lines 270-271. It begins to be a list of results and would be better served by some interpretation or explanation of the significance (why it is important to find such mutations). For example, for mutations you find in non-coding regions, do you expect them to have a detrimental effects on gene expression/regulation?

      As the reviewer kindly suggested, we have added interpretation or explanation of the significance of Comp_6 and Comp_24 to the Results section. We analyzed the function of the non-coding region of Comp_6 by employing web-based in silico tools, including MLDSPP and BacPP, though no promoter sequences could be identified. Next, using BLAST, a search for known promoter sequences of S. pyogenes M1 strain SF370 of the CDBProm database was attempted, because the web-based in silico promoter prediction tools are not suitable for S. pyogenes. However, neither identical nor homologous sequences were detected. Thus, the significance of this region remains unknown. In Comp_24, group_141 was also identified in the COGs-based pan-GWAS as a non-invasiveness related gene. Furthermore, group_141 showed high levels of correlation with group_139 and group_467, encoding transposase and uncharacterized protein, respectively, which suggests that the presence of an MGE is associated with a non-invasive phenotype.

      Relevant text has been added to the Materials and Methods (lines 653–657) and Results (lines 308–311 and 314–319) sections, as shown below:

      “Promoter sequences in intergenic regions were predicted using web-based tools, MLDSPP and BacPP[29,30]. Additionally, BLAST was employed to search the promoter sequences of S. pyogenes strain SF370 registered in the CDBProm database (https://aw.iimas.unam.mx/cdbprom/)[69]” (lines 653–657)

      “We speculated that this region is related to regulation of gene expression. However, no promoter sequences were identified by utilizing MLDSPP, BacPP, and BLAST, thus the significance of this region remains to be clarified[29,30].” (lines 308–311)

      “Furthermore, group_141 was also identified in the COGs-based pan-GWAS as a non-invasiveness-related gene along with group_139 and group_467, which encode transposase and uncharacterized protein, respectively (Table S8 and Figure S4). Taken together, the absence of an MGE containing group_141, and the presence of another MGE harboring group_142 and group_143 may result in an invasive phenotype.” (lines 314–319)

      Additionally, new references (#29, 30, and 69) concerning bacterial promoter prediction have been included in the revised version of the manuscript.

      Because there is no difference in intracellular free ferric ions in the fhuB mutant compared with the wild-type, the authors speculate that the upregulation of the fhuBCD operon can compensate for the loss of function of the fhuB gene, but there is insufficient data to support this claim.

      As the reviewer indicated, the data presented in the previous version were insufficient to support our speculation. Therefore, the following sentence has been deleted from the manuscript (previous version line 367):

      “Therefore, the upregulation of fhuBCD may compensate for the impaired function mediated by SNP T218C.”

      The authors mention that there was no direct association between invasiveness and acquisition of genes (lines 451-455), including antibiotic resistance genes from prophages and MGEs (lines 467-469). These data should be moved to the results section to focus the results on the correlation between invasiveness and mutation of existing DNA vs acquisition of new DNA.

      Accordingly, we have added relevant text to the Results section, as shown below:

      “On the other hand, the present pan-GWAS found no genes encoding known virulence factors significantly associated with invasiveness, thus further analysis of the relationships of detected distribution patterns with prophages and MGEs was performed.” (lines 264–267)

      Minor spelling error at line 210 ("waws" instead of "was").

      As the reviewer kindly pointed out, the spelling has been corrected. (line 233)

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      Line 55: Does this rate apply to all types of infections?

      The authors appreciate this question from the reviewer. We checked which types of infections the mortality rate applies to and confirmed that it only represents STSS. Therefore, relevant text has been revised, as shown below:

      “However, even with proper treatment, the mortality rate of patients with STSS remains high, ranging from 23–81%[6]”. (lines 72–73)

      Line 58: Could you explain the protein encoded by the emm gene and the role of the hypervariable region in pathogenesis?

      As requested, relevant text regarding the pathogenic role of the hypervariable region of M protein has been added, as shown below:

      “S. pyogenes has been classified into at least 240 emm types based on a hypervariable region sequence of the emm gene, which encodes the M protein. This hypervariable region of the M protein is responsible for type-specific antigenicity and binds with high affinity to C4b-binding protein, a major fluid phase inhibitor of the classical and lectin pathways of the complement system that confers resistance to opsonophagocytosis[8].” (lines 76–81)

      Line 161: Figure 1C does not show the strain with the different pattern.

      The authors apologize for the lack of clarity. In Fig. 1C, the strain is shown by a pale pink color bar used to indicate the related clade. For clarity, an arrowhead pointing to the strain from outside of the tree has been added along with the following text in the legend:

      “Arrowhead indicates strain belonging to the novel clade.” (lines 1102–1103)

      Line 239: It could be interesting to examine the genes in the region between the mobile elements found in the global cohort, as the result profile was very different from the Japanese group, which revealed more specific genes. Consider adding this to the results section.

      Based on the reviewer’s insightful suggestion, we attempted to find regions between the mobile genetic element-related genes. However, contigs generated from short reads were not adequate to identify such a genome structure. Therefore, calculations to analyze the pairwise correlation of the presence of significant COGs in the 666 strains to predict genes on prophages and MGEs were performed, and the results were added to Figure S4. Eight clusters were detected as coexisting COG groups, seven of which comprised phage- or MGE-related genes. Furthermore, a cluster with antimicrobial-resistant genes was shown to be correlated with non-invasive infections. It is thus speculated that gain or loss of gene sets via phages and MGEs rather than acquisition of virulence genes may lead to changes in fitness to the environment and bacterial phenotypes. Relevant text has been added to the revised versions of the Results, Discussion, and Materials and Methods sections, as shown below:

      “On the other hand, the present pan-GWAS found no genes encoding known virulence factors significantly associated with invasiveness, thus further analysis of the relationships of detected distribution patterns with prophages and MGEs was performed. For calculating the pairwise correlation of the presence of significant COGs in the 666 strains, the COGs were clustered into eight coexisting groups, seven of which contained phage- and/or MGEs-related genes (Figure S4). The largest group comprised 65 genes including phage proteins, while the second largest with 42 genes was found to be associated with non-invasive infections, and included group_2689, group_1833, and ermA1, encoding TetR/AcrR family transcriptional regulator, multidrug efflux system permease protein, and rRNA adenine N-6-methyltransferase, respectively.” (lines 264–273)

      “On the other hand, a cluster comprising 49 non-invasiveness-associated genes including antibiotic-resistance genes was identified. Furthermore, among the genes showing a significant correlation with the infectious phenotype, approximately 90% (152 of 169) were associated with non-invasiveness. One possible explanation is that significantly related genes reflect the process of not only gain of factors but also loss of those affecting fitness cost.” (lines 517–522)

      “The correlation of the presence of significant COGs was calculated and visualized using the R program.” (lines 643–644)
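
      The pairwise correlation step described above can be illustrated with a minimal sketch. The authors state that they used R; the Python below is a swapped-in illustration, not their code. The toy presence/absence matrix, the function names, and the 0.9 grouping threshold are all assumptions introduced here, and grouping by connected components is only one plausible way to obtain coexisting groups.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length 0/1 vectors (the phi coefficient)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        # Undefined when a gene is present in all or no strains; treated as 0 here.
        return 0.0
    return cov / sqrt(vx * vy)

def correlation_matrix(presence):
    """presence: list of rows, one row per gene, one column per strain."""
    return [[pearson(a, b) for b in presence] for a in presence]

def coexisting_groups(corr, threshold=0.9):
    """Label genes so that genes linked by correlation >= threshold share a label."""
    n = len(corr)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and corr[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Toy data (hypothetical): 4 genes (rows) across 6 strains (columns).
toy = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],  # always co-occurs with the first gene
    [0, 0, 0, 1, 1, 1],  # mutually exclusive with the first gene
    [1, 0, 1, 0, 1, 0],
]
corr = correlation_matrix(toy)
groups = coexisting_groups(corr)
print(groups)
```

      In this toy case the first two genes fall into one group because they are always present together, which mirrors how genes carried on the same prophage or MGE would be expected to cluster.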

      Line 548: What cutoff values were used in Fastp?

      The default cutoff value for Fastp (Q>15) was used, and relevant text has been added to the Materials and Methods section in the revised version, as shown below:

      “All collected sequences were subjected to quality checks using Fastp v.0.20.1, with a default cutoff value of Q>15[53].” (lines 600–601)
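
      For readers less familiar with Phred scores, the following sketch shows what a cutoff of 15 corresponds to in terms of per-base error probability. It is an illustration only and does not reproduce Fastp's filtering logic; the function names are hypothetical, the quality string is made up, and Phred+33 encoding as well as treating a score equal to the cutoff as passing are assumptions.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)

def decode_phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]

def fraction_below(quality_string, cutoff=15):
    """Fraction of bases whose score is below the cutoff (assumed boundary handling)."""
    scores = decode_phred33(quality_string)
    return sum(q < cutoff for q in scores) / len(scores)

# A score of 15 corresponds to roughly a 3% chance of an incorrect base call.
print(round(phred_to_error_prob(15), 4))

# Hypothetical quality string: two high-quality and two low-quality bases.
print(fraction_below("II##"))
```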

      Line 635: Were the transcriptome experiments performed in triplicate?

      We apologize for the confusion. The transcriptome experiment was performed only once with three samples for each condition. The notation “(n=3 for each condition)” has been added to the relevant text portion in the Materials and Methods section (line 696).

      Discussion section: I believe the authors should place more emphasis on the fact that FhuB is associated with non-invasiveness, to provide clearer context in the discussion.

      Based on this helpful suggestion, we have revised relevant text in the Discussion section, as shown below:

      “Transcriptomic analysis findings suggested that the Japan-specific fhuB mutation associated with non-severe invasive infections contributes to the growth rate of S. pyogenes in human blood by adapting to the environment.” (lines 457–459)

      Also, “V73A” has been removed from the relevant text in the Discussion section to provide a clearer and more precise context, with the revised sentence shown below:

      “Two possible roles of the FhuB mutation in the pathogenesis of severe invasive infections are thus proposed.” (lines 470–471)

    1. Author response:

      The following is the authors’ response to the previous reviews

      We would like to respond to just one remaining concern from Reviewer 1 and Reviewer 2 regarding a potential overfitting in Test Set 1, which involves combinations already present in the training set. DIPx’s (and TAIJI’s) performance in Test Set 1 is better than in Test Set 2, which involves combinations not present in the training set. Let’s consider two general points to highlight why the improved performance is not the result of overfitting. 

      (1) Suppose we are testing the effect of one drug D; the training may involve, for example, selecting an optimal dose. A validated effect of D in an independent test set is not an overfit, even though we are using the same drug in the training and the test set. Testing one drug is an extreme case, but the same idea holds for any number of drugs. What matters is the independence of the test set. 

      (2) A prediction model P1 will legitimately perform better than model P2, if P1 uses better or more informative features than P2. The features could be those used directly in the model, but they could also be other observable characteristics not directly used in the model, such as optimal subregions of the feature space. DIPx or TAIJI results indicate that the identity of previously trained combinations is one such informative feature. The set of previously trained combinations corresponds to a subregion of the feature space. DIPx’s prediction performance for known combinations would be expected to follow the results from Test Set 1; we cannot expect that if there is an overfitting issue. Finally, we note that Test Set 1 was established and used in the AstraZeneca Dream Challenge for rigorously testing the prediction of known combinations.
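
      The distinction between the two test sets can be made concrete with a small sketch: test samples are partitioned by whether their drug combination already occurs in the training set, while the samples themselves remain independent of the training data. The drug and cell-line names, the record layout, and the function names are hypothetical and are not taken from the challenge data.

```python
def canonical(combo):
    """Order-independent key, so (A, B) and (B, A) count as the same combination."""
    return tuple(sorted(combo))

def split_by_seen(train_combos, test_samples):
    """Split test samples into combinations seen in training and unseen ones."""
    seen = {canonical(c) for c in train_combos}
    known, novel = [], []
    for sample in test_samples:
        if canonical(sample["combo"]) in seen:
            known.append(sample)   # analogous to Test Set 1
        else:
            novel.append(sample)   # analogous to Test Set 2
    return known, novel

# Hypothetical training combinations and independent test samples.
train = [("drugA", "drugB"), ("drugC", "drugD")]
test_samples = [
    {"combo": ("drugB", "drugA"), "cell_line": "line2"},  # known combination, new sample
    {"combo": ("drugA", "drugC"), "cell_line": "line1"},  # combination never trained on
]
known, novel = split_by_seen(train, test_samples)
print(len(known), len(novel))
```

      Under this framing, a sample in the first group shares only the identity of its combination with the training data, not its measurements, which is the sense in which the test set stays independent.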

    1. Author response:

      We appreciate the constructive and thoughtful reviews provided by the reviewers and editorial team. We thank you for the opportunity to submit a provisional response and are grateful for the detailed and critical feedback that will strengthen our work. Below, we provide a summary of our planned revisions in response to the public reviews from Reviewer #1 and Reviewer #2.

      Reviewer #1 – Public Review Response Plan

      (1) Sample Overlap (MR Bias):

      We plan to switch to several non-overlapping GWAS data sources to validate the association between aneurysms and atherosclerosis, thereby eliminating bias and Type I errors caused by sample overlap.

      (2) Multivariable MR (MVMR):

      We will attempt to incorporate known confounding factors (e.g., hypertension, smoking, diabetes) within the multivariable MR framework to verify the robustness of our results.

      (3) Clarifications and Presentation:

      - We will correct eTable citations.

      - Distinguish correctly between "incidence" and "prevalence".

      - Reorganize results to consistently present primary analyses first (IVW), followed by sensitivity results.

      - Expand the methods section to fully reflect all analyses.

      Reviewer #2 – Public Review Response Plan

      (1) Justification of MMP Selection:

      We will provide a detailed rationale for the inclusion of the 12 MMPs, based on prior literature and biological relevance.

      (2) Multiple Testing Clarification:

      We will clarify the Bonferroni correction strategy, explicitly accounting for all tests (e.g., 72 comparisons × multiple MR methods).

      (3) Instrument Selection Threshold:

      - We agree with the reviewer and will revise the SNP selection strategy, starting from p < 5×10⁻⁸ and only relaxing thresholds when fewer than 3 instruments are found.

      - Clarify the reasons why we do not use LD proxies.

      (4) Pleiotropy and Heterogeneity Tests:

      - We will add Egger's intercept results alongside MR-PRESSO.

      - Specify the R packages used (e.g., TwoSampleMR).

      - To prevent cluttered data presentation, we have included both heterogeneity and pleiotropy p-values in the supplementary tables.

      - Supplement forest plots showing outlier exclusion effects.

      (5) Clarifications in Figures and Tables:

      - Fix the duplicated “simple mode” entry in Figure 2.

      - Correct inconsistencies in p-values between figures and text.

      - Improve figure legends (e.g., color bar labels, panel identifiers).

      - Revise Table 4 title for clarity.

      - Remove the term "causal" where associations are nominal (e.g., p ~ 0.05).
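
      Two of the planned changes above, the Bonferroni correction and the stepwise instrument-selection threshold, can be sketched as follows. This is a minimal illustration under stated assumptions: the number of MR methods (5), the relaxed threshold (5×10⁻⁶), the SNP identifiers, and the function names are hypothetical; only the 72 comparisons, the starting threshold of p < 5×10⁻⁸, and the minimum of 3 instruments come from the text.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

def select_instruments(snps, strict=5e-8, relaxed=5e-6, min_instruments=3):
    """Use the strict threshold first; relax it only if too few instruments pass.

    `relaxed` is an assumed value, since the text does not state the fallback.
    """
    chosen = [s for s in snps if s["p"] < strict]
    if len(chosen) >= min_instruments:
        return chosen, strict
    return [s for s in snps if s["p"] < relaxed], relaxed

# 72 comparisons times an assumed 5 MR methods gives 360 tests in total.
n_tests = 72 * 5
threshold = bonferroni_threshold(0.05, n_tests)
print(n_tests, threshold)

# Hypothetical SNPs: only two pass the strict threshold, so the fallback applies.
snps = [
    {"id": "rs_a", "p": 1e-9},
    {"id": "rs_b", "p": 2e-8},
    {"id": "rs_c", "p": 3e-7},
    {"id": "rs_d", "p": 1e-3},
]
chosen, used = select_instruments(snps)
print([s["id"] for s in chosen], used)
```

      Counting every comparison-by-method pair as a separate test is the most conservative reading of the reviewer's request; a less stringent scheme would need to be justified explicitly.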